|
V.C. Kieu, Alicia Fornes, M. Visani, N. Journet and Anjan Dutta. 2013. The ICDAR/GREC 2013 Music Scores Competition on Staff Removal. 10th IAPR International Workshop on Graphics Recognition.
Abstract: The first competition on music scores that was organized at ICDAR and GREC in 2011 awoke the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we propose a staff removal competition where we simulate old music scores. Thus, we have created a new set of images, which contain noise and 3D distortions. This paper describes the distortion methods, metrics, the participants' methods and the obtained results.
Keywords: Competition; Music scores; Staff Removal
|
|
|
M. Visani, V.C. Kieu, Alicia Fornes and N. Journet. 2013. The ICDAR 2013 Music Scores Competition: Staff Removal. 12th International Conference on Document Analysis and Recognition. 1439–1443.
Abstract: The first competition on music scores that was organized at ICDAR in 2011 awoke the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario: old music scores. For this purpose, we have generated a new set of images using two kinds of degradations: local noise and 3D distortions. This paper describes the dataset, distortion methods, evaluation metrics, the participants' methods and the obtained results.
|
|
|
Marçal Rusiñol, V. Poulain d'Andecy, Dimosthenis Karatzas and Josep Llados. 2013. Classification of Administrative Document Images by Logo Identification. 10th IAPR International Workshop on Graphics Recognition.
Abstract: This paper is focused on the categorization of administrative document images (such as invoices) based on the recognition of the supplier's graphical logo. Two different methods are proposed: the first one uses a bag-of-visual-words model, whereas the second one tries to locate logo images described by the blurred shape model descriptor within documents by a sliding-window technique. Preliminary results are reported with a dataset of real administrative documents.
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2013. Spotting Graphical Symbols in Camera-Acquired Documents in Real Time. 10th IAPR International Workshop on Graphics Recognition.
Abstract: In this paper we present a system devoted to spotting graphical symbols in camera-acquired document images. The system is based on the extraction and further matching of ORB compact local features computed over interest key-points. Then, the FLANN indexing framework, based on approximate nearest neighbor search, allows local descriptors to be matched efficiently between the captured scene and the graphical models. Finally, the RANSAC algorithm is used to compute the homography between the spotted symbol and its appearance in the document image. The proposed approach is efficient and is able to work in real time.
|
|
|
Marçal Rusiñol, T. Benkhelfallah and V. Poulain d'Andecy. 2013. Field Extraction from Administrative Documents by Incremental Structural Templates. 12th International Conference on Document Analysis and Recognition. 1100–1104.
Abstract: In this paper we present an incremental framework aimed at extracting field information from administrative document images in the context of a Digital Mail-room scenario. Given a single training sample in which the user has marked which fields have to be extracted from a particular document class, a document model representing structural relationships among words is built. This model is incrementally refined as the system processes more and more documents from the same class. A reformulation of the tf-idf statistic scheme allows the importance weights of the structural relationships among words to be adjusted. We report in the experimental section our results obtained with a large dataset of real invoices.
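Note: for readers unfamiliar with the statistic mentioned in the abstract above, the following is a minimal sketch of the standard tf-idf weighting over tokenized documents. It is not the paper's reformulation for structural relationships, which the abstract does not detail; the function name `tf_idf` and the toy invoice tokens are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Standard tf-idf (illustrative only, not the paper's variant):
    tf = term count / document length, idf = log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of tokenized invoices.
docs = [
    ["invoice", "total", "date"],
    ["invoice", "total", "vat"],
    ["invoice", "date", "supplier"],
]
w = tf_idf(docs)
```

A term that occurs in every document (here "invoice") receives weight zero, while a term confined to one document (here "vat") receives the largest weight within that document.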
|
|
|
Albert Gordo, Marçal Rusiñol, Dimosthenis Karatzas and Andrew Bagdanov. 2013. Document Classification and Page Stream Segmentation for Digital Mailroom Applications. 12th International Conference on Document Analysis and Recognition. 621–625.
Abstract: In this paper we present a method for the segmentation of continuous page streams into multipage documents and the simultaneous classification of the resulting documents. We first present an approach to combine the multiple pages of a document into a single feature vector that represents the whole document. Despite its simplicity and low computational cost, the proposed representation yields results comparable to more complex methods in multipage document classification tasks. We then exploit this representation in the context of page stream segmentation. The most plausible segmentation of a page stream into a sequence of multipage documents is obtained by optimizing a statistical model that represents the probability of each segmented multipage document belonging to a particular class. Experimental results are reported on a large sample of real administrative multipage documents.
|
|
|
L. Rothacker, Marçal Rusiñol and G.A. Fink. 2013. Bag-of-Features HMMs for segmentation-free word spotting in handwritten documents. 12th International Conference on Document Analysis and Recognition. 1305–1309.
Abstract: Recent HMM-based approaches to handwritten word spotting require large amounts of learning samples and mostly rely on a prior segmentation of the document. We propose to use Bag-of-Features HMMs, estimated from a single sample, in a patch-based segmentation-free framework. Bag-of-Features HMMs use statistics of local image feature representatives. Therefore they can be considered a variant of discrete HMMs that allows modeling the observation of a number of features at a point in time. The discrete nature enables us to estimate a query model with only a single example of the query provided by the user. This makes our method very flexible with respect to the availability of training data. Furthermore, we are able to outperform state-of-the-art results on the George Washington dataset.
|
|
|
Thanh Ha Do, Salvatore Tabbone and Oriol Ramos Terrades. 2013. New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation. 12th International Conference on Document Analysis and Recognition. 265–269.
Abstract: In this paper, we propose a new approach for symbol description. Our method combines the shape context of interest points descriptor with sparse representation. More specifically, we first learn a dictionary describing shape context of interest point descriptors. Then, based on information retrieval techniques, we build a vector model for each symbol based on its sparse representation in a visual vocabulary whose visual words are columns in the learned dictionary. The retrieval task is performed by ranking symbols based on similarity between vector models. Evaluation of our method, using benchmark datasets, demonstrates the validity of our approach and shows that it outperforms related state-of-the-art methods.
|
|
|
R. Bertrand, P. Gomez-Krämer, Oriol Ramos Terrades, P. Franco and Jean-Marc Ogier. 2013. A System Based On Intrinsic Features for Fraudulent Document Detection. 12th International Conference on Document Analysis and Recognition. 106–110.
Abstract: Paper documents still account for a large share of the information media in use today and may contain critical data. Even though official documents are secured with techniques such as printed patterns or artwork, paper documents suffer from a lack of security.
However, the high availability of cheap scanning and printing hardware allows non-experts to easily create fake documents. As the use of a watermarking system added during the document production step is hardly possible, solutions have to be proposed to distinguish a genuine document from a forged one.
In this paper, we present an automatic forgery detection method based on the document's intrinsic features at character level. This method is based on the one hand on outlier character detection in a discriminant feature space and on the other hand on the detection of strictly similar characters. Therefore, a feature set is computed for all characters. Then, based on a distance between characters of the same class.
Keywords: paper document; document analysis; fraudulent document; forgery; fake
|
|
|
Jon Almazan, Albert Gordo, Alicia Fornes and Ernest Valveny. 2013. Handwritten Word Spotting with Corrected Attributes. 15th IEEE International Conference on Computer Vision. 1017–1024.
Abstract: We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to perform either query-by-example, where the query is an image, or query-by-string, where the query is a string. We also propose a calibration scheme, based on Canonical Correlation Analysis, that corrects the attribute scores and greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.
|
|