|
Marçal Rusiñol, Volkmar Frinken, Dimosthenis Karatzas, Andrew Bagdanov and Josep Llados. 2014. Multimodal page classification in administrative document image streams. IJDAR, 17(4), 331–341.
Abstract: In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
Keywords: Digital mail room; Multimodal page classification; Visual and textual document description
|
|
|
Francisco Cruz and Oriol Ramos Terrades. 2014. EM-Based Layout Analysis Method for Structured Documents. 22nd International Conference on Pattern Recognition.315–320.
Abstract: In this paper we present a method to perform layout analysis in structured documents. We proposed an EM-based algorithm to fit a set of Gaussian mixtures to the different regions according to the logical distribution along the page. After the convergence, we estimate the final shape of the regions according
to the parameters computed for each component of the mixture. We evaluated our method in the task of record detection in a collection of historical structured documents and performed a comparison with other previous works in this task.
|
|
|
Francisco Alvaro, Francisco Cruz, Joan Andreu Sanchez, Oriol Ramos Terrades and Jose Miguel Benedi. 2015. Structure Detection and Segmentation of Documents Using 2D Stochastic Context-Free Grammars. NEUCOM, 150(A), 147–154.
Abstract: In this paper we dene a bidimensional extension of Stochastic Context-Free Grammars for structure detection and segmentation of images of documents.
Two sets of text classication features are used to perform an initial classication of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for Probabilistic Graphical Models
and the results showed that the proposed grammatical model outperformed
the other methods. Furthermore, grammars also provide the document structure
along with its segmentation.
Keywords: document image analysis; stochastic context-free grammars; text classication features
|
|
|
Lluis Gomez and Dimosthenis Karatzas. 2014. Scene Text Recognition: No Country for Old Men? 1st International Workshop on Robust Reading.
|
|
|
Thanh Ha Do, Salvatore Tabbone and Oriol Ramos Terrades. 2014. Spotting Symbol Using Sparsity over Learned Dictionary of Local Descriptors. 11th IAPR International Workshop on Document Analysis and Systems.156–160.
Abstract: This paper proposes a new approach to spot symbols into graphical documents using sparse representations. More specifically, a dictionary is learned from a training database of local descriptors defined over the documents. Following their sparse representations, interest points sharing similar properties are used to define interest regions. Using an original adaptation of information retrieval techniques, a vector model for interest regions and for a query symbol is built based on its sparsity in a visual vocabulary where the visual words are columns in the learned dictionary. The matching process is performed comparing the similarity between vector models. Evaluation on SESYD datasets demonstrates that our method is promising.
|
|
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo and Josep Llados. 2015. Efficient segmentation-free keyword spotting in historical document collections. PR, 48(2), 545–555.
Abstract: In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the latent semantic analysis technique and compressing the descriptors with the product quantization method, we are able to efficiently index the document information both in terms of memory and time. The proposed method is evaluated using four different collections of historical documents achieving good performances on both handwritten and typewritten scenarios. The yielded performances outperform the recent state-of-the-art keyword spotting approaches.
Keywords: Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization
|
|
|
Marçal Rusiñol, J. Chazalon and Jean-Marc Ogier. 2014. Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images. 11th IAPR International Workshop on Document Analysis and Systems.181–185.
Abstract: Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such digitization procedure is unreliable, and integrates many distortions which must be detected as soon as possible, on the mobile, to avoid paying data transmission fees, and losing information due to the inability to re-capture later a document with temporary availability. In this context, out-of-focus blur is major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure a sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement in a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.
|
|
|
Marçal Rusiñol, J. Chazalon and Jean-Marc Ogier. 2014. Normalisation et validation d'images de documents capturées en mobilité. Colloque International Francophone sur l'Écrit et le Document.109–124.
Abstract: Mobile document image acquisition integrates many distortions which must be corrected or detected on the device, before the document becomes unavailable or paying data transmission fees. In this paper, we propose a system to correct perspective and illumination issues, and estimate the sharpness of the image for OCR recognition. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement on OCR accuracy. The quality assessment
step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state of the art, learning-based methods which cannot be embedded on a mobile, and outperforms metric-based methods.
Keywords: mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles and Gemma Sanchez. 2015. CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is a long experience on structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and free access databases has not benefited the progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows to make specific this sort of information in a natural manner. This tool has been made for general purpose groundtruthing: It allows to define own object classes and properties, multiple labeling options are possible, grants the cooperative work, and provides user and version control. We finally have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
|
|
P. Wang, V. Eglin, C. Garcia, C. Largeron, Josep Llados and Alicia Fornes. 2014. Représentation par graphe de mots manuscrits dans les images pour la recherche par similarité. Colloque International Francophone sur l'Écrit et le Document.233–248.
Abstract: Effective information retrieval on handwritten document images has always been
a challenging task. In this paper, we propose a novel handwritten word spotting approach based on graph representation. The presented model comprises both topological and morphological signatures of handwriting. Skeleton-based graphs with the Shape Context labeled vertexes are established for connected components. Each word image is represented as a sequence of graphs. In order to be robust to the handwriting variations, an exhaustive merging process based on DTW alignment results introduced in the similarity measure between word images. With respect to the computation complexity, an approximate graph edit distance approach using bipartite matching is employed for graph matching. The experiments on the George Washington dataset and the marriage records from the Barcelona Cathedral dataset demonstrate that the proposed approach outperforms the state-of-the-art structural methods.
Keywords: word spotting; graph-based representation; shape context description; graph edit distance; DTW; block merging; query by example
|
|