|
Jon Almazan, Alicia Fornes and Ernest Valveny. 2012. A non-rigid appearance model for shape description and recognition. PR, 45(9), 3105–3113.
Abstract: In this paper we describe a framework to learn a model of shape variability in a set of patterns. The framework is based on the Active Appearance Model (AAM) and permits to combine shape deformations with appearance variability. We have used two modifications of the Blurred Shape Model (BSM) descriptor as basic shape and appearance features to learn the model. These modifications permit to overcome the rigidity of the original BSM, adapting it to the deformations of the shape to be represented. We have applied this framework to representation and classification of handwritten digits and symbols. We show that results of the proposed methodology outperform the original BSM approach.
Keywords: Shape recognition; Deformable models; Shape modeling; Hand-drawn recognition
|
|
|
Francisco Alvaro, Francisco Cruz, Joan Andreu Sanchez, Oriol Ramos Terrades and Jose Miguel Bemedi. 2013. Page Segmentation of Structured Documents Using 2D Stochastic Context-Free Grammars. 6th Iberian Conference on Pattern Recognition and Image Analysis. Springer Berlin Heidelberg, 133–140. (LNCS.)
Abstract: In this paper we define a bidimensional extension of Stochastic Context-Free Grammars for page segmentation of structured documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the page segmentation is obtained as the most likely hypothesis according to a grammar. This approach is compared to Conditional Random Fields and results show significant improvements in several cases. Furthermore, grammars provide a detailed segmentation that allowed a semantic evaluation which also validates this model.
|
|
|
Jon Almazan, Alicia Fornes and Ernest Valveny. 2013. A Deformable HOG-based Shape Descriptor. 12th International Conference on Document Analysis and Recognition.1022–1026.
Abstract: In this paper we deal with the problem of recognizing handwritten shapes. We present a new deformable feature extraction method that adapts to the shape to be described, dealing in this way with the variability introduced in the handwriting domain. It consists in a selection of the regions that best define the shape to be described, followed by the computation of histograms of oriented gradients-based features over these points. Our results significantly outperform other descriptors in the literature for the task of hand-drawn shape recognition and handwritten word retrieval
|
|
|
Francisco Alvaro, Francisco Cruz, Joan Andreu Sanchez, Oriol Ramos Terrades and Jose Miguel Benedi. 2015. Structure Detection and Segmentation of Documents Using 2D Stochastic Context-Free Grammars. NEUCOM, 150(A), 147–154.
Abstract: In this paper we dene a bidimensional extension of Stochastic Context-Free Grammars for structure detection and segmentation of images of documents.
Two sets of text classication features are used to perform an initial classication of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for Probabilistic Graphical Models
and the results showed that the proposed grammatical model outperformed
the other methods. Furthermore, grammars also provide the document structure
along with its segmentation.
Keywords: document image analysis; stochastic context-free grammars; text classication features
|
|
|
Marçal Rusiñol, J. Chazalon, Jean-Marc Ogier and Josep Llados. 2015. A Comparative Study of Local Detectors and Descriptors for Mobile Document Classification. 13th International Conference on Document Analysis and Recognition ICDAR2015.596–600.
Abstract: In this paper we conduct a comparative study of local key-point detectors and local descriptors for the specific task of mobile document classification. A classification architecture based on direct matching of local descriptors is used as baseline for the comparative study. A set of four different key-point
detectors and four different local descriptors are tested in all the possible combinations. The experiments are conducted in a database consisting of 30 model documents acquired on 6 different backgrounds, totaling more than 36.000 test images.
|
|
|
Hongxing Gao and 6 others. 2013. Key-region detection for document images -applications to administrative document retrieval. 12th International Conference on Document Analysis and Recognition.230–234.
Abstract: In this paper we argue that a key-region detector designed to take into account the special characteristics of document images can result in the detection of less and more meaningful key-regions. We propose a fast key-region detector able to capture aspects of the structural information of the document, and demonstrate its efficiency by comparing against standard detectors in an administrative document retrieval scenario. We show that using the proposed detector results to a smaller number of detected key-regions and higher performance without any drop in speed compared to standard state of the art detectors.
|
|
|
Dena Bazazian, Dimosthenis Karatzas and Andrew Bagdanov. 2018. Word Spotting in Scene Images based on Character Recognition. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.1872–1874.
Abstract: In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.
|
|
|
Ernest Valveny, Oriol Ramos Terrades, Joan Mas and Marçal Rusiñol. 2013. Interactive Document Retrieval and Classification. In Angel Sappa and Jordi Vitria, eds. Multimodal Interaction in Image and Video Applications. Springer Berlin Heidelberg, 17–30.
Abstract: In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking the user interaction into account to make easier the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents.
|
|
|
Albert Gordo, Jose Antonio Rodriguez, Florent Perronnin and Ernest Valveny. 2012. Leveraging category-level labels for instance-level image retrieval. 25th IEEE Conference on Computer Vision and Pattern Recognition. IEEE Xplore, 3045–3052.
Abstract: In this article, we focus on the problem of large-scale instance-level image retrieval. For efficiency reasons, it is common to represent an image by a fixed-length descriptor which is subsequently encoded into a small number of bits. We note that most encoding techniques include an unsupervised dimensionality reduction step. Our goal in this work is to learn a better subspace in a supervised manner. We especially raise the following question: “can category-level labels be used to learn such a subspace?” To answer this question, we experiment with four learning techniques: the first one is based on a metric learning framework, the second one on attribute representations, the third one on Canonical Correlation Analysis (CCA) and the fourth one on Joint Subspace and Classifier Learning (JSCL). While the first three approaches have been applied in the past to the image retrieval problem, we believe we are the first to show the usefulness of JSCL in this context. In our experiments, we use ImageNet as a source of category-level labels and report retrieval results on two standard dataseis: INRIA Holidays and the University of Kentucky benchmark. Our experimental study shows that metric learning and attributes do not lead to any significant improvement in retrieval accuracy, as opposed to CCA and JSCL. As an example, we report on Holidays an increase in accuracy from 39.3% to 48.6% with 32-dimensional representations. Overall JSCL is shown to yield the best results.
|
|
|
Sebastien Mace, Herve Locteau, Ernest Valveny and Salvatore Tabbone. 2010. A system to detect rooms in architectural floor plan images. 9th IAPR International Workshop on Document Analysis Systems.167–174.
Abstract: In this article, a system to detect rooms in architectural floor plan images is described. We first present a primitive extraction algorithm for line detection. It is based on an original coupling of classical Hough transform with image vectorization in order to perform robust and efficient line detection. We show how the lines that satisfy some graphical arrangements are combined into walls. We also present the way we detect some door hypothesis thanks to the extraction of arcs. Walls and door hypothesis are then used by our room segmentation strategy; it consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines between regions can also be rough. We take advantage of knowledge associated to architectural floor plans in order to obtain mostly rectangular rooms. Qualitative and quantitative evaluations performed on a corpus of real documents show promising results.
|
|