|
Ernest Valveny, Robert Benavente, Agata Lapedriza, Miquel Ferrer, Jaume Garcia, & Gemma Sanchez. (2012). Adaptation of a computer programming course to the EXHE requirements: evaluation five years later (Vol. 37).
|
|
|
Christophe Rigaud, Dimosthenis Karatzas, Joost Van de Weijer, Jean-Christophe Burie, & Jean-Marc Ogier. (2013). Automatic text localisation in scanned comic books. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 814–819).
Abstract: Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enable direct content-based search as opposed to metadata only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for the automatic text localization in scanned comics book pages, an essential step towards a fully automatic comics book understanding. We focus on speech text as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing methods of text localization found in the literature and results are presented.
Keywords: Text localization; comics; text/graphic separation; complex background; unstructured document
|
|
|
Christophe Rigaud, Dimosthenis Karatzas, Joost Van de Weijer, Jean-Christophe Burie, & Jean-Marc Ogier. (2013). An active contour model for speech balloon detection in comics. In 12th International Conference on Document Analysis and Recognition (pp. 1240–1244).
Abstract: Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent comic book understanding would enable a variety of new applications, including content-based retrieval and content retargeting. Document understanding in this domain is challenging as comics are semi-structured documents, combining semantically important graphical and textual parts. Few studies have been done in this direction. In this work we detail a novel approach for closed and non-closed speech balloon localization in scanned comic book pages, an essential step towards a fully automatic comic book understanding. The approach is compared with existing methods for closed balloon localization found in the literature and results are presented.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez, Xavier Otazu, & Horst Bunke. (2010). A Combination of Features for Symbol-Independent Writer Identification in Old Music Scores. IJDAR - International Journal on Document Analysis and Recognition, 13(4), 243–259.
Abstract: The aim of writer identification is determining the writer of a piece of handwriting from a set of writers. In this paper, we present an architecture for writer identification in old handwritten music scores. Even though an important amount of music compositions contain handwritten text, the aim of our work is to use only music notation to determine the author. The main contribution is therefore the use of features extracted from graphical alphabets. Our proposal consists in combining the identification results of two different approaches, based on line and textural features. The steps of the ensemble architecture are the following. First of all, the music sheet is preprocessed for removing the staff lines. Then, music lines and texture images are generated for computing line features and textural features. Finally, the classification results are combined for identifying the writer. The proposed method has been tested on a database of old music scores from the seventeenth to nineteenth centuries, achieving a recognition rate of about 92% with 20 writers.
|
|
|
Youssef El Rhabi, Simon Loic, Brun Luc, Josep Llados, & Felipe Lumbreras. (2016). Information Theoretic Rotationwise Robust Binary Descriptor Learning. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 368–378).
Abstract: In this paper, we propose a new data-driven approach for binary descriptor selection. In order to draw a clear analysis of common designs, we present a general information-theoretic selection paradigm. It encompasses several standard binary descriptor construction schemes, including a recent state-of-the-art one named BOLD. We pursue the same endeavor to increase the stability of the produced descriptors with respect to rotations. To achieve this goal, we have designed a novel offline selection criterion which is better adapted to the online matching procedure. The effectiveness of our approach is demonstrated on two standard datasets, where our descriptor is compared to BOLD and to several classical descriptors. In particular, it emerges that our approach can reproduce equivalent if not better performance as BOLD while relying on twice shorter descriptors. Such an improvement can be influential for real-time applications.
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Aura Hernandez-Sabate, Marçal Rusiñol, & Francesc J. Ferri. (2018). Fast Kernel Generalized Discriminative Common Vectors for Feature Extraction. JMIV - Journal of Mathematical Imaging and Vision, 60(4), 512–524.
Abstract: This paper presents a supervised subspace learning method called Kernel Generalized Discriminative Common Vectors (KGDCV), as a novel extension of the known Discriminative Common Vectors method with Kernels. Our method combines the advantages of kernel methods to model complex data and solve nonlinear
problems with moderate computational complexity, with the better generalization properties of generalized approaches for large dimensional data. These attractive combination makes KGDCV specially suited for feature extraction and classification in computer vision, image processing and pattern recognition applications. Two different approaches to this generalization are proposed, a first one based on the kernel trick (KT) and a second one based on the nonlinear projection trick (NPT) for even higher efficiency. Both methodologies
have been validated on four different image datasets containing faces, objects and handwritten digits, and compared against well known non-linear state-of-art methods. Results show better discriminant properties than other generalized approaches both linear or kernel. In addition, the KGDCV-NPT approach presents a considerable computational gain, without compromising the accuracy of the model.
|
|
|
Marçal Rusiñol, J. Chazalon, & Katerine Diaz. (2018). Augmented Songbook: an Augmented Reality Educational Application for Raising Music Awareness. MTAP - Multimedia Tools and Applications, 77(11), 13773–13798.
Abstract: This paper presents the development of an Augmented Reality mobile application which aims at sensibilizing young children to abstract concepts of music. Such concepts are, for instance, the musical notation or the idea of rhythm. Recent studies in Augmented Reality for education suggest that such technologies have multiple benefits for students, including younger ones. As mobile document image acquisition and processing gains maturity on mobile platforms, we explore how it is possible to build a markerless and real-time application to augment the physical documents with didactic animations and interactive virtual content. Given a standard image processing pipeline, we compare the performance of different local descriptors at two key stages of the process. Results suggest alternatives to the SIFT local descriptors, regarding result quality and computational efficiency, both for document model identification and perspective transform estimation. All experiments are performed on an original and public dataset we introduce here.
Keywords: Augmented reality; Document image matching; Educational applications
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Marçal Rusiñol, & Aura Hernandez-Sabate. (2019). Feature Extraction by Using Dual-Generalized Discriminative Common Vectors. JMIV - Journal of Mathematical Imaging and Vision, 61(3), 331–351.
Abstract: In this paper, a dual online subspace-based learning method called dual-generalized discriminative common vectors (Dual-GDCV) is presented. The method extends incremental GDCV by exploiting simultaneously both the concepts of incremental and decremental learning for supervised feature extraction and classification. Our methodology is able to update the feature representation space without recalculating the full projection or accessing the previously processed training data. It allows both adding information and removing unnecessary data from a knowledge base in an efficient way, while retaining the previously acquired knowledge. The proposed method has been theoretically proved and empirically validated in six standard face recognition and classification datasets, under two scenarios: (1) removing and adding samples of existent classes, and (2) removing and adding new classes to a classification problem. Results show a considerable computational gain without compromising the accuracy of the model in comparison with both batch methodologies and other state-of-art adaptive methods.
Keywords: Online feature extraction; Generalized discriminative common vectors; Dual learning; Incremental learning; Decremental learning
|
|
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo, & Josep Llados. (2015). Efficient segmentation-free keyword spotting in historical document collections. PR - Pattern Recognition, 48(2), 545–555.
Abstract: In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the latent semantic analysis technique and compressing the descriptors with the product quantization method, we are able to efficiently index the document information both in terms of memory and time. The proposed method is evaluated using four different collections of historical documents achieving good performances on both handwritten and typewritten scenarios. The yielded performances outperform the recent state-of-the-art keyword spotting approaches.
Keywords: Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization
|
|
|
Lluis Pere de las Heras, Ahmed Sheraz, Marcus Liwicki, Ernest Valveny, & Gemma Sanchez. (2014). Statistical Segmentation and Structural Recognition for Floor Plan Interpretation. IJDAR - International Journal on Document Analysis and Recognition, 17(3), 221–237.
Abstract: A generic method for floor plan analysis and interpretation is presented in this article. The method, which is mainly inspired by the way engineers draw and interpret floor plans, applies two recognition steps in a bottom-up manner. First, basic building blocks, i.e., walls, doors, and windows are detected using a statistical patch-based segmentation approach. Second, a graph is generated, and structural pattern recognition techniques are applied to further locate the main entities, i.e., rooms of the building. The proposed approach is able to analyze any type of floor plan regardless of the notation used. We have evaluated our method on different publicly available datasets of real architectural floor plans with different notations. The overall detection and recognition accuracy is about 95 %, which is significantly better than any other state-of-the-art method. Our approach is generic enough such that it could be easily adopted to the recognition and interpretation of any other printed machine-generated structured documents.
|
|
|
Alicia Fornes, & Gemma Sanchez. (2014). Analysis and Recognition of Music Scores. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. E, pp. 749–774). Springer London.
Abstract: The analysis and recognition of music scores has attracted the interest of researchers for decades. Optical Music Recognition (OMR) is a classical research field of Document Image Analysis and Recognition (DIAR), whose aim is to extract information from music scores. Music scores contain both graphical and textual information, and for this reason, techniques are closely related to graphics recognition and text recognition. Since music scores use a particular diagrammatic notation that follow the rules of music theory, many approaches make use of context information to guide the recognition and solve ambiguities. This chapter overviews the main Optical Music Recognition (OMR) approaches. Firstly, the different methods are grouped according to the OMR stages, namely, staff removal, music symbol recognition, and syntactical analysis. Secondly, specific approaches for old and handwritten music scores are reviewed. Finally, online approaches and commercial systems are also commented.
|
|
|
Lluis Pere de las Heras, Ernest Valveny, & Gemma Sanchez. (2014). Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies. In Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 109–121). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a wall segmentation approach in floor plans that is able to work independently to the graphical notation, does not need any pre-annotated data for learning, and is able to segment multiple-shaped walls such as beams and curved-walls. This method results from the combination of the wall segmentation approaches [3, 5] presented recently by the authors. Firstly, potential straight wall segments are extracted in an unsupervised way similar to [3], but restricting even more the wall candidates considered in the original approach. Then, based on [5], these segments are used to learn the texture pattern of walls and spot the lost instances. The presented combination of both methods has been tested on 4 available datasets with different notations and compared qualitatively and quantitatively to the state-of-the-art applied on these collections. Additionally, some qualitative results on floor plans directly downloaded from the Internet are reported in the paper. The overall performance of the method demonstrates either its adaptability to different wall notations and shapes, and to document qualities and resolutions.
Keywords: Graphics recognition; Floor plan analysis; Object segmentation
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Sergi Robles, & Gemma Sanchez. (2015). CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool. IJDAR - International Journal on Document Analysis and Recognition, 18(1), 15–30.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is a long experience on structural methods for the analysis and information extraction of multiple types of documents. Yet, the lack of conveniently annotated and free access databases has not benefited the progress in some areas such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows to make specific this sort of information in a natural manner. This tool has been made for general purpose groundtruthing: It allows to define own object classes and properties, multiple labeling options are possible, grants the cooperative work, and provides user and version control. We finally have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.
|
|
|
David Aldavert, Marçal Rusiñol, Ricardo Toledo, & Josep Llados. (2015). A Study of Bag-of-Visual-Words Representations for Handwritten Keyword Spotting. IJDAR - International Journal on Document Analysis and Recognition, 18(3), 223–234.
Abstract: The Bag-of-Visual-Words (BoVW) framework has gained popularity among the document image analysis community, specifically as a representation of handwritten words for recognition or spotting purposes. Although in the computer vision field the BoVW method has been greatly improved, most of the approaches in the document image analysis domain still rely on the basic implementation of the BoVW method disregarding such latest refinements. In this paper, we present a review of those improvements and its application to the keyword spotting task. We thoroughly evaluate their impact against a baseline system in the well-known George Washington dataset and compare the obtained results against nine state-of-the-art keyword spotting methods. In addition, we also compare both the baseline and improved systems with the methods presented at the Handwritten Keyword Spotting Competition 2014.
Keywords: Bag-of-Visual-Words; Keyword spotting; Handwritten documents; Performance evaluation
|
|
|
Lluis Pere de las Heras, David Fernandez, Alicia Fornes, Ernest Valveny, Gemma Sanchez, & Josep Llados. (2014). Runlength Histogram Image Signature for Perceptual Retrieval of Architectural Floor Plans. In Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 135–146). LNCS. Springer Berlin Heidelberg.
Abstract: This paper proposes a runlength histogram signature as a perceptual descriptor of architectural plans in a retrieval scenario. The style of an architectural drawing is characterized by the perception of lines, shapes and texture. Such visual stimuli are the basis for defining semantic concepts as space properties, symmetry, density, etc. We propose runlength histograms extracted in vertical, horizontal and diagonal directions as a characterization of line and space properties in floorplans, so it can be roughly associated to a description of walls and room structure. A retrieval application illustrates the performance of the proposed approach, where given a plan as a query, similar ones are obtained from a database. A ground truth based on human observation has been constructed to validate the hypothesis. Additional retrieval results on sketched building’s facades are reported qualitatively in this paper. Its good description and its adaptability to two different sketch drawings despite its simplicity shows the interest of the proposed approach and opens a challenging research line in graphics recognition.
Keywords: Graphics recognition; Graphics retrieval; Image classification
|
|