|
Helena Muñoz, Fernando Vilariño and Dimosthenis Karatzas. 2019. Eye-Movements During Information Extraction from Administrative Documents. International Conference on Document Analysis and Recognition Workshops. 6–9.
Abstract: A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated, and there is instead an important amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during a manual information extraction process could be exploited towards reducing the manual effort and automating the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human in the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information.
|
|
|
Arnau Baro, Pau Riba and Alicia Fornes. 2018. A Starting Point for Handwritten Music Recognition. 1st International Workshop on Reading Music Systems. 5–6.
Abstract: In the last years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Keywords: Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVCMUSCIMA
|
|
|
Josep Llados, J. Lopez-Krahe and D. Archambault. 2007. Special Issue on Information Technologies for Visually Impaired People. Guest Editors.
|
|
|
Mathieu Nicolas Delalandre, Jean-Yves Ramel, Ernest Valveny and Muhammad Muzzamil Luqman. 2009. A Performance Characterization Algorithm for Symbol Localization. 8th IAPR International Workshop on Graphics Recognition. Springer, 3–11.
Abstract: In this paper we present an algorithm for performance characterization of symbol localization systems. This algorithm is aimed to be a more “reliable” and “open” solution to characterize the performance. To achieve that, it exploits only single points as the result of localization and offers the possibility to reconsider the localization results provided by a system. We use the information about context in groundtruth, and overall localization results, to detect the ambiguous localization results. A probability score is computed for each matching between a localization point and a groundtruth region, depending on the spatial distribution of the other regions in the groundtruth. Final characterization is given with detection rate/probability score plots, describing the sets of possible interpretations of the localization results, according to a given confidence rate. We present experimentation details along with the results for the symbol localization system of [1], exploiting a synthetic dataset of architectural floorplans and electrical diagrams (composed of 200 images and 3861 symbols).
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2014. Spotting Graphical Symbols in Camera-Acquired Documents in Real Time. In Bart Lamiroy and Jean-Marc Ogier, eds. Graphics Recognition. Current Trends and Challenges. Springer Berlin Heidelberg, 3–10. (LNCS.)
Abstract: In this paper we present a system devoted to spot graphical symbols in camera-acquired document images. The system is based on the extraction and further matching of ORB compact local features computed over interest key-points. Then, the FLANN indexing framework based on approximate nearest neighbor search allows to efficiently match local descriptors between the captured scene and the graphical models. Finally, the RANSAC algorithm is used in order to compute the homography between the spotted symbol and its appearance in the document image. The proposed approach is efficient and is able to work in real time.
|
|
|
Giuseppe De Gregorio and 6 others. 2022. A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022). 3–12. (LNCS.)
Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We exhibit that recognition of important n-grams could reduce the system’s dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed multi-representation approach was carried out on a subset of Bentham’s historical manuscript collections to obtain some really promising results in this direction.
Keywords: N-gram spotting; Few-shot learning; Multimodal understanding; Historical handwritten collections
|
|
|
Francesc Net, Marc Folia, Pep Casals and Lluis Gomez. 2023. Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections. 17th International Conference on Document Analysis and Recognition. 3–17. (LNCS.)
Abstract: This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case scenario, where a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time spent on manual annotation by archivists. This real use case differs from laboratory settings as the deployment dataset is available in advance, allowing the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning the network on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods in the task of near-duplicate image detection in the UKBench and an in-house private dataset.
Keywords: Image deduplication; Near-duplicate images detection; Transductive Learning; Photographic Archives; Deep Learning
|
|
|
Josep Llados and Dorothea Blostein. 2007. Special Issue on Graphics Recognition. Guest Editors.
|
|
|
Herve Locteau, Sebastien Mace, Ernest Valveny and Salvatore Tabbone. 2010. Extraction des pieces de un plan de habitation. Colloque International Francophone de l'Ecrit et le Document. 1–12.
Abstract: In this article, a method to extract the rooms of an architectural floor plan image is described. We first present a line detection algorithm to extract long lines in the image. Those lines are analyzed to identify the existing walls. From this point, room extraction can be seen as a classical segmentation task for which each region corresponds to a room. The chosen resolution strategy consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines can also be rough. Thus, we take advantage of knowledge associated to architectural floor plans in order to obtain mainly rectangular rooms. Preliminary tests on a set of real documents show promising results.
|
|
|
Josep Llados, Ernest Valveny, Gemma Sanchez and Enric Marti. 2003. A Case Study of Pattern Recognition: Symbol Recognition in Graphic Documents. Proceedings of Pattern Recognition in Information Systems. ICEIS Press, 1–13.
|
|