|
Ayan Banerjee, Sanket Biswas, Josep Llados and Umapada Pal. 2024. SemiDocSeg: Harnessing Semi-Supervised Learning for Document Layout Analysis. IJDAR.
Abstract: Document Layout Analysis (DLA) is the process of automatically identifying and categorizing the structural components (e.g. Text, Figure, Table, etc.) within a document to extract meaningful content and establish the page's layout structure. It is a crucial stage in document parsing, contributing to document comprehension. However, traditional DLA approaches often demand a significant volume of labeled training data, and the labor-intensive task of generating high-quality annotated training data poses a substantial challenge. To address this challenge, we propose a semi-supervised setting that aims to perform learning on limited annotated categories by eliminating exhaustive and expensive mask annotations. The proposed setting is expected to be generalizable to novel categories as it learns the underlying positional information through a support set and class information through Co-Occurrence that can be generalized from annotated categories to novel categories. Here, we first extract features from the input image and support set with a shared multi-scale feature acquisition backbone. Then, the extracted feature representation is fed to the transformer encoder as a query. Later on, we utilize a semantic embedding network before the decoder to capture the underlying semantic relationships and similarities between different instances, enabling the model to make accurate predictions or classifications with only a limited amount of labeled data. Extensive experimentation on competitive benchmarks like PRIMA, DocLayNet, and Historical Japanese (HJ) demonstrates that this generalized setup obtains significant performance compared to the conventional supervised approach.
Keywords: Document layout analysis; Semi-supervised learning; Co-Occurrence matrix; Instance segmentation; Swin transformer
|
|
|
Josep Llados and Gemma Sanchez. 2004. Graph Matching vs. Graph Parsing in Graphics Recognition: A Combined Approach.
|
|
|
Jaume Gibert, Ernest Valveny and Horst Bunke. 2013. Embedding of Graphs with Discrete Attributes Via Label Frequencies. IJPRAI, 27(3), 1360002–1360029.
Abstract: Graph-based representations of patterns are very flexible and powerful, but they are not easily processed due to the lack of learning algorithms in the domain of graphs. Embedding a graph into a vector space solves this problem since graphs are turned into feature vectors and thus all the statistical learning machinery becomes available for graph input patterns. In this work we present a new way of embedding discrete attributed graphs into vector spaces using node and edge label frequencies. The methodology is experimentally tested on graph classification problems, using patterns of different nature, and it is shown to be competitive to state-of-the-art classification algorithms for graphs, while being computationally much more efficient.
Keywords: Discrete attributed graphs; graph embedding; graph classification
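The embedding idea in this abstract can be illustrated with a minimal sketch. It assumes undirected graphs with unlabeled edges and a fixed, known label alphabet; the function name `label_frequency_embedding` and its argument layout are hypothetical and not taken from the paper, which may define the features differently.

```python
from collections import Counter
from itertools import combinations_with_replacement


def label_frequency_embedding(node_labels, edges, alphabet):
    """Map a discretely labeled graph to a fixed-length count vector.

    node_labels: dict mapping node id -> discrete label
    edges: iterable of (u, v) pairs, treated as undirected (assumption)
    alphabet: all labels that may occur, fixing the vector layout
    """
    # Count how often each node label occurs.
    node_counts = Counter(node_labels.values())

    # Count edges by the unordered pair of labels at their endpoints.
    edge_counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((node_labels[u], node_labels[v])))
        edge_counts[pair] += 1

    labels = sorted(alphabet)
    # First block: one entry per label.
    vector = [node_counts[a] for a in labels]
    # Second block: one entry per unordered label pair.
    vector += [edge_counts[(a, b)]
               for a, b in combinations_with_replacement(labels, 2)]
    return vector


# Toy graph: C - C - O
embedding = label_frequency_embedding(
    {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)], ["C", "O"]
)
print(embedding)  # [2, 1, 1, 1, 0]
```

Because every graph maps to a vector of the same length, the output can be passed to any standard vector-based classifier.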
|
|
|
Josep Llados, Marçal Rusiñol, Alicia Fornes, David Fernandez and Anjan Dutta. 2012. On the Influence of Word Representations for Handwritten Word Spotting in Historical Documents. IJPRAI, 26(5), 1263002–126027.
Abstract: Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models, namely sequence alignment using DTW as a baseline reference, a bag of visual words approach as statistical model, a pseudo-structural model based on a Loci features representation, and a structural approach where words are represented by graphs. The four approaches have been tested with two collections of historical data: the George Washington database and the marriage records from the Barcelona Cathedral. We experimentally demonstrate that statistical representations generally give better performance; however, it cannot be neglected that large descriptors are difficult to implement in a retrieval scenario where word spotting requires indexing data with millions of word images.
Keywords: Handwriting recognition; word spotting; historical documents; feature representation; shape descriptors
Note: 0.624 JCR
URL: http://www.worldscientific.com/doi/abs/10.1142/S0218001412630025
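The DTW baseline named in this abstract aligns two feature sequences of possibly different lengths. The sketch below is a generic textbook dynamic time warping distance, assuming one scalar feature per image column and an absolute-difference local cost; the abstract does not specify the actual features or cost used in the paper, and the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A stretched copy of the same profile aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

In a query-by-example setting, candidate word images would be ranked by increasing distance to the query sequence.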
|
|
|
Giacomo Magnifico, Beata Megyesi, Mohamed Ali Souibgui, Jialuo Chen and Alicia Fornes. 2022. Lost in Transcription of Graphic Signs in Ciphers. International Conference on Historical Cryptology (HistoCrypt 2022), 153–158.
Abstract: Hand-written Text Recognition techniques, which aim to automatically identify and transcribe hand-written text, have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.
Keywords: transcription of ciphers; hand-written text recognition of symbols; graphic signs
|
|
|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2008. Morphology Based Handwritten Line Segmentation using Foreground and Background Information. International Conference on Frontiers in Handwriting Recognition, 241–246.
|
|
|
Helena Muñoz, Fernando Vilariño and Dimosthenis Karatzas. 2019. Eye-Movements During Information Extraction from Administrative Documents. International Conference on Document Analysis and Recognition Workshops, 6–9.
Abstract: A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated, and there is instead an important amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during a manual information extraction process could be exploited towards reducing the manual effort and automating the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human in the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information.
|
|
|
Partha Pratim Roy, Josep Llados and Umapada Pal. 2007. Text/Graphics Separation in Color Maps. International Conference on Computing: Theory and Applications, 545–551.
|
|
|
Stepan Simsa and 10 others. 2023. Overview of DocILE 2023: Document Information Localization and Extraction. International Conference of the Cross-Language Evaluation Forum for European Languages, 276–293. (LNCS.)
Abstract: This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets.
Keywords: Information Extraction; Computer Vision; Natural Language Processing; Optical Character Recognition; Document Understanding
|
|
|
Josep Llados, Horst Bunke and Enric Marti. 1997. Using Cyclic String Matching to Find Rotational and Reflectional Symmetries in Shapes. Intelligent Robots: Sensing, Modeling and Planning. World Scientific Press, 164–179.
Note: Dagstuhl Workshop
|
|