|
Juan Ignacio Toledo, Sebastian Sudholt, Alicia Fornes, Jordi Cucurull, A. Fink and Josep Llados. 2016. Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling. Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer International Publishing, 543–552. (LNCS.)
Abstract: The extraction of relevant information from historical document collections is one of the key steps in order to make these documents available for access and searches. The usual approach combines transcription and grammars in order to extract semantically meaningful entities. In this paper, we describe a new method to obtain word categories directly from non-preprocessed handwritten word images. The method can be used to directly extract information, being an alternative to the transcription. Thus it can be used as a first step in any kind of syntactical analysis. The approach is based on Convolutional Neural Networks with a Spatial Pyramid Pooling layer to deal with the different shapes of the input images. We performed the experiments on a historical marriage record dataset, obtaining promising results.
Keywords: Document image analysis; Word image categorization; Convolutional neural networks; Named entity detection
|
|
|
Suman Ghosh and Ernest Valveny. 2015. A Sliding Window Framework for Word Spotting Based on Word Attributes. Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015. Springer International Publishing, 652–661. (LNCS.)
Abstract: In this paper we propose a segmentation-free approach to word spotting. Word images are first encoded into feature vectors using Fisher Vector. Then, these feature vectors are used together with pyramidal histogram of characters labels (PHOC) to learn SVM-based attribute models. Documents are represented by these PHOC based word attributes. To efficiently compute the word attributes over a sliding window, we propose to use an integral image representation of the document using a simplified version of the attribute model. Finally we re-rank the top word candidates using the more discriminative full version of the word attributes. We show state-of-the-art results for segmentation-free query-by-example word spotting in single-writer and multi-writer standard datasets.
Keywords: Word spotting; Sliding window; Word attributes
|
|
|
Pau Riba, Josep Llados, Alicia Fornes and Anjan Dutta. 2015. Large-scale Graph Indexing using Binary Embeddings of Node Contexts. In C.-L.Liu, B.Luo, W.G.Kropatsch and J.Cheng, eds. 10th IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition. Springer International Publishing, 208–217. (LNCS.)
Abstract: Graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their representational power in front of classical appearance-based representations in terms of feature vectors. Retrieving a query graph from a large dataset of graphs has the drawback of the high computational complexity required to compare the query and the target graphs. The most important property for a large-scale retrieval is the search time complexity to be sub-linear in the number of database examples. In this paper we propose a fast indexation formalism for graph retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Hence, each attribute counts the length of a walk of order k originated in a vertex with label l. Each attribute vector is converted to a binary code applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in a handwritten word spotting scenario in images of historical documents.
Keywords: Graph matching; Graph indexing; Application in document analysis; Word spotting; Binary embedding
|
|
|
Francesco Brughi, Debora Gil, Llorenç Badiella, Eva Jove Casabella and Oriol Ramos Terrades. 2014. Exploring the impact of inter-query variability on the performance of retrieval systems. 11th International Conference on Image Analysis and Recognition. Springer International Publishing, 413–420. (LNCS.)
Abstract: This paper introduces a framework for evaluating the performance of information retrieval systems. Current evaluation metrics provide an average score that does not consider performance variability across the query set. In this manner, conclusions lack of any statistical significance, yielding poor inference to cases outside the query set and possibly unfair comparisons. We propose to apply statistical methods in order to obtain a more informative measure for problems in which different query classes can be identified. In this context, we assess the performance variability on two levels: overall variability across the whole query set and specific query class-related variability. To this end, we estimate confidence bands for precision-recall curves, and we apply ANOVA in order to assess the significance of the performance across different query classes.
|
|
|
Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Llados and Simone Marinai. 2022. Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks. 17th European Conference on Computer Vision Workshops.329–344. (LNCS.)
Abstract: Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection.
|
|
|
Utkarsh Porwal, Alicia Fornes and Faisal Shafait, eds. 2022. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022. Springer. (LNCS.)
|
|
|
Sergi Garcia Bordils and 6 others. 2022. Read While You Drive-Multilingual Text Tracking on the Road. 15th IAPR International workshop on document analysis systems.756–770. (LNCS.)
Abstract: Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part I. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part IV. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part III. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|