|
Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2022. A Generic Image Retrieval Method for Date Estimation of Historical Document Collections. Document Analysis Systems.15th IAPR International Workshop, (DAS2022).583–597.
Abstract: Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack of the ability to generalize from one dataset to others. This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem. One of the main usages of the presented approach is as a tool for historical contextual retrieval. It means that scholars could perform comparative analysis of historical images from big datasets in terms of the period where they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
Keywords: Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG
|
|
|
Josep Brugues Pujolras, Lluis Gomez and Dimosthenis Karatzas. 2022. A Multilingual Approach to Scene Text Visual Question Answering. Document Analysis Systems.15th IAPR International Workshop, (DAS2022).65–79.
Abstract: Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have a big potential for many types of applications but lack the ability to perform well on more than one language at a time due to the lack of multilingual data, as well as the use of monolingual word embeddings for training. In this work, we explore the possibility to obtain bilingual and multilingual VQA models. In that regard, we use an already established VQA model that uses monolingual word embeddings as part of its pipeline and substitute them by FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that match the performance of the respective monolingual baselines.
Keywords: Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning
|
|
|
Oriol Ramos Terrades, Albert Berenguel and Debora Gil. 2022. A Flexible Outlier Detector Based on a Topology Given by Graph Communities. BDR, 29, 100332.
Abstract: Outlier detection is essential for optimal performance of machine learning methods and statistical predictive models. Their detection is especially determinant in small sample size unbalanced problems, since in such settings outliers become highly influential and significantly bias models. This particular experimental settings are usual in medical applications, like diagnosis of rare pathologies, outcome of experimental personalized treatments or pandemic emergencies. In contrast to population-based methods, neighborhood based local approaches compute an outlier score from the neighbors of each sample, are simple flexible methods that have the potential to perform well in small sample size unbalanced problems. A main concern of local approaches is the impact that the computation of each sample neighborhood has on the method performance. Most approaches use a distance in the feature space to define a single neighborhood that requires careful selection of several parameters, like the number of neighbors.
This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space considered as a topological manifold. Topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide with a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine tuning. The extensive experiments on real-world and synthetic data sets show that our approach outperforms, both, local and global strategies in multi and single view settings.
Keywords: Classification algorithms; Detection algorithms; Description of feature space local structure; Graph communities; Machine learning algorithms; Outlier detectors
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part I. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part II. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part III. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Josep Llados, Daniel Lopresti and Seiichi Uchida, eds. 2021. 16th International Conference, 2021, Proceedings, Part IV. Springer Cham. (LNCS.)
Abstract: This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
|
|
|
Pau Riba, Lutz Goldmann, Oriol Ramos Terrades, Diede Rusticus, Alicia Fornes and Josep Llados. 2022. Table detection in business document images by message passing networks. PR, 127, 108641.
Abstract: Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches.
|
|
|
Mohamed Ali Souibgui and 6 others. 2022. DocEnTr: An End-to-End Document Image Enhancement Transformer. 26th International Conference on Pattern Recognition.1699–1705.
Abstract: Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR
Keywords: Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads
|
|
|
Giacomo Magnifico, Beata Megyesi, Mohamed Ali Souibgui, Jialuo Chen and Alicia Fornes. 2022. Lost in Transcription of Graphic Signs in Ciphers. International Conference on Historical Cryptology (HistoCrypt 2022).153–158.
Abstract: Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.
Keywords: transcription of ciphers; hand-written text recognition of symbols; graphic signs
|
|