|
Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2015. Automatic Verification of Properly Signed Multi-page Document Images. Proceedings of the Eleventh International Symposium on Visual Computing, 327–336. (LNCS 9475.)
Abstract: In this paper we present an industrial application for the automatic screening of incoming multi-page documents in a banking workflow aimed at determining whether these documents are properly signed or not. The proposed method is divided in three main steps. First individual pages are classified in order to identify the pages that should contain a signature. In a second step, we segment within those key pages the location where the signatures should appear. The last step checks whether the signatures are present or not. Our method is tested in a real large-scale environment and we report the results when checking two different types of real multi-page contracts, having in total more than 14,500 pages.
Keywords: Document Image; Manual Inspection; Signature Verification; Rejection Criterion; Document Flow
|
|
|
Mathieu Nicolas Delalandre, Ernest Valveny and Josep Llados. 2008. Performance Evaluation of Symbol Recognition and Spotting Systems. Proceedings of the 8th International Workshop on Document Analysis Systems, 497–505.
|
|
|
Joan Mas, Jose Antonio Rodriguez, Dimosthenis Karatzas, Gemma Sanchez and Josep Llados. 2008. HistoSketch: A Semi-Automatic Annotation Tool for Archival Documents. Proceedings of the 8th International Workshop on Document Analysis Systems, 517–524.
|
|
|
Dimosthenis Karatzas. 2008. Detecting Gradients in Text Images Using the Hough Transform. Proceedings of the 8th International Workshop on Document Analysis Systems, 245–252.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez and Horst Bunke. 2008. Writer Identification in Old Handwritten Music Scores. Proceedings of the 8th International Workshop on Document Analysis Systems, 347–353.
|
|
|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2008. Multi-oriented English Text Line Extraction using Background and Foreground Information. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, 315–322.
|
|
|
Marçal Rusiñol and Josep Llados. 2008. Word and Symbol Spotting using Spatial Organization of Local Descriptors. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, 489–496.
|
|
|
T.O. Nguyen, Salvatore Tabbone and Oriol Ramos Terrades. 2008. Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, 191–197.
|
|
|
Mohamed Ali Souibgui and 8 others. 2023. Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement. Proceedings of the 37th AAAI Conference on Artificial Intelligence.
Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
Keywords: Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
|
|
|
Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez and Dimosthenis Karatzas. 2023. Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia. Proceedings of the 37th AAAI Conference on Artificial Intelligence, 1940–1948.
Abstract: Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.
|
|