|
Marçal Rusiñol and Josep Llados. 2008. Word and Symbol Spotting using Spatial Organization of Local Descriptors. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, 489–496.
|
|
|
T.O. Nguyen, Salvatore Tabbone and Oriol Ramos Terrades. 2008. Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems, 191–197.
|
|
|
Mathieu Nicolas Delalandre, Ernest Valveny and Josep Llados. 2008. Performance Evaluation of Symbol Recognition and Spotting Systems. Proceedings of the 8th International Workshop on Document Analysis Systems, 497–505.
|
|
|
Joan Mas, Jose Antonio Rodriguez, Dimosthenis Karatzas, Gemma Sanchez and Josep Llados. 2008. HistoSketch: A Semi-Automatic Annotation Tool for Archival Documents. Proceedings of the 8th International Workshop on Document Analysis Systems, 517–524.
|
|
|
Dimosthenis Karatzas. 2008. Detecting Gradients in Text Images Using the Hough Transform. Proceedings of the 8th International Workshop on Document Analysis Systems, 245–252.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez and Horst Bunke. 2008. Writer Identification in Old Handwritten Music Scores. Proceedings of the 8th International Workshop on Document Analysis Systems, 347–353.
|
|
|
Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2015. Automatic Verification of Properly Signed Multi-page Document Images. Proceedings of the Eleventh International Symposium on Visual Computing, 327–336. (LNCS 9475.)
Abstract: In this paper we present an industrial application for the automatic screening of incoming multi-page documents in a banking workflow aimed at determining whether these documents are properly signed or not. The proposed method is divided into three main steps. First, individual pages are classified in order to identify the pages that should contain a signature. In a second step, we segment within those key pages the location where the signatures should appear. The last step checks whether the signatures are present or not. Our method is tested in a real large-scale environment and we report the results when checking two different types of real multi-page contracts, having in total more than 14,500 pages.
Keywords: Document Image; Manual Inspection; Signature Verification; Rejection Criterion; Document Flow
|
|
|
Ernest Valveny and Enric Marti. 1999. Application of deformable template matching to symbol recognition in hand-written architectural draw. Proceedings of the Fifth International Conference on. Bangalore (India).
Abstract: We propose to use deformable template matching as a new approach to recognize characters and lineal symbols in hand-written line drawings, instead of traditional methods based on vectorization and feature extraction. Bayesian formulation of the deformable template matching allows combining fidelity to the ideal shape of the symbol with maximum flexibility to get the best fit to the input image. Lineal nature of symbols can be exploited to define a suitable representation of models and the set of deformations to be applied to them. Matching, however, is done over the original binary image to avoid losing relevant features during vectorization. We have applied this method to hand-written architectural drawings and experimental results demonstrate that symbols with high distortions from ideal shape can be accurately identified.
|
|
|
Oriol Ramos Terrades and Ernest Valveny. 2003. Indexing Technical Symbols Using Ridgelets Transform.
|
|
|
Soumya Jahagirdar, Minesh Mathew, Dimosthenis Karatzas and CV Jawahar. 2023. Understanding Video Scenes Through Text: Insights from Text-Based Video Question Answering. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.
Abstract: Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively. Particularly, comprehending text in videos holds great significance, requiring both scene text understanding and temporal reasoning. This paper focuses on exploring two recently introduced datasets, NewsVideoQA and M4-ViteVQA, which aim to address video question answering based on textual content. The NewsVideoQA dataset contains question-answer pairs related to the text in news videos, while M4-ViteVQA comprises question-answer pairs from diverse categories like vlogging, traveling, and shopping. We provide an analysis of the formulation of these datasets on various levels, exploring the degree of visual understanding and multi-frame comprehension required for answering the questions. Additionally, the study includes experimentation with BERT-QA, a text-only model, which demonstrates comparable performance to the original methods on both datasets, indicating the shortcomings in the formulation of these datasets. Furthermore, we also look into the domain adaptation aspect by examining the effectiveness of training on M4-ViteVQA and evaluating on NewsVideoQA and vice-versa, thereby shedding light on the challenges and potential benefits of out-of-domain training.
|
|