|
Herve Locteau, Sebastien Mace, Ernest Valveny and Salvatore Tabbone. 2010. Extraction des pieces de un plan de habitation. Colloque Internacional Francophone de l´Ecrit et le Document.1–12.
Abstract: In this article, a method to extract the rooms of an architectural floor plan image is described. We first present a line detection algorithm to extract long lines in the image. Those lines are analyzed to identify the existing walls. From this point, room extraction can be seen as a classical segmentation task for which each region corresponds to a room. The chosen resolution strategy consists in recursively decomposing the image until getting nearly convex regions. The notion of convexity is difficult to quantify, and the selection of separation lines can also be rough. Thus, we take advantage of knowledge associated to architectural floor plans in order to obtain mainly rectangular rooms. Preliminary tests on a set of real documents show promising results.
|
|
|
Helena Muñoz, Fernando Vilariño and Dimosthenis Karatzas. 2019. Eye-Movements During Information Extraction from Administrative Documents. International Conference on Document Analysis and Recognition Workshops.6–9.
Abstract: A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated, and there is instead an important amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during an manual information extraction process could be exploited towards reducing the manual effort and automating the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human in the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information.
|
|
|
Hana Jarraya, Oriol Ramos Terrades and Josep Llados. 2017. Graph Embedding through Probabilistic Graphical Model applied to Symbolic Graphs. 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: We propose a new Graph Embedding (GEM) method that takes advantages of structural pattern representation. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM presented by a vector. This vector is a signature of AG in a lower dimensional vectorial space. We apply Structured Support Vector Machines (SSVM) to process classification task. As first tentative, results on the GREC dataset are encouraging enough to go further on this direction.
Keywords: Attributed Graph; Probabilistic Graphical Model; Graph Embedding; Structured Support Vector Machines
|
|
|
Hana Jarraya, Oriol Ramos Terrades and Josep Llados. 2017. Learning structural loss parameters on graph embedding applied on symbolic graphs. 12th IAPR International Workshop on Graphics Recognition.
Abstract: We propose an amelioration of proposed Graph Embedding (GEM) method in previous work that takes advantages of structural pattern representation and the structured distortion. it models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM presented by a vector, as new signature of AG in a lower dimensional vectorial space. We focus to adapt the structured learning algorithm via 1_slack formulation with a suitable risk function, called Graph Edit Distance (GED). It defines the dissimilarity of the ground truth and predicted graph labels. It determines by the error tolerant graph matching using bipartite graph matching algorithm. We apply Structured Support Vector Machines (SSVM) to process classification task. During our experiments, we got our results on the GREC dataset.
|
|
|
Hana Jarraya, Muhammad Muzzamil Luqman and Jean-Yves Ramel. 2017. Improving Fuzzy Multilevel Graph Embedding Technique by Employing Topological Node Features: An Application to Graphics Recognition. In B. Lamiroy and R Dueire Lins, eds. Graphics Recognition. Current Trends and Challenges. Springer. (LNCS.)
|
|
|
H. Chouaib, Salvatore Tabbone, Oriol Ramos Terrades, F. Cloppet, N. Vincent and A.T. Thierry Paquet. 2008. Sélection de Caractéristiques à partir d'un algorithme génétique et d'une combinaison de classifieurs Adaboost. Colloque International Francophone sur l'Ecrit et le Document.181–186.
|
|
|
H. Chouaib, Oriol Ramos Terrades, Salvatore Tabbone, F. Cloppet and N. Vincent. 2008. Feature Selection Combining Genetic Algorithm and Adaboost Classifiers. 19th International Conference on Pattern Recognition.1–4.
|
|
|
Giuseppe De Gregorio and 6 others. 2022. A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022).3–12. (LNCS.)
Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We exhibit that recognition of important n-grams could reduce the system’s dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed multi-representation approach was carried out on a subset of Bentham’s historical manuscript collections to obtain some really promising results in this direction.
Keywords: N-gram spotting; Few-shot learning; Multimodal understanding; Historical handwritten collections
|
|
|
Giacomo Magnifico, Beata Megyesi, Mohamed Ali Souibgui, Jialuo Chen and Alicia Fornes. 2022. Lost in Transcription of Graphic Signs in Ciphers. International Conference on Historical Cryptology (HistoCrypt 2022).153–158.
Abstract: Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.
Keywords: transcription of ciphers; hand-written text recognition of symbols; graphic signs
|
|
|
George Tom, Minesh Mathew, Sergi Garcia Bordils, Dimosthenis Karatzas and CV Jawahar. 2023. ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition. 17th International Conference on Document Analysis and Recognition.577–586. (LNCS.)
Abstract: In this report, we present the final results of the ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition. The RoadText challenge is based on the RoadText-1K dataset and aims to assess and enhance current methods for scene text detection, recognition, and tracking in videos. The RoadText-1K dataset contains 1000 dash cam videos with annotations for text bounding boxes and transcriptions in every frame. The competition features an end-to-end task, requiring systems to accurately detect, track, and recognize text in dash cam videos. The paper presents a comprehensive review of the submitted methods along with a detailed analysis of the results obtained by the methods. The analysis provides valuable insights into the current capabilities and limitations of video text detection, tracking, and recognition systems for dashcam videos.
|
|