|
Lasse Martensson, Ekta Vats, Anders Hast and Alicia Fornes. 2019. In Search of the Scribe: Letter Spotting as a Tool for Identifying Scribes in Large Handwritten Text Corpora.
Abstract: In this article, a form of the so-called word spotting-method is used on a large set of handwritten documents in order to identify those that contain script of similar execution. The point of departure for the investigation is the mediaeval Swedish manuscript Cod. Holm. D 3. The main scribe of this manuscript has yet not been identified in other documents. The current attempt aims at localising other documents that display a large degree of similarity in the characteristics of the script, these being possible candidates for being executed by the same hand. For this purpose, the method of word spotting has been employed, focusing on individual letters, and therefore the process is referred to as letter spotting in the article. In this process, a set of ‘g’:s, ‘h’:s and ‘k’:s have been selected as templates, and then a search has been made for close matches among the mediaeval Swedish charters. The search resulted in a number of charters that displayed great similarities with the manuscript D 3. The used letter spotting method thus proofed to be a very efficient sorting tool localising similar script samples.
Keywords: Scribal attribution/ writer identification; digital palaeography; word spotting; mediaeval charters; mediaeval manuscripts
|
|
|
Mohamed Ali Souibgui and 8 others. 2023. Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement. Proceedings of the 37th AAAI Conference on Artificial Intelligence.
Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
Keywords: Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning
|
|
|
Mickael Coustaty and Alicia Fornes. 2023. Document Analysis and Recognition – ICDAR 2023 Workshops. (LNCS.)
|
|
|
Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez and Dimosthenis Karatzas. 2023. Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia. Proceedings of the 37th AAAI Conference on Artificial Intelligence.1940–1948.
Abstract: Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.
|
|
|
Josep Llados, Gemma Sanchez and Enric Marti. 1998. A string based method to recognize symbols and structural textures in architectural plans. Graphics Recognition Algorithms and Systems Second International Workshop, GREC' 97 Nancy, France, August 22–23, 1997 Selected Papers. Springer Link, 91–103. (LNCS.)
Abstract: This paper deals with the recognition of symbols and structural textures in architectural plans using string matching techniques. A plan is represented by an attributed graph whose nodes represent characteristic points and whose edges represent segments. Symbols and textures can be seen as a set of regions, i.e. closed loops in the graph, with a particular arrangement. The search for a symbol involves a graph matching between the regions of a model graph and the regions of the graph representing the document. Discriminating a texture means a clustering of neighbouring regions of this graph. Both procedures involve a similarity measure between graph regions. A string codification is used to represent the sequence of outlining edges of a region. Thus, the similarity between two regions is defined in terms of the string edit distance between their boundary strings. The use of string matching allows the recognition method to work also under presence of distortion.
|
|
|
Josep Llados, Dimosthenis Karatzas, Joan Mas and Gemma Sanchez. 2008. A Generic Architecture for the Conversion of Document Collections into Semantically Annotated Digital Archives.
Keywords: Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
|
|
|
Sergio Escalera, Alicia Fornes, O. Pujol, Petia Radeva, Gemma Sanchez and Josep Llados. 2009. Blurred Shape Model for Binary and Grey-level Symbol Recognition. PRL, 30(15), 1424–1433.
Abstract: Many symbol recognition problems require the use of robust descriptors in order to obtain rich information of the data. However, the research of a good descriptor is still an open issue due to the high variability of symbols appearance. Rotation, partial occlusions, elastic deformations, intra-class and inter-class variations, or high variability among symbols due to different writing styles, are just a few problems. In this paper, we introduce a symbol shape description to deal with the changes in appearance that these types of symbols suffer. The shape of the symbol is aligned based on principal components to make the recognition invariant to rotation and reflection. Then, we present the Blurred Shape Model descriptor (BSM), where new features encode the probability of appearance of each pixel that outlines the symbols shape. Moreover, we include the new descriptor in a system to deal with multi-class symbol categorization problems. Adaboost is used to train the binary classifiers, learning the BSM features that better split symbol classes. Then, the binary problems are embedded in an Error-Correcting Output Codes framework (ECOC) to deal with the multi-class case. The methodology is evaluated on different synthetic and real data sets. State-of-the-art descriptors and classifiers are compared, showing the robustness and better performance of the present scheme to classify symbols with high variability of appearance.
|
|
|
Ernest Valveny and Enric Marti. 2003. A model for image generation and symbol recognition through the deformation of lineal shapes. PRL, 24(15), 2857–2867.
Abstract: We describe a general framework for the recognition of distorted images of lineal shapes, which relies on three items: a model to represent lineal shapes and their deformations, a model for the generation of distorted binary images and the combination of both models in a common probabilistic framework, where the generation of deformations is related to an internal energy, and the generation of binary images to an external energy. Then, recognition consists in the minimization of a global energy function, performed by using the EM algorithm. This general framework has been applied to the recognition of hand-drawn lineal symbols in graphic documents.
|
|
|
Jaume Gibert, Ernest Valveny and Horst Bunke. 2012. Feature Selection on Node Statistics Based Embedding of Graphs. PRL, 33(15), 1980–1990.
Abstract: Representing a graph with a feature vector is a common way of making statistical machine learning algorithms applicable to the domain of graphs. Such a transition from graphs to vectors is known as graphembedding. A key issue in graphembedding is to select a proper set of features in order to make the vectorial representation of graphs as strong and discriminative as possible. In this article, we propose features that are constructed out of frequencies of node label representatives. We first build a large set of features and then select the most discriminative ones according to different ranking criteria and feature transformation algorithms. On different classification tasks, we experimentally show that only a small significant subset of these features is needed to achieve the same classification rates as competing to state-of-the-art methods.
Keywords: Structural pattern recognition; Graph embedding; Feature ranking; PCA; Graph classification
|
|
|
Josep Llados, Horst Bunke and Enric Marti. 1997. Finding rotational symmetries by cyclic string matching. PRL, 18(14), 1435–1442.
Abstract: Symmetry is an important shape feature. In this paper, a simple and fast method to detect perfect and distorted rotational symmetries of 2D objects is described. The boundary of a shape is polygonally approximated and represented as a string. Rotational symmetries are found by cyclic string matching between two identical copies of the shape string. The set of minimum cost edit sequences that transform the shape string to a cyclically shifted version of itself define the rotational symmetry and its order. Finally, a modification of the algorithm is proposed to detect reflectional symmetries. Some experimental results are presented to show the reliability of the proposed algorithm
Keywords: Rotational symmetry; Reflectional symmetry; String matching
|
|