|
Pau Riba, Anjan Dutta, Josep Llados, & Alicia Fornes. (2017). Graph-based deep learning for graphics classification. In 12th IAPR International Workshop on Graphics Recognition (pp. 29–30).
Abstract: Graph-based representations are a common way to deal with graphics recognition problems. However, previous works were mainly focused on developing learning-free techniques. The success of deep learning frameworks have proved that learning is a powerful tool to solve many problems, however it is not straightforward to extend these methodologies to non euclidean data such as graphs. On the other hand, graphs are a good representational structure for graphical entities. In this work, we present some deep learning techniques that have been proposed in the literature for graph-based representations and
we show how they can be used in graphics recognition problems
|
|
|
Arnau Baro, Pau Riba, & Alicia Fornes. (2018). A Starting Point for Handwritten Music Recognition. In 1st International Workshop on Reading Music Systems (pp. 5–6).
Abstract: In the last years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Keywords: Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVCMUSCIMA
|
|
|
Arnau Baro, Pau Riba, Jorge Calvo-Zaragoza, & Alicia Fornes. (2018). Optical Music Recognition by Long Short-Term Memory Networks. In B. L. A. Fornes (Ed.), Graphics Recognition. Current Trends and Evolutions (Vol. 11009, pp. 81–95). LNCS. Springer.
Abstract: Optical Music Recognition refers to the task of transcribing the image of a music score into a machine-readable format. Many music scores are written in a single staff, and therefore, they could be treated as a sequence. Therefore, this work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for reading the music score sequentially, where the LSTM helps in keeping the context. For training, we have used a synthetic dataset of more than 40000 images, labeled at primitive level. The experimental results are promising, showing the benefits of our approach.
Keywords: Optical Music Recognition; Recurrent Neural Network; Long ShortTerm Memory
|
|
|
Arnau Baro, Pau Riba, Jorge Calvo-Zaragoza, & Alicia Fornes. (2019). From Optical Music Recognition to Handwritten Music Recognition: a Baseline. PRL - Pattern Recognition Letters, 123, 1–8.
Abstract: Optical Music Recognition (OMR) is the branch of document image analysis that aims to convert images of musical scores into a computer-readable format. Despite decades of research, the recognition of handwritten music scores, concretely the Western notation, is still an open problem, and the few existing works only focus on a specific stage of OMR. In this work, we propose a full Handwritten Music Recognition (HMR) system based on Convolutional Recurrent Neural Networks, data augmentation and transfer learning, that can serve as a baseline for the research community.
|
|
|
Pau Riba, Josep Llados, & Alicia Fornes. (2020). Hierarchical graphs for coarse-to-fine error tolerant matching. PRL - Pattern Recognition Letters, 134, 116–124.
Abstract: During the last years, graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their ability to capture both structural and appearance-based information. Thus, they provide a greater representational power than classical statistical frameworks. However, graph-based representations leads to high computational complexities usually dealt by graph embeddings or approximated matching techniques. Despite their representational power, they are very sensitive to noise and small variations of the input image. With the aim to cope with the time complexity and the variability present in the generated graphs, in this paper we propose to construct a novel hierarchical graph representation. Graph clustering techniques adapted from social media analysis have been used in order to contract a graph at different abstraction levels while keeping information about the topology. Abstract nodes attributes summarise information about the contracted graph partition. For the proposed representations, a coarse-to-fine matching technique is defined. Hence, small graphs are used as a filtering before more accurate matching methods are applied. This approach has been validated in real scenarios such as classification of colour images or retrieval of handwritten words (i.e. word spotting).
Keywords: Hierarchical graph representation; Coarse-to-fine graph matching; Graph-based retrieval
|
|
|
Juan Ignacio Toledo, Manuel Carbonell, Alicia Fornes, & Josep Llados. (2019). Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Model. PR - Pattern Recognition, 86, 27–36.
Abstract: Many historical manuscripts that hold trustworthy memories of the past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used nor accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need of an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants, the first one is based on bigrams, whereas the second one is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirecitonal Long Short Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic, it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches.
Keywords: Document image analysis; Handwritten documents; Named entity recognition; Deep neural networks
|
|
|
Juan Ignacio Toledo, Sebastian Sudholt, Alicia Fornes, Jordi Cucurull, A. Fink, & Josep Llados. (2016). Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (Vol. 10029, pp. 543–552). LNCS. Springer International Publishing.
Abstract: The extraction of relevant information from historical document collections is one of the key steps in order to make these documents available for access and searches. The usual approach combines transcription and grammars in order to extract semantically meaningful entities. In this paper, we describe a new method to obtain word categories directly from non-preprocessed handwritten word images. The method can be used to directly extract information, being an alternative to the transcription. Thus it can be used as a first step in any kind of syntactical analysis. The approach is based on Convolutional Neural Networks with a Spatial Pyramid Pooling layer to deal with the different shapes of the input images. We performed the experiments on a historical marriage record dataset, obtaining promising results.
Keywords: Document image analysis; Word image categorization; Convolutional neural networks; Named entity detection
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2016). Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. In 15th international conference on Frontiers in Handwriting Recognition.
Abstract: Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and
genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction is difficult due to the distinct and evolutionary vocabulary, which is composed mainly of proper names that change along the time. In previous
works we studied the use of category-based language models to both improve the automatic transcription accuracy and make easier the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.
|
|
|
Pau Riba, Alicia Fornes, & Josep Llados. (2017). Towards the Alignment of Handwritten Music Scores. In Bart Lamiroy, & R Dueire Lins (Eds.), International Workshop on Graphics Recognition. GREC 2015.Graphic Recognition. Current Trends and Challenges (Vol. 9657, pp. 103–116). LNCS.
Abstract: It is very common to nd dierent versions of the same music work in archives of Opera Theaters. These dierences correspond to modications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study.
This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such dierences. Given the diculties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the sta lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
Keywords: Optical Music Recognition; Handwritten Music Scores; Dynamic Time Warping alignment
|
|
|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2016). Local Binary Pattern for Word Spotting in Handwritten Historical Document. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 574–583). LNCS.
Abstract: Digital libraries store images which can be highly degraded and to index this kind of images we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents a simple innovative learning-free method for word spotting from large scale historical documents combining Local Binary Pattern (LBP) and spatial sampling. This method offers three advantages: firstly, it operates in completely learning free paradigm which is very different from unsupervised learning methods, secondly, the computational time is significantly low because of the LBP features, which are very fast to compute, and thirdly, the method can be used in scenarios where annotations are not available. Finally, we compare the results of our proposed retrieval method with other methods in the literature and we obtain the best results in the learning free paradigm.
Keywords: Local binary patterns; Spatial sampling; Learning-free; Word spotting; Handwritten; Historical document analysis; Large-scale data
|
|
|
Pau Riba, Josep Llados, Alicia Fornes, & Anjan Dutta. (2017). Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. PRL - Pattern Recognition Letters, 87, 203–211.
Abstract: Graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their representational power in front of classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for a large-scale retrieval is the search time complexity to be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
|
|
|
Lei Kang, Juan Ignacio Toledo, Pau Riba, Mauricio Villegas, Alicia Fornes, & Marçal Rusiñol. (2018). Convolve, Attend and Spell: An Attention-based Sequence-to-Sequence Model for Handwritten Word Recognition. In 40th German Conference on Pattern Recognition (pp. 459–472).
Abstract: This paper proposes Convolve, Attend and Spell, an attention based sequence-to-sequence model for handwritten word recognition. The proposed architecture has three main parts: an encoder, consisting of a CNN and a bi-directional GRU, an attention mechanism devoted to focus on the pertinent features and a decoder formed by a one-directional GRU, able to spell the corresponding word, character by character. Compared with the recent state-of-the-art, our model achieves competitive results on the IAM dataset without needing any pre-processing step, predefined lexicon nor language model. Code and additional results are available in https://github.com/omni-us/research-seq2seq-HTR.
|
|
|
Jialuo Chen, Pau Riba, Alicia Fornes, Juan Mas, Josep Llados, & Joana Maria Pujadas-Mora. (2018). Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts. In 16th International Conference on Frontiers in Handwriting Recognition (pp. 528–533).
Abstract: Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, the automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so a manual validation is mandatory. Crowdsourcing is a good strategy for manual validation, however it is a tedious task. In this paper we analyze experiences based in gamification
in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance.
Keywords: Crowdsourcing; Gamification; Handwritten documents; Performance evaluation
|
|
|
Pau Riba, Andreas Fischer, Josep Llados, & Alicia Fornes. (2018). Learning Graph Distances with Message Passing Neural Networks. In 24th International Conference on Pattern Recognition (pp. 2239–2244).
Abstract: Graph representations have been widely used in pattern recognition thanks to their powerful representation formalism and rich theoretical background. A number of error-tolerant graph matching algorithms such as graph edit distance have been proposed for computing a distance between two labelled graphs. However, they typically suffer from a high
computational complexity, which makes it difficult to apply
these matching algorithms in a real scenario. In this paper, we propose an efficient graph distance based on the emerging field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure and learns a metric with a siamese network approach. The performance of the proposed graph distance is validated in two application cases, graph classification and graph retrieval of handwritten words, and shows a promising performance when compared with
(approximate) graph edit distance benchmarks.
Keywords: ★Best Paper Award★
|
|
|
Manuel Carbonell, Mauricio Villegas, Alicia Fornes, & Josep Llados. (2018). Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model. In 13th IAPR International Workshop on Document Analysis Systems (pp. 399–404).
Abstract: When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the
performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different
configurations: different ways of encoding the information, doing or not transfer learning and processing at text line or multi-line region level. The results are comparable to state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.
Keywords: Named entity recognition; Handwritten Text Recognition; neural networks
|
|