|
Arnau Baro, Pau Riba and Alicia Fornes. 2018. A Starting Point for Handwritten Music Recognition. 1st International Workshop on Reading Music Systems. 5–6.
Abstract: In recent years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Keywords: Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVC-MUSCIMA
|
|
|
Anjan Dutta and Hichem Sahbi. 2018. Stochastic Graphlet Embedding. TNNLS, 1–14.
Abstract: Graph-based methods are known to be successful in many machine learning and pattern classification tasks. These methods consider semi-structured data as graphs where nodes correspond to primitives (parts, interest points, segments, etc.) and edges characterize the relationships between these primitives. However, these non-vectorial graph data cannot be straightforwardly plugged into off-the-shelf machine learning algorithms without a preliminary step of (explicit/implicit) graph vectorization and embedding. This embedding process should be resilient to intra-class graph variations while being highly discriminant. In this paper, we propose a novel high-order stochastic graphlet embedding (SGE) that maps graphs into vector spaces. Our main contribution includes a new stochastic search procedure that efficiently parses a given graph and extracts/samples unlimitedly high-order graphlets. We consider these graphlets, with increasing orders, to model local primitives as well as their increasingly complex interactions. In order to build our graph representation, we measure the distribution of these graphlets into a given graph, using particular hash functions that efficiently assign sampled graphlets into isomorphic sets with a very low probability of collision. When combined with maximum margin classifiers, these graphlet-based representations have a positive impact on the performance of pattern comparison and recognition, as corroborated through extensive experiments using standard benchmark databases.
Keywords: Stochastic graphlets; Graph embedding; Graph classification; Graph hashing; Betweenness centrality
|
|
|
Arnau Baro, Pau Riba, Jorge Calvo-Zaragoza and Alicia Fornes. 2018. Optical Music Recognition by Long Short-Term Memory Networks. In A. Fornes, B.L., ed. Graphics Recognition. Current Trends and Evolutions. Springer, 81–95. (LNCS.)
Abstract: Optical Music Recognition refers to the task of transcribing the image of a music score into a machine-readable format. Many music scores are written in a single staff, and therefore, they could be treated as a sequence. Therefore, this work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for reading the music score sequentially, where the LSTM helps in keeping the context. For training, we have used a synthetic dataset of more than 40000 images, labeled at primitive level. The experimental results are promising, showing the benefits of our approach.
Keywords: Optical Music Recognition; Recurrent Neural Network; Long Short-Term Memory
|
|
|
Marçal Rusiñol and Lluis Gomez. 2018. Avances en clasificación de imágenes en los últimos diez años. Perspectivas y limitaciones en el ámbito de archivos fotográficos históricos.
|
|
|
Francisco Cruz and Oriol Ramos Terrades. 2018. A probabilistic framework for handwritten text line segmentation.
Abstract: We successfully combine the Expectation-Maximization algorithm and variational approaches for parameter learning and computing inference on Markov random fields. This is a general method that can be applied to many computer vision tasks. In this paper, we apply it to handwritten text line segmentation. We conduct several experiments that demonstrate that our method deals with common issues of this task, such as complex document layout or non-Latin scripts. The obtained results prove that our method achieves state-of-the-art performance on different benchmark datasets without any particular fine-tuning step.
Keywords: Document Analysis; Text Line Segmentation; EM algorithm; Probabilistic Graphical Models; Parameter Learning
|
|
|
Pau Riba, Josep Llados and Alicia Fornes. 2017. Error-tolerant coarse-to-fine matching model for hierarchical graphs. In Pasquale Foggia, Cheng-Lin Liu and Mario Vento, eds. 11th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition. Springer International Publishing, 107–117.
Abstract: Graph-based representations are effective tools to capture structural information from visual elements. However, retrieving a query graph from a large database of graphs implies a high computational complexity. Moreover, these representations are very sensitive to noise or small changes. In this work, a novel hierarchical graph representation is designed. Using graph clustering techniques adapted from graph-based social media analysis, we propose to generate a hierarchy able to deal with different levels of abstraction while keeping information about the topology. For the proposed representations, a coarse-to-fine matching method is defined. These approaches are validated using real scenarios such as classification of colour images and handwritten word spotting.
Keywords: Graph matching; Hierarchical graph; Graph-based representation; Coarse-to-fine matching
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal and Joan Andreu Sanchez. 2017. Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology. In L.A. Alexandre, J. Salvador Sanchez and Joao M. F. Rodriguez, eds. 8th Iberian Conference on Pattern Recognition and Image Analysis. 287–294. (LNCS.)
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. These books follow a simple structure of the text in the records with an evolutionary vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
Keywords: Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
|
|
|
Pau Riba, Josep Llados, Alicia Fornes and Anjan Dutta. 2017. Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. PRL, 87, 203–211.
Abstract: Graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is that the search time complexity be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
|
|
|
Lluis Gomez and Dimosthenis Karatzas. 2017. TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild. PR, 70, 60–74.
Abstract: Motivated by the success of powerful yet expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposals techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.
Our experiments demonstrate that the presented method is superior in its ability to produce good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we outperform by more than 10% F-score the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015). Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
|
|
|
Lluis Gomez, Anguelos Nicolaou and Dimosthenis Karatzas. 2017. Improving patch‐based scene text script identification with ensembles of conjoined networks. PR, 67, 85–96.
|
|