|
Pau Riba, Andreas Fischer, Josep Llados and Alicia Fornes. 2020. Learning Graph Edit Distance by Graph NeuralNetworks.
Abstract: The emergence of geometric deep learning as a novel framework to deal with graph-based representations has faded away traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine the advances on deep metric learning with traditional approximations of the graph edit distance. Hence, we propose an efficient graph distance based on the novel field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure, and thus, leveraging this information for its use on a distance computation. The performance of the proposed graph distance is validated on two different scenarios. On the one hand, in a graph retrieval of handwritten words~\ie~keyword spotting, showing its superior performance when compared with (approximate) graph edit distance benchmarks. On the other hand, demonstrating competitive results for graph similarity learning when compared with the current state-of-the-art on a recent benchmark dataset.
|
|
|
Pau Riba, Andreas Fischer, Josep Llados and Alicia Fornes. 2021. Learning graph edit distance by graph neural networks. PR, 120, 108132.
Abstract: The emergence of geometric deep learning as a novel framework to deal with graph-based representations has faded away traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine the advances on deep metric learning with traditional approximations of the graph edit distance. Hence, we propose an efficient graph distance based on the novel field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure, and thus, leveraging this information for its use on a distance computation. The performance of the proposed graph distance is validated on two different scenarios. On the one hand, in a graph retrieval of handwritten words i.e. keyword spotting, showing its superior performance when compared with (approximate) graph edit distance benchmarks. On the other hand, demonstrating competitive results for graph similarity learning when compared with the current state-of-the-art on a recent benchmark dataset.
|
|
|
Pau Riba, Alicia Fornes and Josep Llados. 2015. Towards the Alignment of Handwritten Music Scores. In Bart Lamiroy and Rafael Dueire Lins, eds. 11th IAPR International Workshop on Graphics Recognition. Springer International Publishing. (LNCS.)
Abstract: It is very common to find different versions of the same music work in archives of Opera Theaters. These differences correspond to modifications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
|
|
|
Pau Riba, Alicia Fornes and Josep Llados. 2017. Towards the Alignment of Handwritten Music Scores. In Bart Lamiroy and R Dueire Lins, eds. International Workshop on Graphics Recognition. GREC 2015.Graphic Recognition. Current Trends and Challenges.103–116. (LNCS.)
Abstract: It is very common to nd dierent versions of the same music work in archives of Opera Theaters. These dierences correspond to modications and annotations from the musicians. From the musicologist point of view, these variations are very interesting and deserve study.
This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such dierences. Given the diculties in the recognition of handwritten music scores, our goal is to align the music scores and at the same time, avoid the recognition of music elements as much as possible. After removing the sta lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar units alignment is performed by using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results.
Keywords: Optical Music Recognition; Handwritten Music Scores; Dynamic Time Warping alignment
|
|
|
Pau Riba, Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2021. Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting. 16th International Conference on Document Analysis and Recognition.381–395.
Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both, handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work.
|
|
|
Pau Riba. 2020. Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images. (Ph.D. thesis, Ediciones Graficas Rey.)
Abstract: From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in \(\mathbb{R}^n\), is not properly defined for graphs.
In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.
|
|
|
Partha Pratim Roy, Umapada Pal, Josep Llados and Mathieu Nicolas Delalandre. 2009. Multi-Oriented and Multi-Sized Touching Character Segmentation using Dynamic Programming. 10th International Conference on Document Analysis and Recognition.11–15.
Abstract: In this paper, we present a scheme towards the segmentation of English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a big cavity region at the background portion. Using Convex Hull information, we use these background information to find some initial points to segment a touching string into possible primitive segments (a primitive segment consists of a single character or a part of a character). Next these primitive segments are merged to get optimum segmentation and dynamic programming is applied using total likelihood of characters as the objective function. SVM classifier is used to find the likelihood of a character. To consider multi-oriented touching strings the features used in the SVM are invariant to character orientation. Circular ring and convex hull ring based approach has been used along with angular information of the contour pixels of the character to make the feature rotation invariant. From the experiment, we obtained encouraging results.
|
|
|
Partha Pratim Roy, Umapada Pal, Josep Llados and Mathieu Nicolas Delalandre. 2012. Multi-oriented touching text character segmentation in graphical documents using dynamic programming. PR, 45(5), 1972–1983.
Abstract: 2,292 JCR
The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover in graphical documents sometimes characters in a single-touching string have different orientations. Segmentation of such complex touching is more challenging. In this paper, we present a scheme towards the segmentation of English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a big cavity region in the background portion. Based on the convex hull information, at first, we use this background information to find some initial points for segmentation of a touching string into possible primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to get optimum segmentation. A dynamic programming algorithm is applied for this purpose using the total likelihood of characters as the objective function. A SVM classifier is used to find the likelihood of a character. To consider multi-oriented touching strings the features used in the SVM are invariant to character orientation. Experiments were performed in different databases of real and synthetic touching characters and the results show that the method is efficient in segmenting touching characters of arbitrary orientations and sizes.
|
|
|
Partha Pratim Roy, Umapada Pal, Josep Llados and F. Kimura. 2008. Convex Hull based Approach for Multi-oriented Character Recognition form Graphical Documents. 19th International Conference on Pattern Recognition.
|
|
|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2008. Multi-oriented English Text Line Extraction using Background and Foreground Information. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems,.315–322.
|
|