|
Anjan Dutta, Josep Llados, Horst Bunke and Umapada Pal. 2013. A Product graph based method for dual subgraph matching applied to symbol spotting. 10th IAPR International Workshop on Graphics Recognition.
Abstract: The product graph has been shown to be an efficient way of matching subgraphs. This paper reports an extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. It focuses on two major limitations of the previous version of the product graph: (1) spurious nodes and edges in the graph representation and (2) inefficient node and edge attributes. To deal with the noisy information in vectorized graphical documents, we consider a dual graph representation of the original graph representing the graphical information, and the product graph is computed between the dual graphs of the query graphs and the input graph.
The dual graph with redundant edges supports an efficient and noise-tolerant encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates similar path information in the two graphs, and exponentiating the adjacency matrix finds similar paths of greater length. Nodes joining similar paths between the two graphs are found by combining different exponentials of the adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
|
|
|
Anjan Dutta, Josep Llados, Horst Bunke and Umapada Pal. 2014. A Product Graph Based Method for Dual Subgraph Matching Applied to Symbol Spotting. In Bart Lamiroy and Jean-Marc Ogier, eds. Graphics Recognition. Current Trends and Challenges. Springer Berlin Heidelberg, 7–11. (LNCS.)
Abstract: The product graph has been shown to be a way of matching subgraphs. This paper reports an extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. Here we focus on two major limitations of the previous version of the algorithm: (1) spurious nodes and edges in the graph representation and (2) inefficient node and edge attributes. To deal with the noisy information in vectorized graphical documents, we consider a dual edge graph representation of the original graph representing the graphical information, and the product graph is computed between the dual edge graphs of the pattern graph and the target graph. The dual edge graph with redundant edges supports an efficient and noise-tolerant encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates the pairs of similar edges of the two operand graphs, and exponentiating the adjacency matrix finds similar random walks of greater length. Nodes joining similar random walks between the two graphs are found by combining different weighted exponentials of the adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
Keywords: Product graph; Dual edge graph; Subgraph matching; Random walks; Graph kernel
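The matrix idea in this abstract can be illustrated with a minimal sketch. It assumes plain unlabeled graphs and the tensor (Kronecker) product, and the function name, `max_len` and `decay` parameters are hypothetical; the paper's dual edge graphs and its node and edge attributes are not modeled here. Entry ((i, j), (i', j')) of the k-th power of the product adjacency matrix counts pairs of length-k walks, one from i to i' in the first graph and one from j to j' in the second, and a decaying weighted sum combines several walk lengths.

```python
import numpy as np

def product_graph_walk_scores(a1, a2, max_len=3, decay=0.5):
    """Weighted sum of powers of the tensor product graph adjacency matrix.

    a1, a2  : square adjacency matrices of the two graphs (assumed unlabeled)
    max_len : longest walk length considered (illustrative parameter)
    decay   : weight base; walks of length k are weighted by decay**k
    """
    ax = np.kron(a1, a2)                # adjacency of the tensor product graph
    scores = np.zeros(ax.shape, dtype=float)
    power = np.eye(ax.shape[0])
    for k in range(1, max_len + 1):
        power = power @ ax              # ax**k counts common walks of length k
        scores += (decay ** k) * power
    return scores

# Toy example: a triangle matched against a 3-node path.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

scores = product_graph_walk_scores(triangle, path)
# Row index i * 3 + j corresponds to the node pair (i in triangle, j in path),
# so summing each row and reshaping gives one score per node pair.
node_pair_scores = scores.sum(axis=1).reshape(3, 3)
```

Node pairs with high scores take part in many common walks, which is the intuition behind using them as candidate correspondences; how the paper thresholds or combines these scores is not stated in the abstract.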
|
|
|
Salvatore Tabbone and Josep Llados. 2007. A Propos de la Reconnaissance de Documents Graphiques: Synthese et Perspectives. Traitement et Analyse de l’Information: Methodes et Applications. 247–258.
|
|
|
M. Visani, Oriol Ramos Terrades and Salvatore Tabbone. 2011. A Protocol to Characterize the Descriptive Power and the Complementarity of Shape Descriptors. IJDAR, 14(1), 87–100.
Abstract: Most document analysis applications rely on the extraction of shape descriptors, which may be grouped into different categories, each category having its own advantages and drawbacks (O.R. Terrades et al. in Proceedings of ICDAR’07, pp. 227–231, 2007). In order to improve the richness of their description, many authors choose to combine multiple descriptors. Yet, most of the authors who propose a new descriptor content themselves with comparing its performance to that of a set of single state-of-the-art descriptors in a specific application context (e.g. symbol recognition, symbol spotting). This results in a proliferation of the shape descriptors proposed in the literature. In this article, we propose an innovative protocol, the originality of which is to be as independent of the final application as possible and which relies on new quantitative and qualitative measures. We introduce two types of measures: while the measures of the first type are intended to characterize the descriptive power (in terms of uniqueness, distinctiveness and robustness towards noise) of a descriptor, the second type of measures characterizes the complementarity between multiple descriptors. Characterizing upstream the complementarity of shape descriptors is an alternative to the usual approach where the descriptors to be combined are selected by trial and error, considering the performance characteristics of the overall system. To illustrate the contribution of this protocol, we performed experimental studies using a set of descriptors and a set of symbols which are widely used by the community, namely the ART and SC descriptors and the GREC 2003 database.
Keywords: Document analysis; Shape descriptors; Symbol description; Performance characterization; Complementarity analysis
|
|
|
Miquel Ferrer, Dimosthenis Karatzas, Ernest Valveny and Horst Bunke. 2009. A Recursive Embedding Approach to Median Graph Computation. 7th IAPR-TC-15 Workshop on Graph-Based Representations in Pattern Recognition. Springer Berlin Heidelberg, 113–123. (LNCS.)
Abstract: The median graph has been shown to be a good choice to infer a representative of a set of graphs. It has been successfully applied to graph-based classification and clustering. Nevertheless, its computation is extremely complex. Several approaches based on different strategies have been presented up to now. In this paper we present a new approximate recursive algorithm for median graph computation based on graph embedding into vector spaces. Preliminary experiments on three databases show that this new approach is able to obtain better medians than the existing approaches.
|
|
|
Marçal Rusiñol and Josep Llados. 2008. A Region-Based Hashing Approach for Symbol Spotting in Technical Documents. In W. Lius, J.L., J.M. Ogier, eds. Graphics Recognition: Recent Advances and New Opportunities. 104–113. (LNCS.)
|
|
|
Marçal Rusiñol and Josep Llados. 2007. A Region-Based Hashing Approach for Symbol Spotting in Technical Documents. In J. Llados, W.L., J.M. Ogier, eds. Seventh IAPR International Workshop on Graphics Recognition. 41–42.
|
|
|
Oriol Ramos Terrades, Salvatore Tabbone and Ernest Valveny. 2007. A Review of Shape Descriptors for Document Analysis. 9th International Conference on Document Analysis and Recognition. 227–231.
|
|
|
Albert Gordo and Ernest Valveny. 2009. A rotation invariant page layout descriptor for document classification and retrieval. 10th International Conference on Document Analysis and Recognition. 481–485.
Abstract: Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on cyclic dynamic time warping which can be computed in O(n^2). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
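To make the notion of a cyclic dynamic time warping distance concrete, here is a brute-force sketch. It is not the O(n^2) computation the abstract refers to: it simply runs standard DTW for every cyclic shift of one sequence and keeps the minimum, which is cubic for two sequences of equal length. The scalar features, the absolute-difference local cost and both function names are assumptions made for illustration, and both sequences are assumed non-empty.

```python
import numpy as np

def dtw(a, b):
    """Standard dynamic time warping cost between two scalar sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])      # local cost (assumed: absolute difference)
            cost[i, j] = d + min(cost[i - 1, j],      # stay on b[j-1], advance a
                                 cost[i, j - 1],      # stay on a[i-1], advance b
                                 cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])

def cyclic_dtw_naive(a, b):
    """Minimum DTW cost over all cyclic shifts of `a` (brute force)."""
    a = list(a)
    return min(dtw(a[s:] + a[:s], b) for s in range(len(a)))

# A sequence and a rotated copy of it: plain DTW is penalized by the
# rotation, while the cyclic variant finds the aligning shift.
x = [1, 2, 3, 4]
y = [3, 4, 1, 2]
plain = dtw(x, y)
cyclic = cyclic_dtw_naive(x, y)
```

Taking the minimum over shifts is what makes the distance insensitive to the starting point of a cyclic sequence, which is the property a rotation invariant layout descriptor relies on.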
|
|
|
Anders Hast and Alicia Fornes. 2016. A Segmentation-free Handwritten Word Spotting Approach by Relaxed Feature Matching. 12th IAPR Workshop on Document Analysis Systems. 150–155.
Abstract: The automatic recognition of historical handwritten documents is still considered a challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of the query word in a document collection, making it a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired by feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words have different shapes, there is no exact transformation to be obtained. However, a sufficient degree of relaxation is achieved by using a Fourier-based descriptor and an alternative approach to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
|
|