|
David Aldavert and Marçal Rusiñol. 2018. Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting. 13th IAPR International Workshop on Document Analysis Systems. 223–228.
Abstract: Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated a good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in this paper we present a database agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also allows to easily incorporate semantic information in the codebook generation. So, the proposed method is able to determine which set of codewords have a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains a state-of-the-art performance while having a more compact representation.
Keywords: Word Spotting; Bag of Visual Words; Synthetic Codebook; Semantic Information
|
|
|
Gemma Sanchez and Josep Llados. 2003. Syntactic models to represent perceptually regular repetitive patterns in graphic documents.
|
|
|
Gemma Sanchez and Josep Llados. 2004. Syntactic models to represent perceptually regular repetitive patterns in graphic documents.
|
|
|
Joan Mas. 2005. Syntactic approaches to recognize bi-dimensional shapes in graphics recognition. Application to sketching interfaces.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez and Horst Bunke. 2009. Symbol-independent writer identification in old handwritten music scores. In Proceedings of the 8th IAPR International Workshop on Graphics Recognition. Springer Berlin Heidelberg, 186–197.
|
|
|
Marçal Rusiñol, Josep Llados and Gemma Sanchez. 2010. Symbol Spotting in Vectorized Technical Drawings Through a Lookup Table of Region Strings. PAA, 13(3), 321–331.
Abstract: In this paper, we address the problem of symbol spotting in technical document images applied to scanned and vectorized line drawings. Like any information spotting architecture, our approach has two components. First, symbols are decomposed in primitives which are compactly represented and second a primitive indexing structure aims to efficiently retrieve similar primitives. Primitives are encoded in terms of attributed strings representing closed regions. Similar strings are clustered in a lookup table so that the set median strings act as indexing keys. A voting scheme formulates hypothesis in certain locations of the line drawing image where there is a high presence of regions similar to the queried ones, and therefore, a high probability to find the queried graphical symbol. The proposed approach is illustrated in a framework consisting in spotting furniture symbols in architectural drawings. It has been proved to work even in the presence of noise and distortion introduced by the scanning and raster-to-vector processes.
|
|
|
Marçal Rusiñol and Josep Llados. 2005. Symbol Spotting in Technical Drawings Using Vectorial Signatures.
|
|
|
Marçal Rusiñol and Josep Llados. 2006. Symbol Spotting in Technical Drawings Using Vectorial Signatures. Graphics Recognition: Ten Years Review and Future Perspectives, W. Liu, J. Llados (Eds.), LNCS 3926: 35–46.
|
|
|
Anjan Dutta, Josep Llados and Umapada Pal. 2011. Symbol Spotting in Line Drawings Through Graph Paths Hashing. 11th International Conference on Document Analysis and Recognition. 982–986.
Abstract: In this paper we propose a symbol spotting technique through hashing the shape descriptors of graph paths (Hamiltonian paths). Complex graphical structures in line drawings can be efficiently represented by graphs, which ease the accurate localization of the model symbol. Graph paths are the factorized substructures of graphs which enable robust recognition even in the presence of noise and distortion. In our framework, the entire database of the graphical documents is indexed in hash tables by the locality sensitive hashing (LSH) of shape descriptors of the paths. The hashing data structure aims to execute an approximate k-NN search in a sub-linear time. The spotting method is formulated by a spatial voting scheme to the list of locations of the paths that are decided during the hash table lookup process. We perform detailed experiments with various dataset of line drawings and the results demonstrate the effectiveness and efficiency of the technique.
|
|
|
Anjan Dutta. 2010. Symbol Spotting in Graphical Documents by Serialized Subgraph Matching. (Master's thesis, .)
|
|