|
Marçal Rusiñol, Volkmar Frinken, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2014). Multimodal page classification in administrative document image streams. IJDAR - International Journal on Document Analysis and Recognition, 17(4), 331–341.
Abstract: In this paper, we present a page classification application in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an n-gram model of the page stream allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment and we report results on a dataset of 70,000 pages.
Keywords: Digital mail room; Multimodal page classification; Visual and textual document description
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Marçal Rusiñol, & Aura Hernandez-Sabate. (2019). Feature Extraction by Using Dual-Generalized Discriminative Common Vectors. JMIV - Journal of Mathematical Imaging and Vision, 61(3), 331–351.
Abstract: In this paper, a dual online subspace-based learning method called dual-generalized discriminative common vectors (Dual-GDCV) is presented. The method extends incremental GDCV by exploiting simultaneously both the concepts of incremental and decremental learning for supervised feature extraction and classification. Our methodology is able to update the feature representation space without recalculating the full projection or accessing the previously processed training data. It allows both adding information and removing unnecessary data from a knowledge base in an efficient way, while retaining the previously acquired knowledge. The proposed method has been theoretically proved and empirically validated in six standard face recognition and classification datasets, under two scenarios: (1) removing and adding samples of existent classes, and (2) removing and adding new classes to a classification problem. Results show a considerable computational gain without compromising the accuracy of the model in comparison with both batch methodologies and other state-of-art adaptive methods.
Keywords: Online feature extraction; Generalized discriminative common vectors; Dual learning; Incremental learning; Decremental learning
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2016). A fast hierarchical method for multi‐script and arbitrary oriented scene text extraction. IJDAR - International Journal on Document Analysis and Recognition, 19(4), 335–349.
Abstract: Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing text detection methods. This paper addresses the problem of text
segmentation in natural scenes from a hierarchical perspective.
Contrary to existing methods, we make explicit use of text structure, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypotheses with
high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while being trained in a single mixed dataset, outperforms state of the art
methods in unconstrained scenarios.
Keywords: scene text; segmentation; detection; hierarchical grouping; perceptual organisation
|
|
|
Sounak Dey, Anguelos Nicolaou, Josep Llados, & Umapada Pal. (2019). Evaluation of the Effect of Improper Segmentation on Word Spotting. IJDAR - International Journal on Document Analysis and Recognition, 22, 361–374.
Abstract: Word spotting is an important recognition task in large-scale retrieval of document collections. In most of the cases, methods are developed and evaluated assuming perfect word segmentation. In this paper, we propose an experimental framework to quantify the goodness that word segmentation has on the performance achieved by word spotting methods in identical unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We have tested our framework on several established and state-of-the-art methods using George Washington and Barcelona Marriage Datasets. The experiments done allow for an estimate of the end-to-end performance of word spotting methods.
|
|
|
Josep Llados, & Gemma Sanchez. (2004). Graph Matching vs. Graph Parsing in Graphics Recognition: A Combined Approach. IJPRAI - International Journal of Pattern Recognition and Artificial Intelligence, 455–473.
|
|