|
Pau Riba, Josep Llados, & Alicia Fornes. (2015). Handwritten Word Spotting by Inexact Matching of Grapheme Graphs. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 781–785).
Abstract: This paper presents a graph-based word spotting approach for handwritten documents. Contrary to most word spotting techniques, which use statistical representations, we propose a structural representation that is robust to the inherent deformations of handwriting. Attributed graphs are constructed using a part-based approach. Graphemes extracted from shape convexities are used as stable units of handwriting and are associated with graph nodes. Spatial relations between them then determine the graph edges. Spotting is defined in terms of error-tolerant graph matching using a bipartite graph matching algorithm. To make the method usable on large datasets, a graph indexing approach based on binary embeddings is used as preprocessing. Historical documents are used as the experimental framework. The approach is comparable to statistical ones in terms of time and memory requirements, especially when dealing with large document collections.
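The error-tolerant matching step mentioned in the abstract can be illustrated with a minimal sketch of bipartite graph matching over node costs. Everything here is an assumption for illustration, not the authors' implementation: nodes are reduced to single numeric attributes, edges are ignored, the deletion and insertion costs are arbitrary, and the assignment is solved by brute force (practical systems would typically use the Hungarian algorithm instead).

```python
from itertools import permutations

INF = float("inf")

def build_cost_matrix(query, target, node_cost, del_cost, ins_cost):
    """Square (n+m) x (n+m) cost matrix: substitutions, deletions, insertions."""
    n, m = len(query), len(target)
    size = n + m
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                # Substitute query node i with target node j.
                cost[i][j] = node_cost(query[i], target[j])
            elif i < n:
                # Delete query node i (only allowed on the diagonal).
                cost[i][j] = del_cost if j - m == i else INF
            elif j < m:
                # Insert target node j (only allowed on the diagonal).
                cost[i][j] = ins_cost if i - n == j else INF
            # Otherwise: dummy-to-dummy assignment, cost stays 0.
    return cost

def best_assignment(cost):
    """Brute-force minimum-cost assignment; only feasible for tiny matrices."""
    size = len(cost)
    best_total, best_perm = INF, None
    for perm in permutations(range(size)):
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Hypothetical node attributes for a query word graph and a target word graph.
query_nodes = [1.0, 4.0]
target_nodes = [1.5, 4.0, 9.0]
matrix = build_cost_matrix(
    query_nodes, target_nodes,
    node_cost=lambda a, b: abs(a - b),
    del_cost=2.0, ins_cost=2.0,
)
distance, assignment = best_assignment(matrix)
```

In this toy case the two query nodes are matched to the first two target nodes and the third target node is treated as an insertion, so the approximate matching cost is the sum of the two substitution costs plus one insertion cost.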
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Josep Llados, David Fernandez, & Cristina Cañero. (2015). Use case visual Bag-of-Words techniques for camera based identity document classification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 721–725).
Abstract: Nowadays, automatic identity document recognition, including passport and driving license recognition, is at the core of many applications within the administrative and service sectors, such as police, hospitality, car renting, etc. In former years, the document information was manually extracted, whereas today this data is recognized automatically from images obtained by flat-bed scanners. Yet, since these scanners tend to be expensive and bulky, companies in the sector have recently turned their attention to cheaper, smaller and yet computationally powerful scanners: mobile devices. Identity document recognition from mobile images involves several new difficulties w.r.t. traditional scanned images, such as the loss of a controlled background, perspective, blurring, etc. In this paper we present a real application for identity document classification of images taken from mobile devices. This classification process is of extreme importance, since prior knowledge of the document type and origin strongly facilitates the subsequent information extraction. The proposed method is based on a traditional Bag-of-Words approach in which we have taken into consideration several key aspects to enhance the recognition rate. The performance of the method has been studied on three datasets containing more than 2000 images from 129 different document classes.
|
|
|
Alicia Fornes, V.C.Kieu, M. Visani, N.Journet, & Anjan Dutta. (2014). The ICDAR/GREC 2013 Music Scores Competition: Staff Removal. In B.Lamiroy, & J.-M. Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 207–220). LNCS. Springer Berlin Heidelberg.
Abstract: The first competition on music scores, organized at ICDAR and GREC in 2011, awoke the interest of researchers, who participated in both staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario concerning old and degraded music scores. For this purpose, we have generated a new set of semi-synthetic images using two degradation models that we previously introduced: local noise and 3D distortions. In this extended paper we describe in detail the dataset, degradation models, evaluation metrics, the participants' methods and the obtained results, which could not be presented in the ICDAR and GREC proceedings due to page limitations.
Keywords: Competition; Graphics recognition; Music scores; Writer identification; Staff removal
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, & Josep Llados. (2015). Attributed Graph Grammar for floor plan analysis. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 726–730).
Abstract: In this paper, we propose the use of an Attributed Graph Grammar as a unique framework to model and recognize the structure of floor plans. This grammar represents a building as a hierarchical composition of structurally and semantically related elements, where common representations are learned stochastically from annotated data. Given an input image, the parsing consists in constructing the graph representation that best agrees with the probabilistic model defined by the grammar. The proposed method provides several advantages with respect to traditional floor plan analysis techniques. It uses an unsupervised statistical approach for detecting walls that adapts to different graphical notations and relaxes strong structural assumptions such as straightness and orthogonality. Moreover, the independence between the knowledge model and the parsing implementation allows the method to learn different building configurations automatically and thus to cope with the existing variability. These advantages are clearly demonstrated by comparing it with the most recent floor plan interpretation techniques on 4 datasets of real floor plans with different notations.
|
|
|
Josep Llados, & Marçal Rusiñol. (2014). Graphics Recognition Techniques. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. D, pp. 489–521). Springer London.
Abstract: This chapter describes the most relevant approaches for the analysis of graphical documents. The graphics recognition pipeline can be split into three tasks. The low level or lexical task extracts the basic units composing the document. The syntactic level is focused on the structure, i.e., how graphical entities are constructed, and involves the location and classification of the symbols present in the document. The third level is a functional or semantic level, i.e., it models what the graphical symbols do and what they mean in the context where they appear. This chapter covers the lexical level, while the next two chapters are devoted to the syntactic and semantic levels, respectively. The main problems reviewed in this chapter are raster-to-vector conversion (vectorization algorithms) and the separation of text and graphics components. The research and industrial communities have provided standard methods achieving reasonable performance levels. Hence, graphics recognition techniques can be considered to be in a mature state from a scientific point of view. Additionally, this chapter provides insights on some related problems, namely, the extraction and recognition of dimensions in engineering drawings, and the recognition of hatched and tiled patterns. Both problems are usually associated with, or even integrated in, the vectorization process.
Keywords: Dimension recognition; Graphics recognition; Graphic-rich documents; Polygonal approximation; Raster-to-vector conversion; Texture-based primitive extraction; Text-graphics separation
|
|
|
Salvatore Tabbone, & Oriol Ramos Terrades. (2014). An Overview of Symbol Recognition. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. D, pp. 523–551). Springer London.
Abstract: According to the Cambridge Dictionaries Online, a symbol is a sign, shape, or object that is used to represent something else. Symbol recognition is a subfield of general pattern recognition problems that focuses on identifying, detecting, and recognizing symbols in technical drawings, maps, or miscellaneous documents such as logos and musical scores. This chapter aims at providing the reader an overview of the different existing ways of describing and recognizing symbols and how the field has evolved to attain a certain degree of maturity.
Keywords: Pattern recognition; Shape descriptors; Structural descriptors; Symbol recognition; Symbol spotting
|
|
|
Palaiahnakote Shivakumara, Anjan Dutta, Chew Lim Tan, & Umapada Pal. (2014). Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing. MTAP - Multimedia Tools and Applications, 72(1), 515–539.
Abstract: In this paper, we address two complex issues: 1) text frame classification and 2) multi-oriented text detection in video text frames. We first divide a video frame into 16 blocks and propose a combination of wavelet and median-moments with k-means clustering at the block level to identify probable text blocks. For each probable text block, the method applies the same combination of features with k-means clustering over a sliding window running through the blocks to identify potential text candidates. We introduce a new idea of symmetry on text candidates in each block, based on the observation that pixel distribution in text exhibits a symmetric pattern. The method integrates all blocks containing text candidates in the frame, and then all text candidates are mapped onto a Sobel edge map of the original frame to obtain text representatives. To tackle the multi-orientation problem, we present a new method called Angle Projection Boundary Growing (APBG), which is an iterative algorithm based on a nearest neighbor concept. APBG is then applied to the text representatives to fix the bounding box for multi-oriented text lines in the video frame. Directional information is used to eliminate false positives. Experimental results on a variety of datasets such as non-horizontal, horizontal, publicly available data (Hua’s data) and ICDAR-03 competition data (camera images) show that the proposed method outperforms existing methods proposed for video as well as state-of-the-art methods for scene text.
|
|
|
Anjan Dutta, Josep Llados, Horst Bunke, & Umapada Pal. (2014). A Product Graph Based Method for Dual Subgraph Matching Applied to Symbol Spotting. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 7–11). LNCS. Springer Berlin Heidelberg.
Abstract: The product graph has been shown to be a way of matching subgraphs. This paper reports the extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. Here we focus on the two major limitations of the previous version of the algorithm: (1) spurious nodes and edges in the graph representation and (2) inefficient node and edge attributes. To deal with the noisy information of vectorized graphical documents, we consider a dual edge graph representation on the original graph representing the graphical information, and the product graph is computed between the dual edge graphs of the pattern graph and the target graph. The dual edge graph with redundant edges is helpful for an efficient and error-tolerant encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates the pairs of similar edges of the two operand graphs, and exponentiating the adjacency matrix finds similar random walks of greater lengths. Nodes joining similar random walks between two graphs are found by combining different weighted exponentials of adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
Keywords: Product graph; Dual edge graph; Subgraph matching; Random walks; Graph kernel
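The role of the product graph adjacency matrix and its powers, as described in the abstract above, can be illustrated with a small sketch. It is a simplification under stated assumptions, not the method of the paper: the graphs are unattributed, no dual edge graph is built, and the function names, the decay weight and the maximum walk length are hypothetical choices.

```python
def kronecker(a, b):
    """Adjacency matrix of the tensor product graph of two graphs.

    Row/column index i * len(b) + j stands for the node pair (i, j).
    """
    n, m = len(a), len(b)
    return [
        [a[r // m][c // m] * b[r % m][c % m] for c in range(n * m)]
        for r in range(n * m)
    ]

def matmul(x, y):
    """Plain square matrix product on lists of lists."""
    size = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

def weighted_walk_sum(adj, max_len, decay):
    """Sum of decay**k * adj**k for k = 1..max_len.

    Entry (p, q) accumulates weighted counts of walks from node pair p
    to node pair q that exist simultaneously in both operand graphs.
    """
    size = len(adj)
    total = [[0.0] * size for _ in range(size)]
    power = [row[:] for row in adj]
    weight = decay
    for _ in range(max_len):
        for i in range(size):
            for j in range(size):
                total[i][j] += weight * power[i][j]
        power = matmul(power, adj)
        weight *= decay
    return total

# Pattern graph: path 0-1-2. Target graph: triangle 0-1-2.
pattern = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
target = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

product = kronecker(pattern, target)
scores = weighted_walk_sum(product, max_len=3, decay=0.5)
```

Here `scores[0][4]` relates the node pair (0, 0) to the node pair (1, 1): it combines the single common walk of length 1 with the six common walks of length 3, each down-weighted by the decay factor.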
|
|
|
Marçal Rusiñol, Lluis Pere de las Heras, & Oriol Ramos Terrades. (2014). Flowchart Recognition for Non-Textual Information Retrieval in Patent Search. IR - Information Retrieval, 17(5-6), 545–562.
Abstract: Relatively little research has been done on the topic of patent image retrieval and, in general, most approaches perform the retrieval in terms of a similarity measure between the query image and the images in the corpus. However, systems aimed at overcoming the semantic gap between the visual description of patent images and their conveyed concepts would be very helpful for patent professionals. In this paper we present a flowchart recognition method aimed at achieving a structured representation of flowchart images that can be further queried semantically. The proposed method was submitted to the CLEF-IP 2012 flowchart recognition task. We report the obtained results on this dataset.
Keywords: Flowchart recognition; Patent documents; Text/graphics separation; Raster-to-vector conversion; Symbol recognition
|
|
|
T.Chauhan, E.Perales, Kaida Xiao, E.Hird, Dimosthenis Karatzas, & Sophie Wuerger. (2014). The achromatic locus: Effect of navigation direction in color space. VSS - Journal of Vision, 14 (1)(25), 1–11.
5Y Impact Factor: 2.99 / 1st (Ophthalmology)
Abstract: An achromatic stimulus is defined as a patch of light that is devoid of any hue. This is usually achieved by asking observers to adjust the stimulus such that it looks neither red nor green and at the same time neither yellow nor blue. Despite the theoretical and practical importance of the achromatic locus, little is known about the variability in these settings. The main purpose of the current study was to evaluate whether achromatic settings were dependent on the task of the observers, namely the navigation direction in color space. Observers could either adjust the test patch along the two chromatic axes in the CIE u*v* diagram or, alternatively, navigate along the unique-hue lines. Our main result is that the navigation method affects the reliability of these achromatic settings. Observers are able to make more reliable achromatic settings when adjusting the test patch along the directions defined by the four unique hues as opposed to navigating along the main axes in the commonly used CIE u*v* chromaticity plane. This result holds across different ambient viewing conditions (Dark, Daylight, Cool White Fluorescent) and different test luminance levels (5, 20, and 50 cd/m²). The reduced variability in the achromatic settings is consistent with the idea that internal color representations are more aligned with the unique-hue lines than the u* and v* axes.
Keywords: achromatic; unique hues; color constancy; luminance; color space
|
|
|
A.Kesidis, & Dimosthenis Karatzas. (2014). Logo and Trademark Recognition. In D. Doermann, & K. Tombre (Eds.), Handbook of Document Image Processing and Recognition (Vol. D, pp. 591–646). Springer London.
Abstract: The importance of logos and trademarks in today's society is indisputable, variably seen under a positive light as a valuable service for consumers or a negative one as a catalyst of ever-increasing consumerism. This chapter discusses the technical approaches for enabling machines to work with logos, looking into the latest methodologies for logo detection, localization, representation, recognition, retrieval, and spotting in a variety of media. This analysis is presented in the context of three different applications covering the complete depth and breadth of state-of-the-art techniques. These are trademark retrieval systems, logo recognition in document images, and logo detection and removal in images and videos. This chapter, due to the very nature of logos and trademarks, brings together various facets of document image analysis spanning graphical and textual content, while it links document image analysis to other computer vision domains, especially when it comes to the analysis of real-scene videos and images.
Keywords: Logo recognition; Logo removal; Logo spotting; Trademark registration; Trademark retrieval systems
|
|
|
Anjan Dutta. (2014). Inexact Subgraph Matching Applied to Symbol Spotting in Graphical Documents (Josep Llados, & Umapada Pal, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: There is a resurgence in the use of structural approaches for the usual object recognition and retrieval problems. Graph theory, and graph matching in particular, plays a relevant role in this. Specifically, the detection of an object (or a part of it) in an image in terms of structural features can be formulated as subgraph matching. Subgraph matching is a challenging task. In particular, due to the presence of outliers, most graph matching algorithms do not perform well in a subgraph matching scenario. Moreover, exact subgraph isomorphism has been proven to be an NP-complete problem. So naturally, the graph matching community has devoted a lot of effort to addressing the problem of subgraph matching within a suboptimal bound. Most of these efforts rely on approximate algorithms that try to obtain an inexact solution in an estimated way. In addition, recognition must usually cope with distortion. Inexact graph matching consists in finding the best isomorphism under a similarity measure. On the theoretical side, this thesis proposes algorithms for solving subgraph matching in an approximate and inexact way.
We consider the symbol spotting problem on graphical documents or line drawings from the application point of view. This is a well-known problem in the graphics recognition community. It can be further applied to the indexing and classification of documents based on their contents. The structural nature of this kind of document naturally motivates a graph-based representation, so the symbol spotting problem on graphical documents can be considered a subgraph matching problem. The main challenges in this application domain are the noise and distortions that may arise during the usage, digitization and raster-to-vector conversion of those documents. Apart from that, computer vision nowadays is no longer confined to a limited number of images, so dealing with a huge number of images with graph-based methods is a further challenge.
In this thesis, on the one hand, we have worked on efficient and robust graph representations to cope with the noise and distortions coming from documents. On the other hand, we have worked on different graph-based methods and frameworks to solve the subgraph matching problem in a better approximated way, which can also deal with a considerable number of images. Firstly, we propose a symbol spotting method by hashing serialized subgraphs. Graph serialization allows the creation of factorized substructures such as graph paths, which can be organized in hash tables depending on the structural similarities of the serialized subgraphs. The use of hashing techniques helps to reduce the search space substantially and speeds up the spotting procedure. Secondly, we introduce contextual similarities based on walk-based propagation on the tensor product graph. These contextual similarities involve higher-order information and are more reliable than pairwise similarities. We use these higher-order similarities to formulate subgraph matching as a node and edge selection problem in the tensor product graph. Thirdly, we propose near convex grouping to form a near convex region adjacency graph, which eliminates the limitations of the traditional region adjacency graph representation for graphics recognition. Fourthly, we propose a hierarchical graph representation, built by simplifying/correcting the structural errors of the base graph. These hierarchical graph structures are later matched with graph matching methods. Apart from that, in this thesis we have provided an overall experimental comparison of all the methods and some of the state-of-the-art methods. Furthermore, some dataset models have also been proposed.
|
|
|
Clement Guerin, Christophe Rigaud, Karell Bertet, Jean-Christophe Burie, Arnaud Revel, & Jean-Marc Ogier. (2014). Réduction de l’espace de recherche pour les personnages de bandes dessinées. In 19th National Congress Reconnaissance de Formes et l'Intelligence Artificielle.
Abstract: Comics represent an important cultural heritage in many countries, and their mass digitization offers the possibility of searching the content of the images. To date, it is mainly the page structures and their textual content that have been studied; few works address the graphical content. We propose to rely on elements that have already been studied, such as the positions of panels and speech balloons, to reduce the search space and locate the characters based on the balloon tails. The evaluation of our different contributions on the eBDtheque dataset shows a balloon tail detection rate of 81.2%, a character localization rate of up to 85%, and a search space reduction of more than 50%.
Keywords: contextual search; document analysis; comics characters
|
|
|
Christophe Rigaud, & Clement Guerin. (2014). Localisation contextuelle des personnages de bandes dessinées. In Colloque International Francophone sur l'Écrit et le Document.
Abstract: The authors propose a method for locating characters in comic book panels based on the characteristics of the speech balloons. The evaluation shows a character localization rate of up to 65%.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2014). Scene Text Recognition: No Country for Old Men? In 1st International Workshop on Robust Reading.
|
|