Author |
Partha Pratim Roy; Umapada Pal; Josep Llados |

Title |
Touching Text Character Localization in Graphical Documents using SIFT |
Type |
Book Chapter |
Year |
2010 |
Publication |
Graphics Recognition. Achievements, Challenges, and Evolution. 8th International Workshop, GREC 2009. Selected Papers |
Abbreviated Journal |
Volume |
6020 |
Issue |
Pages |
199-211 |
Keywords |
Support Vector Machine; Text Component; Graphical Line; Document Image; Scale Invariant Feature Transform |
Abstract |
Interpretation of graphical document images is a challenging task, as it requires proper understanding of the text and graphical symbols present in such documents. Difficulties arise in graphical document recognition when text and symbols overlap or touch. Intersections of text and symbols with graphical lines and curves occur frequently in graphical documents, and hence separation of such symbols is very difficult.
Several pattern recognition and classification techniques exist to recognize isolated text and symbols. However, the recognition of touching or overlapping text and symbols has not yet been dealt with successfully. An interesting technique, the Scale Invariant Feature Transform (SIFT), originally devised for object recognition, can take care of overlapping problems. Even though SIFT features have emerged as very powerful object descriptors, their use in the context of graphical documents has not been investigated much. In this paper we present the adaptation of the SIFT approach in the context of text character localization (spotting) in graphical documents. We evaluate the applicability of this technique in such documents and discuss the scope for improvement by combining some state-of-the-art approaches. |
Address |
Corporate Author |
Thesis |
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISSN |
0302-9743 |
ISBN |
978-3-642-13727-3 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ RPL2010c |
Serial |
2408 |
Author |
Nuria Cirera |

Title |
Recognition of Handwritten Historical Documents |
Type |
Report |
Year |
2012 |
Publication |
CVC Technical Report |
Abbreviated Journal |
Volume |
174 |
Issue |
Pages |
Keywords |
Abstract |
Address |
Corporate Author |
Thesis |
Master's thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ Cir2012 |
Serial |
2416 |
Author |
Miquel Ferrer; I. Bardaji; Ernest Valveny; Dimosthenis Karatzas; Horst Bunke |

Title |
Median Graph Computation by Means of Graph Embedding into Vector Spaces |
Type |
Book Chapter |
Year |
2013 |
Publication |
Graph Embedding for Pattern Analysis |
Abbreviated Journal |
Volume |
Issue |
Pages |
45-72 |
Keywords |
Abstract |
In pattern recognition [8, 14], a key issue to be addressed when designing a system is how to represent input patterns. Feature vectors are a common option. That is, a set of numerical features describing relevant properties of the pattern is computed and arranged in vector form. The main advantages of this kind of representation are computational simplicity and a sound mathematical foundation. Thus, a large number of operations are available to work with vectors, and a large repository of algorithms for pattern analysis and classification exists. However, the simple structure of feature vectors might not be the best option for complex patterns where nonnumerical features or relations between different parts of the pattern become relevant. |
Address |
Corporate Author |
Thesis |
Publisher |
Springer New York |
Place of Publication |
Editor |
Yun Fu; Yunqian Ma |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISBN |
978-1-4614-4456-5 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ FBV2013 |
Serial |
2421 |
Author |
Lluis Pere de las Heras; Ernest Valveny; Gemma Sanchez |

Title |
Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies |
Type |
Conference Article |
Year |
2013 |
Publication |
10th IAPR International Workshop on Graphics Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
Address |
Bethlehem; PA; USA; August 2013 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ HVS2013b |
Serial |
2696 |
Author |
Francisco Cruz |

Title |
Probabilistic Graphical Models for Document Analysis |
Type |
Book Whole |
Year |
2016 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
The latest advances in digitization techniques have fostered interest in creating digital copies of document collections. Digitized documents permit easy maintenance, lossless storage, and efficient transmission and information retrieval. This situation has opened a new market niche for systems able to automatically extract and analyze the information contained in these collections, especially in the business domain.
Due to the great variety of document types, this is not a trivial task. For instance, the automatic extraction of numerical data from invoices differs substantially from text recognition in historical documents. However, in order to extract the information of interest, it is always necessary to identify the area of the document where it is located. In the area of Document Analysis we refer to this process as layout analysis, which aims at identifying and categorizing the different entities that compose the document, such as text regions, pictures, text lines, or tables, among others. To perform this task it is usually necessary to incorporate prior knowledge about the task into the analysis process, which can be modeled by defining a set of contextual relations between the different entities of the document. The use of context has proven to be useful to reinforce the recognition process and improve the results on many computer vision tasks. It raises two fundamental questions: what kind of contextual information is appropriate for a given task, and how to incorporate this information into the models.
In this thesis we study several ways to incorporate contextual information into the task of document layout analysis, and into the particular case of handwritten text line segmentation. We focus on the study of Probabilistic Graphical Models and other mechanisms for this purpose, and propose several solutions to these problems. First, we present a method for layout analysis based on Conditional Random Fields. With this model we encode local contextual relations between variables, such as pair-wise constraints. In addition, we encode a set of structural relations between different classes of regions at feature level. Second, we present a method based on 2D Probabilistic Context-free Grammars to encode structural and hierarchical relations. We perform a comparative study between Probabilistic Graphical Models and this syntactic approach. Third, we propose a method for structured documents based on Bayesian Networks to represent the document structure, and an algorithm based on Expectation-Maximization (EM) to find the best configuration of the page. We perform a thorough evaluation of the proposed methods on two particular collections of documents: a historical collection composed of ancient structured documents, and a collection of contemporary documents. In addition, we present a general method for the task of handwritten text line segmentation. We define a probabilistic framework where we combine the EM algorithm with variational approaches for computing inference and parameter learning on a Markov Random Field. We evaluate our method on several collections of documents, including a general dataset of annotated administrative documents. Results demonstrate the applicability of our method to real problems, and the contribution of contextual information to this kind of problem. |
Address |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Ediciones Graficas Rey |
Place of Publication |
Editor |
Oriol Ramos Terrades |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISBN |
978-84-945373-2-5 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ Cru2016 |
Serial |
2861 |
Author |
Pau Riba; Alicia Fornes; Josep Llados |

Title |
Towards the Alignment of Handwritten Music Scores |
Type |
Conference Article |
Year |
2015 |
Publication |
11th IAPR International Workshop on Graphics Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
It is very common to find different versions of the same musical work in the archives of opera theaters. These differences correspond to modifications and annotations made by the musicians. From the musicologist's point of view, these variations are very interesting and deserve study. This paper explores the alignment of music scores as a tool for automatically detecting the passages that contain such differences. Given the difficulties in the recognition of handwritten music scores, our goal is to align the music scores while avoiding the recognition of music elements as much as possible. After removing the staff lines, braces and ties, the bar lines are detected. Then, the bar units are described as a whole using the Blurred Shape Model. The bar unit alignment is performed using Dynamic Time Warping. The analysis of the alignment path is used to detect the variations in the music scores. The method has been evaluated on a subset of the CVC-MUSCIMA dataset, showing encouraging results. |
Address |
Nancy; France; August 2015 |
Corporate Author |
Thesis |
Publisher |
Springer International Publishing |
Place of Publication |
Editor |
Bart Lamiroy; Rafael Dueire Lins |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISBN |
978-3-319-52158-9 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ |
Serial |
2874 |
Author |
Lluis Gomez |

Title |
Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding |
Type |
Book Whole |
Year |
2016 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
This thesis addresses the problem of automatic scene text understanding in unconstrained conditions. In particular, we tackle the tasks of multi-language and arbitrary-oriented text detection, tracking, and script identification in natural scenes.
For this we have developed a set of generic methods that build on the basic observation that text always has certain key visual and structural characteristics that are independent of the language or script in which it is written. Text instances in any language or script are always formed as groups of similar atomic parts, be they individual characters, small stroke parts, or even whole words in the case of cursive text. This holistic (sum-of-parts) and recursive perspective has led us to explore different variants of the “segmentation and grouping” paradigm of computer vision.
Scene text detection methodologies are usually based on the classification of individual regions or patches, using a priori knowledge of a given script or language. Human perception of text, on the other hand, is based on perceptual organization, through which text emerges as a perceptually significant group of atomic objects.
In this thesis, we argue that the text detection problem must be posed as the detection of meaningful groups of regions. We address the problem of text detection in natural scenes from a hierarchical perspective, making explicit use of the recursive nature of text and aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Within this generic framework, we design a text-specific object proposals algorithm that, contrary to existing generic object proposals methods, aims directly at the detection of text region groupings. For this, we abandon the rigid definition of “what is text” of traditional specialized text detectors, and move towards the fuzzier perspective of grouping-based object proposals methods.
Then, we present a hybrid algorithm for detection and tracking of scene text in which the notion of region groupings also plays a central role. By leveraging the structural arrangement of text group components between consecutive frames we can improve the overall tracking performance of the system.
Finally, since our generic detection framework is inherently designed for multi-language environments, we focus on the problem of script identification in order to build a multi-language end-to-end reading system. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed size as in the typical use of holistic CNN classifiers, we propose a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-part representations and their relative importance in a patch-based classification scheme. |
Address |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Place of Publication |
Editor |
Dimosthenis Karatzas |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ Gom2016 |
Serial |
2891 |
Author |
Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades |

Title |
Spotting Symbol over Graphical Documents Via Sparsity in Visual Vocabulary |
Type |
Book Chapter |
Year |
2016 |
Publication |
Recent Trends in Image Processing and Pattern Recognition |
Abbreviated Journal |
Volume |
709 |
Issue |
Pages |
Keywords |
Abstract |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ HTR2016 |
Serial |
2956 |
Author |
Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) |

Title |
16th International Conference, 2021, Proceedings, Part III |
Type |
Book Whole |
Year |
2021 |
Publication |
Document Analysis and Recognition – ICDAR 2021 |
Abbreviated Journal |
Volume |
12823 |
Issue |
Pages |
Keywords |
Abstract |
This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding. |
Address |
Lausanne, Switzerland, September 5-10, 2021 |
Corporate Author |
Thesis |
Publisher |
Springer Cham |
Place of Publication |
Editor |
Josep Llados; Daniel Lopresti; Seiichi Uchida |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISBN |
978-3-030-86333-3 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ |
Serial |
3727 |
Author |
Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) |

Title |
16th International Conference, 2021, Proceedings, Part IV |
Type |
Book Whole |
Year |
2021 |
Publication |
Document Analysis and Recognition – ICDAR 2021 |
Abbreviated Journal |
Volume |
12824 |
Issue |
Pages |
Keywords |
Abstract |
This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding. |
Address |
Lausanne, Switzerland, September 5-10, 2021 |
Corporate Author |
Thesis |
Publisher |
Springer Cham |
Place of Publication |
Editor |
Josep Llados; Daniel Lopresti; Seiichi Uchida |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
ISBN |
978-3-030-86336-4 |
Medium |
Area |
Expedition |
Conference |
Notes  |
Approved |
no |
Call Number |
Admin @ si @ |
Serial |
3728 |