Publicacions CVC -- Query Results

	Author	David Aldavert
	Title	Efficient and Scalable Handwritten Word Spotting on Historical Documents using Bag of Visual Words			Type	Book Whole
	Year	2021	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Word spotting can be defined as the pattern recognition task aimed at locating and retrieving a specific keyword within a document image collection without explicitly transcribing the whole corpus. Its use is particularly interesting in scenarios where Optical Character Recognition performs poorly or cannot be used at all. This thesis focuses on such a scenario: word spotting on historical handwritten documents that have been written by a single author or by multiple authors with similar calligraphy. This problem requires a visual signature that is robust to image artifacts, flexible enough to accommodate script variations and efficient enough to retrieve information rapidly. For this, we have developed a set of word spotting methods built on the well-known Bag-of-Visual-Words (BoVW) representation. This representation has gained popularity in the document image analysis community for characterizing handwritten words in an unsupervised manner. However, most approaches in this field rely on a basic BoVW configuration and disregard complex encoding and spatial representations. We determine which BoVW configurations provide the best performance boost to a spotting system. Then, we extend segmentation-based word spotting, where word candidates are given a priori, to segmentation-free spotting. The proposed approach seeds the document images with overlapping word location candidates and characterizes them with a BoVW signature. Retrieval is achieved by comparing the query and candidate signatures and returning the locations that provide a higher consensus. This is a simple but powerful approach that requires a more compact signature than in a segmentation-based scenario. We first project the BoVW signature into a reduced semantic topic space and then compress it further using Product Quantizers. The resulting signature only requires a few dozen bytes, allowing us to index thousands of pages on a common desktop computer. The final system still yields performance comparable to the state of the art despite all the information lost during the compression phases. Afterwards, we also study how to combine different modalities of information in order to create a query-by-X spotting system, where words are indexed using one information modality and queried using another. We consider three different information modalities: visual, textual and audio. Our proposal is to create a latent feature space where semantically related features are projected onto the same topics, thus creating a new feature space where information from different modalities can be compared. Later, we consider the codebook generation and descriptor encoding problem. The codebooks used to encode the BoVW signatures are usually created using an unsupervised clustering algorithm, and multiple parameters must be tested to determine which configuration is best for a given document collection. We propose a semantic clustering algorithm which allows the best parameter to be estimated from data. Since gathering annotated data is costly, we use synthetically generated word images. The resulting codebook is database agnostic, i.e., a codebook that yields good performance on document collections that use the same script. We also propose the use of an additional codebook to approximate descriptors and reduce the descriptor encoding complexity to sub-linear. Finally, we focus on the problem of signature dimensionality. We propose a new symbol probability signature where each bin represents the probability that a certain symbol is present at a certain location of the word image. This signature is extremely compact and, combined with compression techniques, can represent word images with just a few bytes per signature.
	Address	April 2021
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Marçal Rusiñol;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-122714-5-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Ald2021			Serial	3601



	Author	Raul Gomez; Baoguang Shi; Lluis Gomez; Lukas Neumann; Andreas Veit; Jiri Matas; Serge Belongie; Dimosthenis Karatzas
	Title	ICDAR2017 Robust Reading Challenge on COCO-Text			Type	Conference Article
	Year	2017	Publication	14th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address	Kyoto; Japan; November 2017
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ GSG2017			Serial	3076



	Author	Suman Ghosh; Ernest Valveny
	Title	R-PHOC: Segmentation-Free Word Spotting using CNN			Type	Conference Article
	Year	2017	Publication	14th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords	Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search
	Abstract	arXiv:1707.01294 This paper proposes a region-based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bounding boxes and embeds all bounding boxes into an embedding space, where word spotting can be cast as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of the PHOC embedding, as it has previously achieved significant success in segmentation-based word spotting. Word candidates are generated using a simple procedure based on grouping connected components under some spatial constraints. Experiments show that R-PHOC, which operates on images directly, can improve the current state of the art on the standard GW dataset and in some cases performs as well as PHOCNET, which was designed for segmentation-based word spotting.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ GhV2017a			Serial	3079



	Author	Suman Ghosh; Ernest Valveny
	Title	Visual attention models for scene text recognition			Type	Conference Article
	Year	2017	Publication	14th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	arXiv:1706.01487 In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on an LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors is derived from an intermediate convolutional layer, each corresponding to a different area of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word-level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on the standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ GhV2017b			Serial	3080



	Author	Lluis Pere de las Heras; Oriol Ramos Terrades; Josep Llados
	Title	Ontology-Based Understanding of Architectural Drawings			Type	Book Chapter
	Year	2017	Publication	International Workshop on Graphics Recognition. GREC 2015. Graphic Recognition. Current Trends and Challenges	Abbreviated Journal
	Volume	9657	Issue		Pages	75-85
	Keywords	Graphics recognition; Floor plan analysis; Domain ontology
	Abstract	In this paper we present a knowledge base of architectural documents aimed at improving existing methods of floor plan classification and understanding. It consists of an ontological definition of the domain and the inclusion of real instances coming from both automatically interpreted and manually labeled documents. The knowledge base has proven to be an effective tool to structure our knowledge and to easily maintain and upgrade it. Moreover, it is an appropriate means to automatically check the consistency of relational data and a convenient complement to hard-coded knowledge interpretation systems.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ HRL2017			Serial	3086



	Author	Chun Yang; Xu Cheng Yin; Hong Yu; Dimosthenis Karatzas; Yu Cao
	Title	ICDAR2017 Robust Reading Challenge on Text Extraction from Biomedical Literature Figures (DeTEXT)			Type	Conference Article
	Year	2017	Publication	14th International Conference on Document Analysis and Recognition	Abbreviated Journal
	Volume		Issue		Pages	1444-1447
	Keywords
	Abstract	Hundreds of millions of figures are available in the biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information and understanding biomedical documents. Unlike images in the open domain, biomedical figures present a variety of unique challenges. For example, biomedical figures typically have complex layouts, small font sizes, short text, specific text, complex symbols and irregular text arrangements. This paper presents the final results of the ICDAR 2017 Competition on Text Extraction from Biomedical Literature Figures (ICDAR2017 DeTEXT Competition), which aims at extracting (detecting and recognizing) text from biomedical literature figures. Similar to text extraction from scene images and web pictures, ICDAR2017 DeTEXT Competition includes three major tasks, i.e., text detection, cropped word recognition and end-to-end text recognition. Here, we describe in detail the data set, tasks, evaluation protocols and participants of this competition, and report the performance of the participating methods.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-1-5386-3586-5	Medium
	Area		Expedition		Conference	ICDAR
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ YCY2017			Serial	3098



	Author	Alicia Fornes; Bart Lamiroy
	Title	Graphics Recognition, Current Trends and Evolutions			Type	Book Whole
	Year	2018	Publication	Graphics Recognition, Current Trends and Evolutions	Abbreviated Journal
	Volume	11009	Issue		Pages
	Keywords
	Abstract	This book constitutes the thoroughly refereed post-conference proceedings of the 12th International Workshop on Graphics Recognition, GREC 2017, held in Kyoto, Japan, in November 2017. The 10 revised full papers presented were carefully reviewed and selected from 14 initial submissions. They cover both classical and emerging topics of graphics recognition, namely analysis and detection of diagrams, search and classification, optical music recognition, and interpretation of engineering drawings and maps.
	Address
	Corporate Author				Thesis
	Publisher	Springer International Publishing	Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title	LNCS
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-3-030-02283-9	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ FoL2018			Serial	3171



	Author	Suman Ghosh
	Title	Word Spotting and Recognition in Images from Heterogeneous Sources A			Type	Book Whole
	Year	2018	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Text has been the most common way of sharing information for ages. With the recent growth of personal image databases and collections of handwritten historical manuscripts, the demand for algorithms that make these databases accessible for browsing and indexing is on the rise. Enabling search in, or understanding of, large collections of manuscripts or image databases requires fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which work well when words are already segmented. However, there is no trivial way to extend these to non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different kinds of representation exist in the literature: one uses a fixed-length representation learned from cropped words, and the other a variable-length sequence of features. Throughout this thesis, we study both representations for their suitability for segmentation-free understanding of text. In the first part we focus on segmentation-free word spotting using a fixed-length representation. We extend the use of the successful PHOC (Pyramidal Histogram of Characters) representation to segmentation-free retrieval. In the second part of the thesis, we explore sequence-based features and, finally, we propose a unified solution where the same framework can generate both kinds of representation.
	Address	November 2018
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Ernest Valveny
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-948531-0-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ Gho2018			Serial	3217



	Author	Sounak Dey; Palaiahnakote Shivakumara; K.S. Raghunandan; Umapada Pal; Tong Lu; G. Hemantha Kumar; Chee Seng Chan
	Title	Script independent approach for multi-oriented text detection in scene image			Type	Journal Article
	Year	2017	Publication	Neurocomputing	Abbreviated Journal	NEUCOM
	Volume	242	Issue		Pages	96-112
	Keywords
	Abstract	Developing a text detection method which is invariant to scripts in natural scene images is a challenging task due to the different geometrical structures of various scripts. Besides, the multiple orientations of text lines in natural scene images make the problem more challenging. This paper proposes to explore the ring radius transform (RRT) for text detection in multi-oriented and multi-script environments. The method finds component regions based on the convex hull to generate radius matrices using RRT. RRT provides low radius values for the pixels that are near edges, constant radius values for the pixels that represent stroke width, and high radius values that represent holes created in the background and convex hull because of the regular structures of text components. We apply k-means clustering on the radius matrices to group such spatially coherent regions into individual clusters. Then the proposed method studies the radius values of such cluster components that are close to the centroid and far from the centroid to detect text components. Furthermore, we have developed a Bangla dataset (named the ISI-UM dataset) and propose a semi-automatic system for generating its ground truth for text detection of arbitrary orientations, which can be used by researchers for text detection and recognition in the future. The ground truth will be released to the public. Experimental results on our ISI-UM data and other standard datasets, namely ICDAR 2013 scene, SVT and MSRA data, show that the proposed method outperforms the existing methods in terms of multi-lingual and multi-oriented text detection ability.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN		Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ DSR2017			Serial	3260



	Author	Joana Maria Pujadas-Mora; Alicia Fornes; Josep Llados; Gabriel Brea-Martinez; Miquel Valls-Figols
	Title	The Baix Llobregat (BALL) Demographic Database, between Historical Demography and Computer Vision (nineteenth–twentieth centuries)			Type	Book Chapter
	Year	2019	Publication	Nominative Data in Demographic Research in the East and the West: monograph	Abbreviated Journal
	Volume		Issue		Pages	29-61
	Keywords
	Abstract	The Baix Llobregat (BALL) Demographic Database is an ongoing database project containing individual census data from the Catalan region of Baix Llobregat (Spain) during the nineteenth and twentieth centuries. The BALL Database is built within the project ‘NETWORKS: Technology and citizen innovation for building historical social networks to understand the demographic past’ directed by Alícia Fornés from the Center for Computer Vision and Joana Maria Pujadas-Mora from the Center for Demographic Studies, both at the Universitat Autònoma de Barcelona, funded by the Recercaixa program (2017–2019). Its webpage is http://dag.cvc.uab.es/xarxes/. The aim of the project is to develop technologies facilitating the massive digitization of demographic sources, and more specifically the padrones (local censuses), in order to reconstruct historical ‘social’ networks employing computer vision technology. Such virtual networks can be created thanks to the linkage of nominative records compiled in the local censuses across time and space. Thus, digitized versions of individual and family lifespans are established, and individuals and families can be located spatially.
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-5-7996-2656-3	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ PFL2019			Serial	3351
