Publicacions CVC -- Query Results

Details

Records
Author	Andreas Fischer; Volkmar Frinken; Horst Bunke; Ching Y. Suen
Title	Improving HMM-Based Keyword Spotting with Character Language Models			Type	Conference Article
Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	506-510
Keywords
Abstract	Facing high error rates and slow recognition speed for full text transcription of unconstrained handwriting images, keyword spotting is a promising alternative to locate specific search terms within scanned document images. We have previously proposed a learning-based method for keyword spotting using character hidden Markov models that showed a high performance when compared with traditional template image matching. In the lexicon-free approach pursued, only the text appearance was taken into account for recognition. In this paper, we integrate character n-gram language models into the spotting system in order to provide an additional language context. On the modern IAM database as well as the historical George Washington database, we demonstrate that character language models significantly improve the spotting performance.
Address	Washington; USA; August 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1520-5363	ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; 600.045; 605.203			Approved	no
Call Number	Admin @ si @ FFB2013			Serial	2295
Permanent link to this record



Author	Christophe Rigaud; Dimosthenis Karatzas; Joost Van de Weijer; Jean-Christophe Burie; Jean-Marc Ogier
Title	An active contour model for speech balloon detection in comics			Type	Conference Article
Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	1240-1244
Keywords
Abstract	Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent comic book understanding would enable a variety of new applications, including content-based retrieval and content retargeting. Document understanding in this domain is challenging as comics are semi-structured documents, combining semantically important graphical and textual parts. Few studies have been done in this direction. In this work we detail a novel approach for closed and non-closed speech balloon localization in scanned comic book pages, an essential step towards a fully automatic comic book understanding. The approach is compared with existing methods for closed balloon localization found in the literature and results are presented.
Address	washington; USA; August 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1520-5363	ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; CIC; 600.056			Approved	no
Call Number	Admin @ si @ RKW2013a			Serial	2260
Permanent link to this record



Author	Alicia Fornes; Xavier Otazu; Josep Llados
Title	Show through cancellation and image enhancement by multiresolution contrast processing			Type	Conference Article
Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	200-204
Keywords
Abstract	Historical documents suffer from different types of degradation and noise such as background variation, uneven illumination or dark spots. In case of double-sided documents, another common problem is that the back side of the document usually interferes with the front side because of the transparency of the document or ink bleeding. This effect is called the show through phenomenon. Many methods are developed to solve these problems, and in the case of show-through, by scanning and matching both the front and back sides of the document. In contrast, our approach is designed to use only one side of the scanned document. We hypothesize that show-trough are low contrast components, while foreground components are high contrast ones. A Multiresolution Contrast (MC) decomposition is presented in order to estimate the contrast of features at different spatial scales. We cancel the show-through phenomenon by thresholding these low contrast components. This decomposition is also able to enhance the image removing shadowed areas by weighting spatial scales. Results show that the enhanced images improve the readability of the documents, allowing scholars both to recover unreadable words and to solve ambiguities.
Address	Washington; USA; August 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1520-5363	ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; 602.006; 600.045; 600.061; 600.052;CIC			Approved	no
Call Number	Admin @ si @ FOL2013			Serial	2241
Permanent link to this record



Author	David Aldavert; Marçal Rusiñol; Ricardo Toledo; Josep Llados
Title	Integrating Visual and Textual Cues for Query-by-String Word Spotting			Type	Conference Article
Year	2013	Publication	12th International Conference on Document Analysis and Recognition	Abbreviated Journal
Volume		Issue		Pages	511 - 515
Keywords
Abstract	In this paper, we present a word spotting framework that follows the query-by-string paradigm where word images are represented both by textual and visual representations. The textual representation is formulated in terms of character $n$-grams while the visual one is based on the bag-of-visual-words scheme. These two representations are merged together and projected to a sub-vector space. This transform allows to, given a textual query, retrieve word instances that were only represented by the visual modality. Moreover, this statistical representation can be used together with state-of-the-art indexation structures in order to deal with large-scale scenarios. The proposed method is evaluated using a collection of historical documents outperforming state-of-the-art performances.
Address	Washington; USA; August 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1520-5363	ISBN		Medium
Area		Expedition		Conference	ICDAR
Notes	DAG; ADAS; 600.045; 600.055; 600.061			Approved	no
Call Number	Admin @ si @ ART2013			Serial	2224
Permanent link to this record



Author	Ferran Diego; Joan Serrat; Antonio Lopez
Title	Joint spatio-temporal alignment of sequences			Type	Journal Article
Year	2013	Publication	IEEE Transactions on Multimedia	Abbreviated Journal	TMM
Volume	15	Issue	6	Pages	1377-1387
Keywords	video alignment
Abstract	Video alignment is important in different areas of computer vision such as wide baseline matching, action recognition, change detection, video copy detection and frame dropping prevention. Current video alignment methods usually deal with a relatively simple case of fixed or rigidly attached cameras or simultaneous acquisition. Therefore, in this paper we propose a joint video alignment for bringing two video sequences into a spatio-temporal alignment. Specifically, the novelty of the paper is to formulate the video alignment to fold the spatial and temporal alignment into a single alignment framework. This simultaneously satisfies a frame-correspondence and frame-alignment similarity; exploiting the knowledge among neighbor frames by a standard pairwise Markov random field (MRF). This new formulation is able to handle the alignment of sequences recorded at different times by independent moving cameras that follows a similar trajectory, and also generalizes the particular cases that of fixed geometric transformation and/or linear temporal mapping. We conduct experiments on different scenarios such as sequences recorded simultaneously or by moving cameras to validate the robustness of the proposed approach. The proposed method provides the highest video alignment accuracy compared to the state-of-the-art methods on sequences recorded from vehicles driving along the same track at different times.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1520-9210	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS			Approved	no
Call Number	Admin @ si @ DSL2013; ADAS @ adas @			Serial	2228
Permanent link to this record



Author	Jose Manuel Alvarez; Theo Gevers; Ferran Diego; Antonio Lopez
Title	Road Geometry Classification by Adaptative Shape Models			Type	Journal Article
Year	2013	Publication	IEEE Transactions on Intelligent Transportation Systems	Abbreviated Journal	TITS
Volume	14	Issue	1	Pages	459-468
Keywords	road detection
Abstract	Vision-based road detection is important for different applications in transportation, such as autonomous driving, vehicle collision warning, and pedestrian crossing detection. Common approaches to road detection are based on low-level road appearance (e.g., color or texture) and neglect of the scene geometry and context. Hence, using only low-level features makes these algorithms highly depend on structured roads, road homogeneity, and lighting conditions. Therefore, the aim of this paper is to classify road geometries for road detection through the analysis of scene composition and temporal coherence. Road geometry classification is proposed by building corresponding models from training images containing prototypical road geometries. We propose adaptive shape models where spatial pyramids are steered by the inherent spatial structure of road images. To reduce the influence of lighting variations, invariant features are used. Large-scale experiments show that the proposed road geometry classifier yields a high recognition rate of 73.57% ± 13.1, clearly outperforming other state-of-the-art methods. Including road shape information improves road detection results over existing appearance-based methods. Finally, it is shown that invariant features and temporal information provide robustness against disturbing imaging conditions.
Address
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1524-9050	ISBN		Medium
Area		Expedition		Conference
Notes	ADAS;ISE			Approved	no
Call Number	Admin @ si @ AGD2013;; ADAS @ adas @			Serial	2269
Permanent link to this record



Author	Javier Marin; David Vazquez; Antonio Lopez; Jaume Amores; Bastian Leibe
Title	Random Forests of Local Experts for Pedestrian Detection			Type	Conference Article
Year	2013	Publication	15th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	2592 - 2599
Keywords	ADAS; Random Forest; Pedestrian Detection
Abstract	Pedestrian detection is one of the most challenging tasks in computer vision, and has received a lot of attention in the last years. Recently, some authors have shown the advantages of using combinations of part/patch-based detectors in order to cope with the large variability of poses and the existence of partial occlusions. In this paper, we propose a pedestrian detection method that efficiently combines multiple local experts by means of a Random Forest ensemble. The proposed method works with rich block-based representations such as HOG and LBP, in such a way that the same features are reused by the multiple local experts, so that no extra computational cost is needed with respect to a holistic method. Furthermore, we demonstrate how to integrate the proposed approach with a cascaded architecture in order to achieve not only high accuracy but also an acceptable efficiency. In particular, the resulting detector operates at five frames per second using a laptop machine. We tested the proposed method with well-known challenging datasets such as Caltech, ETH, Daimler, and INRIA. The method proposed in this work consistently ranks among the top performers in all the datasets, being either the best method or having a small difference with the best one.
Address	Sydney; Australia; December 2013
Corporate Author				Thesis
Publisher	IEEE	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	ADAS; 600.057; 600.054			Approved	no
Call Number	ADAS @ adas @ MVL2013			Serial	2333
Permanent link to this record



Author	Jon Almazan; Albert Gordo; Alicia Fornes; Ernest Valveny
Title	Handwritten Word Spotting with Corrected Attributes			Type	Conference Article
Year	2013	Publication	15th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	1017-1024
Keywords
Abstract	We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to an unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.
Address	Sydney; Australia; December 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	DAG			Approved	no
Call Number	Admin @ si @ AGF2013			Serial	2327
Permanent link to this record



Author	Gemma Roig; Xavier Boix; R. de Nijs; Sebastian Ramos; K. Kühnlenz; Luc Van Gool
Title	Active MAP Inference in CRFs for Efficient Semantic Segmentation			Type	Conference Article
Year	2013	Publication	15th IEEE International Conference on Computer Vision	Abbreviated Journal
Volume		Issue		Pages	2312 - 2319
Keywords	Semantic Segmentation
Abstract	Most MAP inference algorithms for CRFs optimize an energy function knowing all the potentials. In this paper, we focus on CRFs where the computational cost of instantiating the potentials is orders of magnitude higher than MAP inference. This is often the case in semantic image segmentation, where most potentials are instantiated by slow classifiers fed with costly features. We introduce Active MAP inference 1) to on-the-fly select a subset of potentials to be instantiated in the energy function, leaving the rest of the parameters of the potentials unknown, and 2) to estimate the MAP labeling from such incomplete energy function. Results for semantic segmentation benchmarks, namely PASCAL VOC 2010 [5] and MSRC-21 [19], show that Active MAP inference achieves similar levels of accuracy but with major efficiency gains.
Address	Sydney; Australia; December 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1550-5499	ISBN		Medium
Area		Expedition		Conference	ICCV
Notes	ADAS; 600.057			Approved	no
Call Number	ADAS @ adas @ RBN2013			Serial	2377
Permanent link to this record



Author	Jorge Bernal; F. Javier Sanchez; Fernando Vilariño
Title	Impact of Image Preprocessing Methods on Polyp Localization in Colonoscopy Frames			Type	Conference Article
Year	2013	Publication	35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society	Abbreviated Journal
Volume		Issue		Pages	7350 - 7354
Keywords
Abstract	In this paper we present our image preprocessing methods as a key part of our automatic polyp localization scheme. These methods are used to assess the impact of different endoluminal scene elements when characterizing polyps. More precisely we tackle the influence of specular highlights, blood vessels and black mask surrounding the scene. Experimental results prove that the appropriate handling of these elements leads to a great improvement in polyp localization results.
Address	Osaka; Japan; July 2013
Corporate Author				Thesis
Publisher		Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1557-170X	ISBN		Medium
Area	800	Expedition		Conference	EMBC
Notes	MV; 600.047; 600.060;SIAI			Approved	no
Call Number	Admin @ si @ BSV2013			Serial	2286
Permanent link to this record



Author	Mariella Dimiccoli; Benoît Girard; Alain Berthoz; Daniel Bennequin
Title	Striola Magica: a functional explanation of otolith organs			Type	Journal Article
Year	2013	Publication	Journal of Computational Neuroscience	Abbreviated Journal	JCN
Volume	35	Issue	2	Pages	125-154
Keywords	Otolith organs ;Striola; Vestibular pathway
Abstract	Otolith end organs of vertebrates sense linear accelerations of the head and gravitation. The hair cells on their epithelia are responsible for transduction. In mammals, the striola, parallel to the line where hair cells reverse their polarization, is a narrow region centered on a curve with curvature and torsion. It has been shown that the striolar region is functionally different from the rest, being involved in a phasic vestibular pathway. We propose a mathematical and computational model that explains the necessity of this amazing geometry for the striola to be able to carry out its function. Our hypothesis, related to the biophysics of the hair cells and to the physiology of their afferent neurons, is that striolar afferents collect information from several type I hair cells to detect the jerk in a large domain of acceleration directions. This predicts a mean number of two calyces for afferent neurons, as measured in rodents. The domain of acceleration directions sensed by our striolar model is compatible with the experimental results obtained on monkeys considering all afferents. Therefore, the main result of our study is that phasic and tonic vestibular afferents cover the same geometrical fields, but at different dynamical and frequency domains.
Address
Corporate Author				Thesis
Publisher	Springer US	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1573-6873. 2013	ISBN		Medium
Area		Expedition		Conference
Notes	MILAB			Approved	no
Call Number	Admin @ si @DBG2013			Serial	2787
Permanent link to this record



Author	Ernest Valveny; Oriol Ramos Terrades; Joan Mas; Marçal Rusiñol
Title	Interactive Document Retrieval and Classification.			Type	Book Chapter
Year	2013	Publication	Multimodal Interaction in Image and Video Applications	Abbreviated Journal
Volume	48	Issue		Pages	17-30
Keywords
Abstract	In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking the user interaction into account to make easier the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	Angel Sappa; Jordi Vitria
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1868-4394	ISBN	978-3-642-35931-6	Medium
Area		Expedition		Conference
Notes	DAG			Approved	no
Call Number	Admin @ si @ VRM2013			Serial	2341
Permanent link to this record



Author	Joost Van de Weijer; Fahad Shahbaz Khan; Marc Masana
Title	Interactive Visual and Semantic Image Retrieval			Type	Book Chapter
Year	2013	Publication	Multimodal Interaction in Image and Video Applications	Abbreviated Journal
Volume	48	Issue		Pages	31-35
Keywords
Abstract	One direct consequence of recent advances in digital visual data generation and the direct availability of this information through the World-Wide Web, is a urgent demand for efficient image retrieval systems. The objective of image retrieval is to allow users to efficiently browse through this abundance of images. Due to the non-expert nature of the majority of the internet users, such systems should be user friendly, and therefore avoid complex user interfaces. In this chapter we investigate how high-level information provided by recently developed object recognition techniques can improve interactive image retrieval. Wel apply a bagof- word based image representation method to automatically classify images in a number of categories. These additional labels are then applied to improve the image retrieval system. Next to these high-level semantic labels, we also apply a low-level image description to describe the composition and color scheme of the scene. Both descriptions are incorporated in a user feedback image retrieval setting. The main objective is to show that automatic labeling of images with semantic labels can improve image retrieval results.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor	Angel Sappa; Jordi Vitria
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1868-4394	ISBN	978-3-642-35931-6	Medium
Area		Expedition		Conference
Notes	CIC; 605.203; 600.048			Approved	no
Call Number	Admin @ si @ WKC2013			Serial	2284
Permanent link to this record



Author	Abel Gonzalez-Garcia; Robert Benavente; Olivier Penacchio; Javier Vazquez; Maria Vanrell; C. Alejandro Parraga
Title	Coloresia: An Interactive Colour Perception Device for the Visually Impaired			Type	Book Chapter
Year	2013	Publication	Multimodal Interaction in Image and Video Applications	Abbreviated Journal
Volume	48	Issue		Pages	47-66
Keywords
Abstract	A significative percentage of the human population suffer from impairments in their capacity to distinguish or even see colours. For them, everyday tasks like navigating through a train or metro network map becomes demanding. We present a novel technique for extracting colour information from everyday natural stimuli and presenting it to visually impaired users as pleasant, non-invasive sound. This technique was implemented inside a Personal Digital Assistant (PDA) portable device. In this implementation, colour information is extracted from the input image and categorised according to how human observers segment the colour space. This information is subsequently converted into sound and sent to the user via speakers or headphones. In the original implementation, it is possible for the user to send its feedback to reconfigure the system, however several features such as these were not implemented because the current technology is limited.We are confident that the full implementation will be possible in the near future as PDA technology improves.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1868-4394	ISBN	978-3-642-35931-6	Medium
Area		Expedition		Conference
Notes	CIC; 600.052; 605.203			Approved	no
Call Number	Admin @ si @ GBP2013			Serial	2266
Permanent link to this record



Author	Michal Drozdzal; Santiago Segui; Petia Radeva; Carolina Malagelada; Fernando Azpiroz; Jordi Vitria
Title	An Application for Efﬁcient Error-Free Labeling of Medical Images			Type	Book Chapter
Year	2013	Publication	Multimodal Interaction in Image and Video Applications	Abbreviated Journal
Volume	48	Issue		Pages	1-16
Keywords
Abstract	In this chapter we describe an application for efficient error-free labeling of medical images. In this scenario, the compilation of a complete training set for building a realistic model of a given class of samples is not an easy task, making the process tedious and time consuming. For this reason, there is a need for interactive labeling applications that minimize the effort of the user while providing error-free labeling. We propose a new algorithm that is based on data similarity in feature space. This method actively explores data in order to find the best label-aligned clustering and exploits it to reduce the labeler effort, that is measured by the number of “clicks. Moreover, error-free labeling is guaranteed by the fact that all data and their labels proposals are visually revised by en expert.
Address
Corporate Author				Thesis
Publisher	Springer Berlin Heidelberg	Place of Publication		Editor
Language		Summary Language		Original Title
Series Editor		Series Title		Abbreviated Series Title
Series Volume		Series Issue		Edition
ISSN	1868-4394	ISBN	978-3-642-35931-6	Medium
Area		Expedition		Conference
Notes	MILAB; OR;MV			Approved	no
Call Number	Admin @ si @ DSR2013			Serial	2235
Permanent link to this record