Author Hongxing Gao; Marçal Rusiñol; Dimosthenis Karatzas; Josep Llados
Title Fast Structural Matching for Document Image Retrieval through Spatial Databases Type Conference Article
Year 2014 Publication Document Recognition and Retrieval XXI Abbreviated Journal
Volume 9021 Issue Pages
Keywords Document image retrieval; distance transform; MSER; spatial database
Abstract The structure of document images plays a significant role in document analysis; thus, considerable efforts have been made towards extracting and understanding document structure, usually in the form of layout analysis approaches. In this paper, we first employ Distance Transform based MSER (DTMSER) to efficiently extract stable document structural elements in terms of a dendrogram of key-regions. Then a fast structural matching method is proposed to query the structure of a document (dendrogram) based on a spatial database, which facilitates the formulation of advanced spatial queries. The experiments demonstrate a significant improvement in a document retrieval scenario when compared to the use of typical Bag of Words (BoW) and pyramidal BoW descriptors.
Address Amsterdam; September 2014
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference SPIE-DRR
Notes DAG; 600.056; 600.061; 600.077 Approved no
Call Number Admin @ si @ GRK2014a Serial 2496
 

 
Author Joan M. Nuñez; Jorge Bernal; Miquel Ferrer; Fernando Vilariño
Title Impact of Keypoint Detection on Graph-based Characterization of Blood Vessels in Colonoscopy Videos Type Conference Article
Year 2014 Publication CARE workshop Abbreviated Journal
Volume Issue Pages
Keywords Colonoscopy; Graph Matching; Biometrics; Vessel; Intersection
Abstract We explore the potential of using blood vessels as anatomical landmarks for developing image registration methods in colonoscopy images. An unequivocal representation of blood vessels could be used to guide follow-up methods to track lesions over different interventions. We propose a graph-based representation to characterize network structures, such as blood vessels, based on the use of intersections and endpoints. We present a study assessing the minimal performance a keypoint detector should achieve so that the structure can still be recognized. Experimental results show that, even with a loss of 35% of the keypoints, the descriptive power of the graphs associated with the vessel pattern is still high enough to recognize blood vessels.
Address Boston; USA; September 2014
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CARE
Notes MV; DAG; 600.060; 600.047; 600.077;SIAI Approved no
Call Number Admin @ si @ NBF2014 Serial 2504
 

 
Author Adriana Romero; Carlo Gatta; Gustavo Camps-Valls
Title Unsupervised Deep Feature Extraction Of Hyperspectral Images Type Conference Article
Year 2014 Publication 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing Abbreviated Journal
Volume Issue Pages
Keywords Convolutional networks; deep learning; sparse learning; feature extraction; hyperspectral image classification
Abstract This paper presents an effective unsupervised sparse feature learning algorithm to train deep convolutional networks on hyperspectral images. Deep convolutional hierarchical representations are learned and then used for pixel classification. Features in lower layers present less abstract representations of data, while higher layers represent more abstract and complex characteristics. We successfully illustrate the performance of the extracted representations in a challenging AVIRIS hyperspectral image classification problem, compared to standard dimensionality reduction methods like principal component analysis (PCA) and its kernel counterpart (kPCA). The proposed method largely outperforms the previous state-of-the-art results on the same experimental setting. Results show that single layer networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels. Regarding the deep architecture, we can conclude that: (1) additional layers in a deep architecture significantly improve the performance w.r.t. single layer variants; (2) the max-pooling step in each layer is mandatory to achieve satisfactory results; and (3) the performance gain w.r.t. the number of layers is upper bounded, since the spatial resolution is reduced at each pooling, resulting in too spatially coarse output features.
Address Lausanne; Switzerland; June 2014
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WHISPERS
Notes MILAB; LAMP; 600.079 Approved no
Call Number Admin @ si @ RGC2014 Serial 2513
 

 
Author Victor Ponce; Mario Gorga; Xavier Baro; Petia Radeva; Sergio Escalera
Title Análisis de la expresión oral y gestual en proyectos fin de carrera vía un sistema de visión artificial Type Journal Article
Year 2011 Publication ReVisión Abbreviated Journal
Volume 4 Issue 1 Pages
Keywords
Abstract Oral communication and expression is a competence of special relevance in the European Higher Education Area (EHEA). Nevertheless, in many higher education programmes the practice of this competence has been relegated mainly to the presentation of final degree projects. Within a teaching innovation project, a software tool has been developed to extract objective information for analyzing the oral and gestural expression of students. The goal is to give students feedback that allows them to improve the quality of their presentations. The initial prototype presented in this work automatically extracts audiovisual information and analyzes it using learning techniques. The system has been applied to 15 final degree projects and 15 presentations within a fourth-year course. The results obtained show the viability of the system for suggesting factors that help both in the success of the communication and in the evaluation criteria.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1989-1199 ISBN Medium
Area Expedition Conference
Notes HuPBA; MILAB;MV Approved no
Call Number Admin @ si @ PGB2011d Serial 2514
 

 
Author Antonio Hernandez; Stan Sclaroff; Sergio Escalera
Title Contextual rescoring for Human Pose Estimation Type Conference Article
Year 2014 Publication 25th British Machine Vision Conference Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A contextual rescoring method is proposed for improving the detection of body joints of a pictorial structure model for human pose estimation. A set of mid-level parts is incorporated in the model, and their detections are used to extract spatial and score-related features relative to other body joint hypotheses. A technique is proposed for the automatic discovery of a compact subset of poselets that covers a set of validation images while maximizing precision. A rescoring mechanism is defined as a set-based boosting classifier that computes a new score for body joint detections, given their relationship to detections of other body joints and mid-level parts in the image. This new score complements the unary potential of a discriminatively trained pictorial structure model. Experiments on two benchmarks show performance improvements when considering the proposed mid-level image representation and rescoring approach in comparison with other pictorial structure-based approaches.
Address Nottingham; UK; September 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference BMVC
Notes HuPBA;MILAB Approved no
Call Number HSE2014 Serial 2525
 

 
Author Cristhian A. Aguilera-Carrasco
Title Evaluation of feature detectors and descriptors in VISIBLE-LWIR cross-spectral imaging Type Report
Year 2014 Publication CVC Technical Report Abbreviated Journal
Volume 177 Issue Pages
Keywords Multi-spectral; Cross-spectral; Visible-LWIR imaging; Multimodal.
Abstract This thesis evaluates the performance of different state-of-the-art feature detector and descriptor algorithms in the Visible-LWIR cross-spectral scenario. The focus is to determine whether current detector and descriptor algorithms can be used to match features between the LWIR spectrum and the visible spectrum in applications such as visual odometry, object recognition, image registration and stereo vision. An outdoor cross-spectral dataset was created to evaluate the suitability of the different algorithms. The results show that the tested algorithms are not suitable for the task of matching features across different spectra. The repeatability ratio was below 30 percent in the best case, and in general matched features were not accurately located. Additionally, these results suggest that it is necessary to create new algorithms that take into account the nature of the different spectra, describing characteristics that exist in both spectra, such as discontinuities.
Address
Corporate Author Thesis Master's thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @Agu2014 Serial 2526
 

 
Author Lluis Gomez; Dimosthenis Karatzas
Title Scene Text Recognition: No Country for Old Men? Type Conference Article
Year 2014 Publication 1st International Workshop on Robust Reading Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IWRR
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ GoK2014c Serial 2538
 

 
Author Jiaolong Xu; Sebastian Ramos; David Vazquez; Antonio Lopez
Title DA-DPM Pedestrian Detection Type Conference Article
Year 2013 Publication ICCV Workshop on Reconstruction meets Recognition Abbreviated Journal
Volume Issue Pages
Keywords Domain Adaptation; Pedestrian Detection
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICCVW-RR
Notes ADAS Approved no
Call Number Admin @ si @ XRV2013 Serial 2569
 

 
Author Antonio Clavelli
Title A computational model of eye guidance, searching for text in real scene images Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Searching for text objects in real scene images is an open problem and a very active computer vision research area. A large number of methods have been proposed that tackle text search as an extension of methods from the document analysis field or that are inspired by general purpose object detection methods. However, the general problem of object search in real scene images remains extremely challenging due to the huge variability in object appearance. This thesis builds on the most recent findings in the visual attention literature, presenting a novel computational model of eye guidance that aims to better describe text object search in real scene images.
First, the relevant state-of-the-art results from the visual attention literature regarding eye movements and visual search are presented. Relevant models of attention are discussed and integrated with recent observations on the role of top-down constraints and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is then explained by the interaction of several modulating factors, such as objects, value, plans and saliency. Then we introduce our probabilistic formulation of attention deployment in real scenes. The model is based on the rationale that oculomotor control depends on two interacting but distinct processes: an attentional process that assigns value to the sources of information and a motor process that flexibly links information with action.
In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the reward of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects.
In the experimental section the model is tested under laboratory conditions, comparing model simulations with data from eye tracking experiments. The comparison is qualitative in terms of observable scan paths and quantitative in terms of statistical similarity of gaze shift amplitude. Experiments are performed using eye tracking data both from a publicly available dataset of faces and text and from newly performed eye-tracking experiments on a dataset of street view pictures containing text. The last part of this thesis is dedicated to studying the extent to which the proposed model can account for human eye movements in a loosely constrained setting. We used a mobile eye tracking device and a methodology developed ad hoc to compare model-simulated eye data with the human eye data from mobile eye tracking recordings. Such a setting allows the model to be tested under incomplete visual information, reproducing a close to real-life search task.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Giuseppe Boccignone;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-6-4 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Cla2014 Serial 2571
 

 
Author Jon Almazan
Title Learning to Represent Handwritten Shapes and Words for Matching and Recognition Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Writing is one of the most important forms of communication, and for centuries handwriting was the most reliable way to preserve knowledge. However, despite the recent development of printing houses and electronic devices, handwriting is still broadly used for taking notes, making annotations, or sketching ideas.
Transferring the ability to understand handwritten text or recognize handwritten shapes to computers has been the goal of much research due to its huge importance for many different fields. However, designing good representations to deal with handwritten shapes, e.g. symbols or words, is a very challenging problem due to the large variability of these kinds of shapes. One of the consequences of working with handwritten shapes is that we need representations to be robust, i.e., able to adapt to large intra-class variability. We need representations to be discriminative, i.e., able to learn what the differences between classes are. And we need representations to be efficient, i.e., able to be rapidly computed and compared. Unfortunately, current techniques of handwritten shape representation for matching and recognition do not fulfill some or all of these requirements.
Throughout this thesis we focus on the problem of learning to represent handwritten shapes for retrieval and recognition tasks. Concretely, in the first part of the thesis, we focus on the general problem of representing any kind of handwritten shape. We first present a novel shape descriptor based on a deformable grid that deals with large deformations by adapting to the shape and where the cells of the grid can be used to extract different features. Then, we propose to use this descriptor to learn statistical models, based on the Active Appearance Model, that jointly learn the variability in structure and texture of a given class. Then, in the second part, we focus on a concrete application, the problem of representing handwritten words, for the tasks of word spotting, where the goal is to find all instances of a query word in a dataset of images, and recognition. First, we address the segmentation-free problem and propose an unsupervised, sliding-window-based approach that achieves state-of-the-art results on two public datasets. Second, we address the more challenging multi-writer problem, where the variability in words exponentially increases. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace, and where those that represent the same word are close together. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression. This leads to a low-dimensional, unified representation of word images and strings, resulting in a method that allows one to perform both image and text searches, as well as image transcription, in a unified framework. We evaluate our methods on different public datasets of both handwritten documents and natural images, showing results comparable to or better than the state of the art on spotting and recognition tasks.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny;Alicia Fornes
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Alm2014 Serial 2572
 

 
Author David Fernandez
Title Contextual Word Spotting in Historical Handwritten Documents Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among Document Analysis researchers and practitioners.
There is an increasing interest in digitally preserving and providing access to this kind of document. But digitization alone is not enough for researchers. The extraction and/or indexation of information from these documents has attracted increasing interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely difficult due to their inherent deficiencies: poor physical preservation, different writing styles, obsolete languages, etc. Word spotting has become a popular and efficient alternative to full transcription. It inherently involves a high level of degradation in the images. The search for words is holistically formulated as a visual search of a given query shape in a larger image, instead of recognising the input text and searching for the query word with an ASCII string comparison. But the performance of classical word spotting approaches depends on the degradation level of the images, being unacceptable in many cases. In this thesis we have proposed a novel paradigm, called the contextual word spotting method, that uses contextual/semantic information to achieve acceptable results where classical word spotting does not. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an efficient word segmentation is needed. Historical handwritten documents present some common difficulties that can complicate the extraction of the words. We have proposed a line segmentation approach that formulates the problem as finding the central part path in the area between two consecutive lines. This is solved as a graph traversal problem. A path-finding algorithm is used to find the optimal path in a previously computed graph between the text lines. Once the text lines are extracted, words are localized inside the text lines using a word segmentation technique from the state of the art. Classical word spotting approaches can be improved using the contextual information of the documents. We have introduced a new framework, oriented to handwritten documents that present a high degree of structure, to extract information making use of context. The framework is an efficient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches. The contextual information is automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all of them. The experimental results achieved in this thesis outperform classical word spotting approaches, demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Alicia Fornes
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-7-1 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Fer2014 Serial 2573
 

 
Author Lluis Pere de las Heras
Title Relational Models for Visual Understanding of Graphical Documents. Application to Architectural Drawings. Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations between symbols) that articulate a semantic meaning in a certain context. Therefore, the automatic interpretation by computers of this sort of document entails three main steps: the detection of the symbols, the extraction of the structural relations between these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Different domains in graphical documents include: architectural and engineering drawings, maps, flowcharts, etc.
Graphics Recognition in particular and Document Image Analysis in general were born from the industrial need to interpret a massive amount of digitized documents after the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems to be far from being solved. The main reason is that the vast majority of the systems in the literature focus on very specific problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is difficult to reuse these strategies on different data and in different contexts, thus hindering the natural progress in the field.
In this thesis, we face the graphical document understanding problem by proposing several relational models at different levels that are designed from a generic perspective. Firstly, we introduce three different strategies for the detection of symbols. The first method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the large variability of the problem. The third method is a combination of the previous two methods that inherits their respective strengths, i.e. it copes with the large variability and does not need annotated data. Secondly, we present two relational strategies that tackle the problem of visual context extraction. The first one is a fully bottom-up method that heuristically searches a graph representation for the contextual relations between symbols. In contrast, the second is a syntactic method that probabilistically models the structure of the documents. It automatically learns the model, which guides the inference algorithm to find the best structural representation for a given input. Finally, we construct a knowledge-based model consisting of an ontological definition of the domain and real data. This model makes it possible to perform contextual reasoning and to detect semantic inconsistencies within the data. We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard for the modeling of these documents, there is enormous notation variability from plan to plan in terms of vocabulary and syntax. Therefore, floor plan interpretation is a relevant task in the graphical document understanding problem.
It is also worth mentioning that we make freely available all the resources used in this thesis (the data, the tool used to generate the data, and the evaluation scripts) with the aim of fostering research in the graphical document understanding task.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Gemma Sanchez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-8-8 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Her2014 Serial 2574
 

 
Author Carles Sanchez
Title Tracheal Structure Characterization using Geometric and Appearance Models for Efficient Assessment of Stenosis in Videobronchoscopy Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Recent advances in endoscopic devices have increased their use for minimally invasive diagnostic and intervention procedures. Among all endoscopic modalities, bronchoscopy is one of the most frequent, with around 261 million procedures per year. Although the use of bronchoscopy is widespread among clinical facilities, it presents some drawbacks, the most prevalent being the reliance on visual inspection for the assessment of anatomical measurements. In particular, inaccuracies in the estimation of the degree of stenosis (the percentage of obstructed airway) decrease its diagnostic yield and might lead to erroneous treatments. An objective computation of tracheal stenosis in bronchoscopy videos would constitute a breakthrough for this non-invasive technique and a reduction in treatment cost.
This thesis takes the first steps towards reliable on-line extraction of anatomical information from videobronchoscopy for the computation of objective measures. In particular, we focus on the computation of the degree of stenosis, which is obtained by comparing the area delimited by a healthy tracheal ring and the stenosed lumen. Reliable extraction of airway structures in interventional videobronchoscopy is a challenging task. This is mainly due to the large variety of acquisition conditions (positions and illumination), devices (different digitizations) and, in videos acquired in the operating room, the unpredictable presence of surgical devices (such as probe ends). This thesis contributes to on-line stenosis assessment in several ways. We propose a parametric strategy for the extraction of lumen and tracheal ring regions based on the characterization of their geometry and appearance that guides a deformable model. The geometric and appearance characterization is based on a physical model describing the way bronchoscopy images are obtained and includes local and global descriptions. In order to ensure systematic applicability, we present a statistical framework to select the optimal parameters of our method. Experiments performed on the first public annotated database show that the performance of our method is comparable to the one provided by clinicians and that its computation time allows for an on-line implementation in the operating room.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor F. Javier Sanchez;Debora Gil;Jorge Bernal
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-9-5 Medium
Area Expedition Conference
Notes IAM; 600.075 Approved no
Call Number Admin @ si @ San2014 Serial 2575
 

 
Author Antonio Hernandez
Title From pixels to gestures: learning visual representations for human analysis in color and depth data sequences Type Book Whole
Year 2015 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others.

In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction; from pixels to gestures: human segmentation, human pose estimation and gesture recognition.

First, we show how binary segmentation (object vs. background) of the human body in image sequences is helpful to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a framework for multi-label segmentation for obtaining much more detailed segmentation masks: instead of just obtaining a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts.

At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets.

Finally, we propose a framework for gesture recognition based on the bag of visual words framework. We leverage the benefits of RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences. A Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both spatial and time domains.
Address January 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Stan Sclaroff
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-0-2 Medium
Area Expedition Conference
Notes HuPBA;MILAB Approved no
Call Number Admin @ si @ Her2015 Serial 2576
Permanent link to this record
 

 
Author Hongxing Gao
Title Focused Structural Document Image Retrieval in Digital Mailroom Applications Type Book Whole
Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios, such as searching for full page matches or retrieving the counterparts for specific document areas, either focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on the spatial indexing technique, we propose to search the collection for matches of local key-region pairs carrying both structural and visual information, and we present a scheme that allows the relative contribution of structural and visual similarity to be adjusted.
Based on the fact that the structure of documents is tightly linked with the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, even without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results than state-of-the-art methods while employing far fewer key-regions.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram, which encode inclusion relationships. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves a remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structural and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the computation of structural relationships for key-region pairs. We first test the proposed framework in a full page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves a notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific partial areas of interest in the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries. It is also the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
In addition, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and use them to divide the matched key-region pairs into several groups. Line verification is then performed within each group, and more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall, with a slight loss in precision on specific queries.
Address January 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Dimosthenis Karatzas;Marçal Rusiñol
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-943427-0-7 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Gao2015 Serial 2577
Permanent link to this record