toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Wenjuan Gong edit  openurl
  Title 3D Motion Data aided Human Action Recognition and Pose Estimation Type Book Whole
  Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this work, we explore human action recognition and pose estimation prob-
lems. Different from traditional works of learning from 2D images or video
sequences and their annotated output, we seek to solve the problems with ad-
ditional 3D motion capture information, which helps to fill the gap between 2D
image features and human interpretations.
We first compare two different schools of approaches commonly used for 3D
pose estimation from 2D pose configuration: modeling and learning methods.
By looking into experiments results and considering our problems, we fixed a
learning method as the following approaches to do pose estimation. We then
establish a framework by adding a module of detecting 2D pose configuration
from images with varied background, which widely extend the application of
the approach. We also seek to directly estimate 3D poses from image features,
instead of estimating 2D poses as a intermediate module. We explore a robust
input feature, which combined with the proposed distance measure, provides
a solution for noisy or corrupted inputs. We further utilize the above method
to estimate weak poses,which is a concise representation of the original poses
by using dimension deduction technologies, from image features. Weak pose
space is where we calculate vocabulary and label action types using a bog of
words pipeline. Temporal information of an action is taken into consideration by
considering several consecutive frames as a single unit for computing vocabulary
and histogram assignments.
 
  Address Barcelona  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Gon2013 Serial 2279  
Permanent link to this record
 

 
Author Murad Al Haj edit  openurl
  Title Looking at Faces: Detection, Tracking and Pose Estimation Type Book Whole
  Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Humans can effortlessly perceive faces, follow them over space and time, and decode their rich content, such as pose, identity and expression. However, despite many decades of research on automatic facial perception in areas like face detection, expression recognition, pose estimation and face recognition, and despite many successes, a complete solution remains elusive. This thesis is dedicated to three problems in automatic face perception, namely face detection, face tracking and pose estimation.

In face detection, an initial simple model is presented that uses pixel-based heuristics to segment skin locations and hand-crafted rules to determine the locations of the faces present in an image. Different colorspaces are studied to judge whether a colorspace transformation can aid skin color detection. The output of this study is used in the design of a more complex face detector that is able to successfully generalize to different scenarios.

In face tracking, a framework that combines estimation and control in a joint scheme is presented to track a face with a single pan-tilt-zoom camera. While this work is mainly motivated by tracking faces, it can be easily applied atop of any detector to track different objects. The applicability of this method is demonstrated on simulated as well as real-life scenarios.

The last and most important part of this thesis is dedicate to monocular head pose estimation. In this part, a method based on partial least squares (PLS) regression is proposed to estimate pose and solve the alignment problem simultaneously. The contributions of this work are two-fold: 1) demonstrating that the proposed method achieves better than state-of-the-art results on the estimation problem and 2) developing a technique to reduce misalignment based on the learned PLS factors that outperform multiple instance learning (MIL) without the need for any re-training or the inclusion of misaligned samples in the training process, as normally done in MIL.
 
  Address Barcelona  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Haj2013 Serial 2278  
Permanent link to this record
 

 
Author Albert Gordo edit  openurl
  Title Document Image Representation, Classification and Retrieval in Large-Scale Domains Type Book Whole
  Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Despite the “paperless office” ideal that started in the decade of the seventies, businesses still strive against an increasing amount of paper documentation. Companies still receive huge amounts of paper documentation that need to be analyzed and processed, mostly in a manual way. A solution for this task consists in, first, automatically scanning the incoming documents. Then, document images can be analyzed and information can be extracted from the data. Documents can also be automatically dispatched to the appropriate workflows, used to retrieve similar documents in the dataset to transfer information, etc.

Due to the nature of this “digital mailroom”, we need document representation methods to be general, i.e., able to cope with very different types of documents. We need the methods to be sound, i.e., able to cope with unexpected types of documents, noise, etc. And, we need to methods to be scalable, i.e., able to cope with thousands or millions of documents that need to be processed, stored, and consulted. Unfortunately, current techniques of document representation, classification and retrieval are not apt for this digital mailroom framework, since they do not fulfill some or all of these requirements.

Through this thesis we focus on the problem of document representation aimed at classification and retrieval tasks under this digital mailroom framework. We first propose a novel document representation based on runlength histograms, and extend it to cope with more complex documents such as multiple-page documents, or documents that contain more sources of information such as extracted OCR text. Then we focus on the scalability requirements and propose a novel binarization method which we dubbed PCAE, as well as two general asymmetric distances between binary embeddings that can significantly improve the retrieval results at a minimal extra computational cost. Finally, we note the importance of supervised learning when performing large-scale retrieval, and study several approaches that can significantly boost the results at no extra cost at query time.
 
  Address Barcelona  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny;Florent Perronnin  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ Gor2013 Serial 2277  
Permanent link to this record
 

 
Author David Vazquez edit   pdf
isbn  openurl
  Title Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection Type Book Whole
  Year 2013 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal  
  Volume 1 Issue 1 Pages 1-105  
  Keywords Pedestrian Detection; Domain Adaptation  
  Abstract Pedestrian detection is of paramount interest for many applications, e.g. Advanced Driver Assistance Systems, Intelligent Video Surveillance and Multimedia systems. Most promising pedestrian detectors rely on appearance-based classifiers trained with annotated data. However, the required annotation step represents an intensive and subjective task for humans, what makes worth to minimize their intervention in this process by using computational tools like realistic virtual worlds. The reason to use these kind of tools relies in the fact that they allow the automatic generation of precise and rich annotations of visual information. Nevertheless, the use of this kind of data comes with the following question: can a pedestrian appearance model learnt with virtual-world data work successfully for pedestrian detection in real-world scenarios?. To answer this question, we conduct different experiments that suggest a positive answer. However, the pedestrian classifiers trained with virtual-world data can suffer the so called dataset shift problem as real-world based classifiers does. Accordingly, we have designed different domain adaptation techniques to face this problem, all of them integrated in a same framework (V-AYLA). We have explored different methods to train a domain adapted pedestrian classifiers by collecting a few pedestrian samples from the target domain (real world) and combining them with many samples of the source domain (virtual world). The extensive experiments we present show that pedestrian detectors developed within the V-AYLA framework do achieve domain adaptation. Ideally, we would like to adapt our system without any human intervention. Therefore, as a first proof of concept we also propose an unsupervised domain adaptation technique that avoids human intervention during the adaptation process. To the best of our knowledge, this Thesis work is the first demonstrating adaptation of virtual and real worlds for developing an object detector. Last but not least, we also assessed a different strategy to avoid the dataset shift that consists in collecting real-world samples and retrain with them in such a way that no bounding boxes of real-world pedestrians have to be provided. We show that the generated classifier is competitive with respect to the counterpart trained with samples collected by manually annotating pedestrian bounding boxes. The results presented on this Thesis not only end with a proposal for adapting a virtual-world pedestrian detector to the real world, but also it goes further by pointing out a new methodology that would allow the system to adapt to different situations, which we hope will provide the foundations for future research in this unexplored area.  
  Address Barcelona  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Barcelona Editor Antonio Lopez;Daniel Ponsa  
  Language English Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940530-1-6 Medium  
  Area Expedition Conference  
  Notes adas Approved yes  
  Call Number ADAS @ adas @ Vaz2013 Serial 2276  
Permanent link to this record
 

 
Author Shida Beigpour edit  openurl
  Title Illumination and object reflectance modeling Type Book Whole
  Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract More realistic and accurate models of the scene illumination and object reflectance can greatly improve the quality of many computer vision and computer graphics tasks. Using such model, a more profound knowledge about the interaction of light with object surfaces can be established which proves crucial to a variety of computer vision applications. In the current work, we investigate the various existing approaches to illumination and reflectance modeling and form an analysis on their shortcomings in capturing the complexity of real-world scenes. Based on this analysis we propose improvements to different aspects of reflectance and illumination estimation in order to more realistically model the real-world scenes in the presence of complex lighting phenomena (i.e, multiple illuminants, interreflections and shadows). Moreover, we captured our own multi-illuminant dataset which consists of complex scenes and illumination conditions both outdoor and in laboratory conditions. In addition we investigate the use of synthetic data to facilitate the construction of datasets and improve the process of obtaining ground-truth information.  
  Address Barcelona  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Ernest Valveny  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes CIC Approved no  
  Call Number Admin @ si @ Bei2013 Serial 2267  
Permanent link to this record
 

 
Author Francesco Ciompi edit  openurl
  Title Multi-Class Learning for Vessel Characterization in Intravascular Ultrasound Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this thesis we tackle the problem of automatic characterization of human coronary vessel in Intravascular Ultrasound (IVUS) image modality. The basis for the whole characterization process is machine learning applied to multi-class problems. In all the presented approaches, the Error-Correcting Output Codes (ECOC) framework is used as central element for the design of multi-class classifiers.
Two main topics are tackled in this thesis. First, the automatic detection of the vessel borders is presented. For this purpose, a novel context-aware classifier for multi-class classification of the vessel morphology is presented, namely ECOC-DRF. Based on ECOC-DRF, the lumen border and the media-adventitia border in IVUS are robustly detected by means of a novel holistic approach, achieving an error comparable with inter-observer variability and with state of the art methods.
The two vessel borders define the atheroma area of the vessel. In this area, tissue characterization is required. For this purpose, we present a framework for automatic plaque characterization by processing both texture in IVUS images and spectral information in raw Radio Frequency data. Furthermore, a novel method for fusing in-vivo and in-vitro IVUS data for plaque characterization is presented, namely pSFFS. The method demonstrates to effectively fuse data generating a classifier that improves the tissue characterization in both in-vitro and in-vivo datasets.
A novel method for automatic video summarization in IVUS sequences is also presented. The method aims to detect the key frames of the sequence, i.e., the frames representative of morphological changes. This novel method represents the basis for video summarization in IVUS as well as the markers for the partition of the vessel into morphological and clinically interesting events.
Finally, multi-class learning based on ECOC is applied to lung tissue characterization in Computed Tomography. The novel proposed approach, based on supervised and unsupervised learning, achieves accurate tissue classification on a large and heterogeneous dataset.
 
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Petia Radeva;Oriol Pujol  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ Cio2012 Serial 2146  
Permanent link to this record
 

 
Author Susana Alvarez edit  openurl
  Title Revisión de la teoría de los Textons Enfoque computacional en color Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract El color y la textura son dos estímulos visuales importantes para la interpretación de las imágenes. La definición de descriptores computacionales que combinan estas dos características es aún un problema abierto. La dificultad se deriva esencialmente de la propia naturaleza de ambas, mientras que la textura es una propiedad de una región, el color es una propiedad de un punto.

Hasta ahora se han utilizado tres los tipos de aproximaciones para la combinación, (a) se describe la textura directamente en cada uno de los canales color, (b) se describen textura y color por separado y se combinan al final, y (c) la combinación se realiza con técnicas de aprendizaje automático. Considerando que este problema se resuelve en el sistema visual humano en niveles muy tempranos, en esta tesis se propone estudiar el problema a partir de la implementación directa de una teoría perceptual, la teoría de los textons, y explorar así su extensión a color.

Puesto que la teoría de los textons se basa en la descripción de la textura a partir de las densidades de los atributos locales, esto se adapta perfectamente al marco de trabajo de los descriptores holísticos (bag-of-words). Se han estudiado diversos descriptores basados en diferentes espacios de textons, y diferentes representaciones de las imágenes. Asimismo se ha estudiado la viabilidad de estos descriptores en una representación conceptual de nivel intermedio.

Los descriptores propuestos han demostrado ser muy eficientes en aplicaciones de recuperación y clasificación de imágenes, presentando ventajas en la generación de vocabularios. Los vocabularios se obtienen cuantificando directamente espacios de baja dimensión y la perceptualidad de estos espacios permite asociar semántica de bajo nivel a las palabras visuales. El estudio de los resultados permite concluir que si bien la aproximación holística es muy eficiente, la introducción de co-ocurrencia espacial de las propiedades de forma y color de los blobs de la imagen es un elemento clave para su combinación, hecho que no contradice las evidencias en percepción
 
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell;Xavier Otazu  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes CIC Approved no  
  Call Number Alv2012b Serial 2216  
Permanent link to this record
 

 
Author Ariel Amato edit  openurl
  Title Environment-Independent Moving Cast Shadow Suppression in Video Surveillance Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This thesis is devoted to moving shadows detection and suppression. Shadows could be defined as the parts of the scene that are not directly illuminated by a light source due to obstructing object or objects. Often, moving shadows in images sequences are undesirable since they could cause degradation of the expected results during processing of images for object detection, segmentation, scene surveillance or similar purposes. In this thesis first moving shadow detection methods are exhaustively overviewed. Beside the mentioned methods from literature and to compensate their limitations a new moving shadow detection method is proposed. It requires no prior knowledge about the scene, nor is it restricted to assumptions about specific scene structures. Furthermore, the technique can detect both achromatic and chromatic shadows even in the presence of camouflage that occurs when foreground regions are very similar in color to shadowed regions. The method exploits local color constancy properties due to reflectance suppression over shadowed regions. To detect shadowed regions in a scene the values of the background image are divided by values of the current frame in the RGB color space. In the thesis how this luminance ratio can be used to identify segments with low gradient constancy is shown, which in turn distinguish shadows from foreground. Experimental results on a collection of publicly available datasets illustrate the superior performance of the proposed method compared with the most sophisticated state-of-the-art shadow detection algorithms. These results show that the proposed approach is robust and accurate over a broad range of shadow types and challenging video conditions.  
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Mikhail Mozerov;Jordi Gonzalez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Ama2012 Serial 2201  
Permanent link to this record
 

 
Author Noha Elfiky edit  openurl
  Title Compact, Adaptive and Discriminative Spatial Pyramids for Improved Object and Scene Classification Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The release of challenging datasets with a vast number of images, requires the development of efficient image representations and algorithms which are able to manipulate these large-scale datasets efficiently. Nowadays the Bag-of-Words (BoW) is the most successful approach in the context of object and scene classification tasks. However, its main drawback is the absence of the important spatial information. Spatial pyramids (SP) have been successfully applied to incorporate spatial information into BoW-based image representation. Observing the remarkable performance of spatial pyramids, their growing number of applications to a broad range of vision problems, and finally its geometry inclusion, a question can be asked what are the limits of spatial pyramids. Within the SP framework, the optimal way for obtaining an image spatial representation, which is able to cope with it’s most foremost shortcomings, concretely, it’s high dimensionality and the rigidity of the resulting image representation, still remains an active research domain. In summary, the main concern of this thesis is to search for the limits of spatial pyramids and try to figure out solutions for them.  
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Elf2012 Serial 2202  
Permanent link to this record
 

 
Author Marco Pedersoli edit  openurl
  Title Hierarchical Multiresolution Models for fast Object Detection Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The ability to automatically detect and recognize objects in unconstrained images is becoming more and more critical: from security systems and autonomous robots, to smart phones and augmented reality, intelligent devices need to understand the meaning of images as a composition of semantic objects. This Thesis tackles the problem of fast object detection based on template models. Detection consists of searching for an object in an image by evaluating the similarity between a template model and an image region at each possible location and scale. In this work, we show that using a template model representation based on a multiple resolution hierarchy is an optimal choice that can lead to excellent detection accuracy and fast computation. We implement two different approaches that make use of a hierarchy of multiresolution models: a multiresolution cascade and a coarse-to-fine search. Also, we extend the coarse-to-fine search by introducing a deformable part-based model that achieves state-of-the-art results together with a very reduced computational cost. Finally, we specialize our approach to the challenging task of pedestrian detection from moving vehicles and show that the overall quality of the system outperforms previous works in terms of speed and accuracy.  
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Ped2012 Serial 2203  
Permanent link to this record
 

 
Author Jaume Gibert edit  openurl
  Title Vector Space Embedding of Graphs via Statistics of Labelling Information Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Pattern recognition is the task that aims at distinguishing objects among different classes. When such a task wants to be solved in an automatic way a crucial step is how to formally represent such patterns to the computer. Based on the different representational formalisms, we may distinguish between statistical and structural pattern recognition. The former describes objects as a set of measurements arranged in the form of what is called a feature vector. The latter assumes that relations between parts of the underlying objects need to be explicitly represented and thus it uses relational structures such as graphs for encoding their inherent information. Vector spaces are a very flexible mathematical structure that has allowed to come up with several efficient ways for the analysis of patterns under the form of feature vectors. Nevertheless, such a representation cannot explicitly cope with binary relations between parts of the objects and it is restricted to measure the exact same number of features for each pattern under study regardless of their complexity. Graph-based representations present the contrary situation. They can easily adapt to the inherent complexity of the patterns but introduce a problem of high computational complexity, hindering the design of efficient tools to process and analyse patterns.

Solving this paradox is the main goal of this thesis. The ideal situation for solving pattern recognition problems would be to represent the patterns using relational structures such as graphs, and to be able to use the wealthy repository of data processing tools from the statistical pattern recognition domain. An elegant solution to this problem is to transform the graph domain into a vector domain where any processing algorithm can be applied. In other words, by mapping each graph to a point in a vector space we automatically get access to the rich set of algorithms from the statistical domain to be applied in the graph domain. Such methodology is called graph embedding.

In this thesis we propose to associate feature vectors to graphs in a simple and very efficient way by just putting attention on the labelling information that graphs store. In particular, we count frequencies of node labels and of edges between labels. Although their locality, these features are able to robustly represent structurally global properties of graphs, when considered together in the form of a vector. We initially deal with the case of discrete attributed graphs, where features are easy to compute. The continuous case is tackled as a natural generalization of the discrete one, where rather than counting node and edge labelling instances, we count statistics of some representatives of them. We encounter how the proposed vectorial representations of graphs suffer from high dimensionality and correlation among components and we face these problems by feature selection algorithms. We also explore how the diversity of different embedding representations can be exploited in order to boost the performance of base classifiers in a multiple classifier systems framework. An extensive experimental evaluation finally shows how the methodology we propose can be efficiently computed and compete with other graph matching and embedding methodologies.
 
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ Gib2012 Serial 2204  
Permanent link to this record
 

 
Author Mohammad Rouhani edit  openurl
  Title Shape Representation and Registration using Implicit Functions Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Shape representation and registration are two important problems in computer vision and graphics. Representing the given cloud of points through an implicit function provides a higher level information describing the data. This representation can be more compact more robust to noise and outliers, hence it can be exploited in different computer vision application. In the first part of this thesis implicit shape representations, including both implicit B-spline and polynomial, are tackled. First, an approximation of a geometric distance is proposed to measure the closeness of the given cloud of points and the implicit surface. The analysis of the proposed distance shows an accurate estimation with smooth behavior. The distance by itself is used in a RANSAC based quadratic fitting method. Moreover, since the gradient information of the distance with respect to the surface parameters can be analytically computed, it is used in Levenberg-Marquadt algorithm to refine the surface parameters. In a different approach, an algebraic fitting method is used to represent an object through implicit B-splines. The outcome is a smooth flexible surface and can be represented in different levels from coarse to fine. This property has been exploited to solve the registration problem in the second part of the thesis. In the proposed registration technique the model set is replaced with an implicit representation provided in the first part; then, the point-to-point registration is converted to a point-to-model one in a higher level. This registration error can benefit from different distance estimations to speed up the registration process even without need of correspondence search. Finally, the non-rigid registration problem is tackled through a quadratic distance approximation that is based on the curvature information of the model set. This approximation is used in a free form deformation model to update its control lattice. Then it is shown how an accurate distance approximation can benefit non-rigid registration problems.  
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Rou2012 Serial 2205  
Permanent link to this record
 

 
Author Jose Carlos Rubio edit  openurl
  Title Many-to-Many High Order Matching. Applications to Tracking and Object Segmentation Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Feature matching is a fundamental problem in Computer Vision, having multiple applications such as tracking, image classification and retrieval, shape recognition and stereo fusion. In numerous domains, it is useful to represent the local structure of the matching features to increase the matching accuracy or to make the correspondence invariant to certain transformations (affine, homography, etc. . . ). However, encoding this knowledge requires complicating the model by establishing high-order relationships between the model elements, and therefore increasing the complexity of the optimization problem.

The importance of many-to-many matching is sometimes dismissed in the literature. Most methods are restricted to perform one-to-one matching, and are usually validated on synthetic, or non-realistic datasets. In a real challenging environment, with scale, pose and illumination variations of the object of interest, as well as the presence of occlusions, clutter, and noisy observations, many-to-many matching is necessary to achieve satisfactory results. As a consequence, finding the most likely many-to-many correspondence often involves a challenging combinatorial optimization process.

In this work, we design and demonstrate matching algorithms that compute many-to-many correspondences, applied to several challenging problems. Our goal is to make use of high-order representations to improve the expressive power of the matching, at the same time that we make feasible the process of inference or optimization of such models. We effectively use graphical models as our preferred representation because they provide an elegant probabilistic framework to tackle structured prediction problems.

We introduce a matching-based tracking algorithm which performs matching between frames of a video sequence in order to solve the difficult problem of headlight tracking at night-time. We also generalise this algorithm to solve the problem of data association applied to various tracking scenarios. We demonstrate the effectiveness of such approach in real video sequences and we show that our tracking algorithm can be used to improve the accuracy of a headlight classification system.

In the second part of this work, we move from single (point) matching to dense (region) matching and we introduce a new hierarchical image representation. We make use of such model to develop a high-order many-to-many matching between pairs of images. We show that the use of high-order models in comparison to simpler models improves not only the accuracy of the results, but also the convergence speed of the inference algorithm.

Finally, we keep exploiting the idea of region matching to design a fully unsupervised image co-segmentation algorithm that is able to perform competitively with state-of-the-art supervised methods. Our method also overcomes the typical drawbacks of some of the past works, such as avoiding the necessity of variate appearances on the image backgrounds. The region matching in this case is applied to effectively exploit inter-image information. We also extend this work to perform co-segmentation of videos, being the first time that such problem is addressed, as a way to perform video object segmentation
 
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joan Serrat  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Rub2012 Serial 2206  
Permanent link to this record
 

 
Author Bhaskar Chakraborty edit  openurl
  Title Model free approach to human action recognition Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Automatic understanding of human activity and action is very important and challenging research area of Computer Vision with wide applications in video surveillance, motion analysis, virtual reality interfaces, video indexing, content based video retrieval, HCI and health care. This thesis presents a series of techniques to solve the problem of human action recognition in video. First approach towards this goal is based on a probabilistic optimization model of body parts using Hidden Markov Model. This strong model based approach is able to distinguish between similar actions by only considering the body parts having major contributions to the actions. In next approach, we apply a weak model based human detector and actions are represented by Bag-of-key poses model to capture the human pose changes during the actions. To tackle the problem of human action recognition in complex scenes, a selective spatio-temporal interest point (STIP) detector is proposed by using a mechanism similar to that of the non-classical receptive field inhibition that is exhibited by most oriented selective neuron in the primary visual cortex. An extension of the selective STIP detector is applied to multi-view action recognition system by introducing a novel 4D STIPs (3D space + time). Finally, we use our STIP detector on large scale continuous visual event recognition problem and propose a novel generalized max-margin Hough transformation framework for activity detection  
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Xavier Roca  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Cha2012 Serial 2207  
Permanent link to this record
 

 
Author Josep M. Gonfaus edit  openurl
  Title Towards Deep Image Understanding: From pixels to semantics Type Book Whole
  Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Understanding the content of the images is one of the greatest challenges of computer vision. Recognition of objects appearing in images, identifying and interpreting their actions are the main purposes of Image Understanding. This thesis seeks to identify what is present in a picture by categorizing and locating all the objects in the scene.
Images are composed by pixels, and one possibility consists of assigning to each pixel an object category, which is commonly known as semantic segmentation. By incorporating information as a contextual cue, we are able to resolve the ambiguity within categories at the pixel-level. We propose three levels of scale in order to resolve such ambiguity.
Another possibility to represent the objects is the object detection task. In this case, the aim is to recognize and localize the whole object by accurately placing a bounding box around it. We present two new approaches. The first one is focused on improving the object representation of deformable part models with the concept of factorized appearances. The second approach addresses the issue of reducing the computational cost for multi-class recognition. The results given have been validated on several commonly used datasets, reaching international recognition and state-of-the-art within the field
 
  Address  
  Corporate Author Thesis (down) Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Theo Gevers  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Gon2012 Serial 2208  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: