|   | 
Details
   web
Records
Author Meysam Madadi
Title Human Segmentation, Pose Estimation and Applications Type Book Whole
Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Automatic analyzing humans in photographs or videos has great potential applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. Body, face and hand are the most studied components of humans. Body has many variabilities in shape and clothing along with high degrees of freedom in pose. Face has many muscles causing many visible deformity, beside variable shape and hair style. Hand is a small object, moving fast and has high degrees of freedom. Adding human characteristics to all aforementioned variabilities makes human analysis quite a challenging task.
In this thesis, we developed human segmentation in different modalities. In a first scenario, we segmented human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then considered rigid affine alignment vs. nonrigid iterative shape warping. In a second scenario, we segmented face in RGB images using convolutional neural networks (CNN). We modeled conditional random field with recurrent neural networks. In our model pair-wise kernels are not fixed and learned during training. We trained the network end-to-end using adversarial networks which improved hair segmentation by a high margin.
We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on visible joints error and invisible joints trajectory smoothness. In a CNN-based approach, we developed a tree-structure network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions.
Finally, we developed a number of applications consisting of human soft biometrics measurement and garment retexturing. We also generated some datasets in this thesis consisting of human segmentation, synthetic hand pose, garment retexturing and Italian gestures.
Address (down) October 2017
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Jordi Gonzalez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-3-2 Medium
Area Expedition Conference
Notes HUPBA Approved no
Call Number Admin @ si @ Mad2017 Serial 3017
Permanent link to this record
 

 
Author Arash Akbarinia
Title Computational Model of Visual Perception: From Colour to Form Type Book Whole
Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The original idea of this project was to study the role of colour in the challenging task of object recognition. We started by extending previous research on colour naming showing that it is feasible to capture colour terms through parsimonious ellipsoids. Although, the results of our model exceeded state-of-the-art in two benchmark datasets, we realised that the two phenomena of metameric lights and colour constancy must be addressed prior to any further colour processing. Our investigation of metameric pairs reached the conclusion that they are infrequent in real world scenarios. Contrary to that, the illumination of a scene often changes dramatically. We addressed this issue by proposing a colour constancy model inspired by the dynamical centre-surround adaptation of neurons in the visual cortex. This was implemented through two overlapping asymmetric Gaussians whose variances and heights are adjusted according to the local contrast of pixels. We complemented this model with a generic contrast-variant pooling mechanism that inversely connect the percentage of pooled signal to the local contrast of a region. The results of our experiments on four benchmark datasets were indeed promising: the proposed model, although simple, outperformed even learning-based approaches in many cases. Encouraged by the success of our contrast-variant surround modulation, we extended this approach to detect boundaries of objects. We proposed an edge detection model based on the first derivative of the Gaussian kernel. We incorporated four types of surround: full, far, iso- and orthogonal-orientation. Furthermore, we accounted for the pooling mechanism at higher cortical areas and the shape feedback sent to lower areas. Our results in three benchmark datasets showed significant improvement over non-learning algorithms.
To summarise, we demonstrated that biologically-inspired models offer promising solutions to computer vision problems, such as, colour naming, colour constancy and edge detection. We believe that the greatest contribution of this Ph.D dissertation is modelling the concept of dynamic surround modulation that shows the significance of contrast-variant surround integration. The models proposed here are grounded on only a portion of what we know about the human visual system. Therefore, it is only natural to complement them accordingly in future works.
Address (down) October 2017
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor C. Alejandro Parraga
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-4-9 Medium
Area Expedition Conference
Notes NEUROBIT Approved no
Call Number Admin @ si @ Akb2017 Serial 3019
Permanent link to this record
 

 
Author Cristhian Aguilera
Title Local feature description in cross-spectral imagery Type Book Whole
Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Over the last few years, the number of consumer computer vision applications has increased dramatically. Today, computer vision solutions can be found in video game consoles, smartphone applications, driving assistance – just to name a few. Ideally, we require the performance of those applications, particularly those that are safety critical to remain constant under any external environment factors, such as changes in illumination or weather conditions. However, this is not always possible or very difficult to obtain by only using visible imagery, due to the inherent limitations of the images from that spectral band. For that reason, the use of images from different or multiple spectral bands is becoming more appealing.
The aforementioned possible advantages of using images from multiples spectral bands on various vision applications make multi-spectral image processing a relevant topic for research and development. Like in visible image processing, multi-spectral image processing needs tools and algorithms to handle information from various spectral bands. Furthermore, traditional tools such as local feature detection, which is the basis of many vision tasks such as visual odometry, image registration, or structure from motion, must be adjusted or reformulated to operate under new conditions. Traditional feature detection, description, and matching methods tend to underperform in multi-spectral settings, in comparison to mono-spectral settings, due to the natural differences between each spectral band.
The work in this thesis is focused on the local feature description problem when cross-spectral images are considered. In this context, this dissertation has three main contributions. Firstly, the work starts by proposing the usage of a combination of frequency and spatial information, in a multi-scale scheme, as feature description. Evaluations of this proposal, based on classical hand-made feature descriptors, and comparisons with state of the art cross-spectral approaches help to find and understand limitations of such strategy. Secondly, different convolutional neural network (CNN) based architectures are evaluated when used to describe cross-spectral image patches. Results showed that CNN-based methods, designed to work with visible monocular images, could be successfully applied to the description of images from two different spectral bands, with just minor modifications. In this framework, a novel CNN-based network model, specifically intended to describe image patches from two different spectral bands, is proposed. This network, referred to as Q-Net, outperforms state of the art in the cross-spectral domain, including both previous hand-made solutions as well as L2 CNN-based architectures. The third contribution of this dissertation is in the cross-spectral feature description application domain. The multispectral odometry problem is tackled showing a real application of cross-spectral descriptors
In addition to the three main contributions mentioned above, in this dissertation, two different multi-spectral datasets are generated and shared with the community to be used as benchmarks for further studies.
Address (down) October 2017
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-6-3 Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ Agu2017 Serial 3020
Permanent link to this record
 

 
Author Adriana Romero
Title Assisting the training of deep neural networks with applications to computer vision Type Book Whole
Year 2015 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep learning has recently been enjoying an increasing popularity due to its success in solving challenging tasks. In particular, deep learning has proven to be effective in a large variety of computer vision tasks, such as image classification, object recognition and image parsing. Contrary to previous research, which required engineered feature representations, designed by experts, in order to succeed, deep learning attempts to learn representation hierarchies automatically from data. More recently, the trend has been to go deeper with representation hierarchies.
Learning (very) deep representation hierarchies is a challenging task, which
involves the optimization of highly non-convex functions. Therefore, the search
for algorithms to ease the learning of (very) deep representation hierarchies from data is extensive and ongoing.
In this thesis, we tackle the challenging problem of easing the learning of (very) deep representation hierarchies. We present a hyper-parameter free, off-the-shelf, simple and fast unsupervised algorithm to discover hidden structure from the input data by enforcing a very strong form of sparsity. We study the applicability and potential of the algorithm to learn representations of varying depth in a handful of applications and domains, highlighting the ability of the algorithm to provide discriminative feature representations that are able to achieve top performance.
Yet, while emphasizing the great value of unsupervised learning methods when
labeled data is scarce, the recent industrial success of deep learning has revolved around supervised learning. Supervised learning is currently the focus of many recent research advances, which have shown to excel at many computer vision tasks. Top performing systems often involve very large and deep models, which are not well suited for applications with time or memory limitations. More in line with the current trends, we engage in making top performing models more efficient, by designing very deep and thin models. Since training such very deep models still appears to be a challenging task, we introduce a novel algorithm that guides the training of very thin and deep models by hinting their intermediate representations.
Very deep and thin models trained by the proposed algorithm end up extracting feature representations that are comparable or even better performing
than the ones extracted by large state-of-the-art models, while compellingly
reducing the time and memory consumption of the model.
Address (down) October 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Carlo Gatta;Petia Radeva
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ Rom2015 Serial 2707
Permanent link to this record
 

 
Author Carlo Gatta; Oriol Pujol; Oriol Rodriguez-Leor; J. Mauri; Petia Radeva
Title Robust Image-based IVUS Pullbacks Gating Type Book Chapter
Year 2008 Publication Proceedings 11th International ConferenceMedical Image Computing and Computer–Assisted Intervention Abbreviated Journal
Volume 5242 Issue Pages 518–525
Keywords
Abstract
Address (down) NY (USA)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MICCAI
Notes MILAB;HuPBA Approved no
Call Number BCNPCL @ bcnpcl @ GPR2008a Serial 1037
Permanent link to this record
 

 
Author Carlo Gatta; Oriol Pujol; Oriol Rodriguez-Leor; J. Mauri; Petia Radeva
Title Improved Rigid Registration of Vessel Structures using the Fast Radial Symmetry Transform Type Conference Article
Year 2008 Publication Computer Vision for Intravascular Imaging CVII’08 Workshop Medical Image Computing and Computer–Assisted Intervention , 11th International Conference Abbreviated Journal
Volume Issue Pages 128–136
Keywords
Abstract
Address (down) NY (USA)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference MICCAI
Notes MILAB;HUPBA Approved no
Call Number BCNPCL @ bcnpcl @ GPR2008b Serial 1038
Permanent link to this record
 

 
Author Bogdan Raducanu; Fadi Dornaika
Title Dynamic Vs. Static Recognition of Facial Expressions Type Book Chapter
Year 2008 Publication Ambient Intelligence. European Conference Abbreviated Journal
Volume 5355 Issue Pages 13–25
Keywords
Abstract
Address (down) Nuremberg (Germany)
Corporate Author Thesis
Publisher Place of Publication Editor Rabuñal
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference AMI
Notes OR; MV Approved no
Call Number BCNPCL @ bcnpcl @ RaD2008 Serial 1035
Permanent link to this record
 

 
Author Lichao Zhang
Title Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update.

To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
Address (down) November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-1210011-1-9 Medium
Area Expedition Conference
Notes LAMP; 600.141; 600.120 Approved no
Call Number Admin @ si @ Zha2019 Serial 3393
Permanent link to this record
 

 
Author Lu Yu
Title Semantic Representation: From Color to Deep Embeddings Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists.
In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Address (down) November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-3-3 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Yu2019 Serial 3394
Permanent link to this record
 

 
Author Albert Berenguel
Title Analysis of background textures in banknotes and identity documents for counterfeit detection Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
Address (down) November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-2-6 Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ Ber2019 Serial 3395
Permanent link to this record
 

 
Author Dena Bazazian
Title Fully Convolutional Networks for Text Understanding in Scene Images Type Book Whole
Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved a significant performance on semantic segmentation and pixel level classification tasks. Therefore, we took benefit of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding which are Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique in scene images. We have considered the Text Proposals method as the baseline which is an approach to reduce the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy and thus gaining a significant speed up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than the line based text localization techniques, while also producing better-quality localization. We have also applied this technique on compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line based evaluation along with the classic bounding box based approach. We test our method on incidental scene text images which comprises real-life scenarios such as urban scenes. The importance of incidental scene text images is due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together makes the incidental scene text images challenging.
Address (down) November 2018
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-1-1 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Baz2018 Serial 3220
Permanent link to this record
 

 
Author Suman Ghosh
Title Word Spotting and Recognition in Images from Heterogeneous Sources A Type Book Whole
Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations.
Address (down) November 2018
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-0-4 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Gho2018 Serial 3217
Permanent link to this record
 

 
Author Ivet Rafegas
Title Color in Visual Recognition: from flat to deep representations and some biological parallelisms Type Book Whole
Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks.

Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown an impressive performance to solve various vision problems. However, there is a lack of understanding about the internal representation obtained, as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations.

Finally, using the proposed methodology, we show an in-depth study on how color is represented on a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color selectivity neurons throughout all the layers; (b) an opponent and low frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in 1st layer like in V1; (c) a higher sampling of color hue in the second layer aligned to observed hue maps in V2; (d) a strong color and shape entanglement in all layers from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
Address (down) November 2017
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-7-0 Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Admin @ si @ Raf2017 Serial 3100
Permanent link to this record
 

 
Author Alejandro Gonzalez Alzate
Title Multi-modal Pedestrian Detection Type Book Whole
Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Pedestrian detection continues to be an extremely challenging problem in real scenarios, in which situations like illumination changes, noisy images, unexpected objects, uncontrolled scenarios and variant appearance of objects occur constantly. All these problems force the development of more robust detectors for relevant applications like vision-based autonomous vehicles, intelligent surveillance, and pedestrian tracking for behavior analysis. Most reliable vision-based pedestrian detectors base their decision on features extracted using a single sensor capturing complementary features, e.g., appearance, and texture. These features usually are extracted from the current frame, ignoring temporal information, or including it in a post process step e.g., tracking or temporal coherence. Taking into account these issues we formulate the following question: can we generate more robust pedestrian detectors by introducing new information sources in the feature extraction step?
In order to answer this question we develop different approaches for introducing new information sources to well-known pedestrian detectors. We start by the inclusion of temporal information following the Stacked Sequential Learning (SSL) paradigm which suggests that information extracted from the neighboring samples in a sequence can improve the accuracy of a base classifier.
We then focus on the inclusion of complementary information from different sensors like 3D point clouds (LIDAR – depth), far infrared images (FIR), or disparity maps (stereo pair cameras). For this end we develop a multi-modal framework in which information from different sensors is used for increasing detection accuracy (by increasing information redundancy). Finally we propose a multi-view pedestrian detector, this multi-view approach splits the detection problem in n sub-problems.
Each sub-problem will detect objects in a given specific view reducing in that way the variability problem faced when a single detectors is used for the whole problem. We show that these approaches obtain competitive results with other state-of-the-art methods but instead of design new features, we reuse existing ones boosting their performance.
Address (down) November 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor David Vazquez;Antonio Lopez;
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-943427-7-6 Medium
Area Expedition Conference
Notes ADAS; 600.076 Approved no
Call Number Admin @ si @ Gon2015 Serial 2706
Permanent link to this record
 

 
Author Sergio Vera
Title Anatomic Registration based on Medial Axis Parametrizations Type Book Whole
Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Image registration has been for many years the gold standard method to bring two images into correspondence. It has been used extensively in the eld of medical imaging in order to put images of di erent patients into a common overlapping spatial position. However, medical image registration is a slow, iterative optimization process, where many variables and prone to fall into the pit traps local minima.
A coordinate system parameterizing the interior of organs is a powerful tool for a systematic localization of injured tissue. If the same coordinate values are assigned to speci c anatomical sites, parameterizations ensure integration of data across different medical image modalities. Harmonic mappings have been used to produce parametric meshes over the surface of anatomical shapes, given their ability to set values at speci c locations through boundary conditions. However, most of the existing implementations in medical imaging restrict to either anatomical surfaces, or the depth coordinate with boundary conditions is given at discrete sites of limited geometric diversity.
The medial surface of the shape can be used to provide a continuous basis for the de nition of a depth coordinate. However, given that di erent methods for generation of medial surfaces generate di erent manifolds, not all of them are equally suited to be the basis of radial coordinate for a parameterization. It would be desirable that the medial surface will be smooth, and robust to surface shape noise, with low number of spurious branches or surfaces.
In this thesis we present methods for computation of smooth medial manifolds and apply them to the generation of for anatomical volumetric parameterization that extends current harmonic parameterizations to the interior anatomy using information provided by the volume medial surface. This reference system sets a solid base for creating anatomical models of the anatomical shapes, and allows comparing several patients in a common framework of reference.
Address (down) November 2015
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Miguel Angel Gonzalez Ballester
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-943427-8-3 Medium
Area Expedition Conference
Notes IAM; 600.075 Approved no
Call Number Admin @ si @ Ver2015 Serial 2708
Permanent link to this record