Author Albert Berenguel
  Title Analysis of background textures in banknotes and identity documents for counterfeit detection Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Counterfeiting and piracy are forms of theft that have been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common targets of counterfeiting. The former are used by organized criminal groups to finance a variety of illegal activities, or even to destabilize entire countries through the resulting inflation. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents have a damaging effect on business, the economy and the general population. To fight counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts on detecting counterfeit banknotes and identity documents by analyzing the security features in the background printing. Background areas of secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturer's cutting-edge printing equipment. Our objective is to detect the loss of resolution between a genuine security document and a counterfeit version printed with a publicly available commercial printer. We first present the most complete survey to date of identity document and banknote security features; the compared algorithms and systems are based on computer vision and machine learning. We then present the banknote and identity document counterfeit dataset that we built and use throughout this thesis. Afterwards, we evaluate and adapt algorithms from the literature for security background texture analysis, studying the problem from the point of view of robustness, computational efficiency and applicability in a real, non-controlled industrial scenario, and proposing key insights for using these algorithms. Next, within the industrial environment of this thesis, we build a complete service-oriented architecture to detect counterfeit documents. The mobile application and the server framework are intended to be usable even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.  
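For illustration only, the sketch below contrasts the texture statistics of a genuine and a suspect background patch with Local Binary Pattern histograms. It is a toy full-reference baseline, not the thesis's recurrent glimpse model; the descriptor, distance and threshold are all illustrative assumptions.

```python
# Toy full-reference baseline for background-texture counterfeit screening:
# compare Local Binary Pattern histograms of a genuine and a suspect patch.
# Illustrative stand-in only, not the thesis's recurrent model.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(patch, points=8, radius=1.0):
    """Uniform-LBP histogram as a cheap texture descriptor."""
    codes = local_binary_pattern(patch, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
    return hist

def looks_counterfeit(genuine_patch, suspect_patch, threshold=0.15):
    """Flag the suspect patch when its texture drifts from the reference.
    The chi-square distance and the threshold are illustrative choices."""
    g, s = lbp_histogram(genuine_patch), lbp_histogram(suspect_patch)
    chi2 = 0.5 * np.sum((g - s) ** 2 / (g + s + 1e-12))
    return chi2 > threshold
```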
  Address November 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades;Josep Llados  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-121011-2-6 Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ Ber2019 Serial 3395  
 

 
Author Xialei Liu
  Title Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains Type Book Whole
  Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. First, training deep CNNs requires huge amounts of labeled data, which is expensive and labor-intensive to collect. Second, training CNNs in a continual learning setting is still an open research question: catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks.
Self-supervised learning leverages unlabeled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. We then propose an efficient backpropagation technique for Siamese networks which avoids the redundant computation introduced by the multi-branch network architecture. In addition, we show that network uncertainty on the self-supervised proxy task is a good measure of the informativeness of unlabeled data, and can be used to drive an active learning algorithm. We then apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground-truth targets for labeled data, while simultaneously learning to rank unlabeled data, obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50% for both IQA and crowd counting.
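The core ranking trick can be shown in a few lines: two weight-sharing (Siamese) passes score a clean image and a synthetically degraded copy, and a margin ranking loss pushes the clean score higher, with no human labels involved. A minimal sketch, assuming PyTorch; the tiny network, the degradation and the margin are placeholders, not the thesis's architecture:

```python
# Ranking as a self-supervised proxy: a weight-sharing scorer should rank a
# clean image above a degraded copy of itself. No human labels needed.
import torch
import torch.nn as nn

scorer = nn.Sequential(                  # tiny stand-in for a CNN branch
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
)
rank_loss = nn.MarginRankingLoss(margin=0.5)

clean = torch.rand(8, 3, 64, 64)                 # surrogate batch
blurry = clean + 0.3 * torch.randn_like(clean)   # synthetic degradation

# Both passes share weights, which is the Siamese property.
s_clean, s_blurry = scorer(clean), scorer(blurry)
# target = 1 asks for s_clean > s_blurry by at least the margin
loss = rank_loss(s_clean, s_blurry, torch.ones_like(s_clean))
loss.backward()
```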
In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the network parameters. However, the diagonal assumption is unrealistic, so we approximately diagonalize the FIM using a set of factorized rotation parameters, which leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers of the network, and we propose a hybrid approach in which distillation is used in the feature extractor and replay is used in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks into a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some state-of-the-art methods.
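The first approach builds on the EWC penalty, whose standard diagonal form is easy to sketch: averaged squared gradients estimate parameter importance, and a quadratic term pulls important weights toward their old-task values. A minimal sketch under that diagonal assumption (the factorized rotation of the thesis is not shown; `lam` is an illustrative hyperparameter):

```python
# EWC-style regularization with a diagonal Fisher estimate.
import torch

def diagonal_fisher(model, loss_fn, data_loader):
    """Averaged squared gradients approximate the diagonal of the FIM."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Quadratic pull toward the old task's weights, scaled by importance;
    added to the new task's loss during training."""
    return 0.5 * lam * sum(
        (fisher[n] * (p - old_params[n]) ** 2).sum()
        for n, p in model.named_parameters()
    )
```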
 
  Address December 2019  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-121011-4-0 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ Liu2019 Serial 3396  
 

 
Author Sergio Escalera; Stephane Ayache; Jun Wan; Meysam Madadi; Umut Guçlu; Xavier Baro
  Title Inpainting and Denoising Challenges Type Book Whole
  Year 2019 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The problem of dealing with missing or incomplete data in machine learning and computer vision arises in many applications. Recent strategies make use of generative models to impute missing or corrupted data. Advances in computer vision using deep generative models have found applications in image/video processing, such as denoising, restoration, super-resolution, or inpainting.
Inpainting and Denoising Challenges comprises recent efforts dealing with image and video inpainting tasks. This includes winning solutions to the ChaLearn Looking at People inpainting and denoising challenges: human pose recovery, video de-captioning and fingerprint restoration.
This volume starts with a broad review of image denoising, retracing and comparing various methods, from pioneering signal processing methods, through machine learning approaches with sparse and low-rank models, to recent deep learning architectures with autoencoders and variants. The following chapters present results from the challenge, including three competition tasks at WCCI and ECML 2018. The best approaches submitted by participants are described, showing interesting contributions and innovative methods. The last two chapters propose novel contributions and highlight new applications that benefit from image/video inpainting.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA; not mentioned Approved no  
  Call Number Admin @ si @ EAW2019 Serial 3398  
 

 
Author Fernando Vilariño
  Title 3D Scanning of Capitals at Library Living Lab Type Book Whole
  Year 2019 Publication “Living Lab Projects 2019”. ENoLL. Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MV; DAG; 600.140; 600.121;SIAI Approved no  
  Call Number Admin @ si @ Vil2019c Serial 3463  
 

 
Author Aymen Azaza
  Title Context, Motion and Semantic Information for Computational Saliency Type Book Whole
  Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The main objective of this thesis is to highlight the salient object in an image or in a video sequence. We address three important, but in our opinion insufficiently investigated, aspects of saliency detection. Firstly, we extend previous research on saliency that explicitly models the information provided by the context, and we show the importance of explicit context modelling for saliency estimation. Several important works in saliency are based on the use of object proposals; however, these methods focus on the saliency of the object proposal itself and ignore the context. To introduce context into such saliency approaches, we couple every object proposal with its direct context. This allows us to evaluate the importance of the immediate surround (context) for its saliency. We propose several saliency features computed from the context proposals, including features based on omni-directional and horizontal context continuity. Secondly, we investigate the use of top-down methods (high-level semantic information) for the task of saliency prediction, since most computational methods are bottom-up or include only a few semantic classes. We propose to consider a wider group of object classes; these objects represent important semantic information which we exploit in our saliency prediction approach. Thirdly, we develop a method to detect video saliency by computing saliency from supervoxels and optical flow. In addition, we apply the context features developed in this thesis to video saliency detection; the method combines shape and motion features with our proposed context features. To summarize, we show that extending object proposals with their direct context improves saliency detection in both image and video data, we evaluate the importance of semantic information in saliency estimation, and we propose a new motion feature to detect saliency in video data. The three proposed novelties are evaluated on standard saliency benchmark datasets and are shown to improve over the state of the art.
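The coupling of a proposal with its direct context can be illustrated with a toy feature: expand the proposal box and measure the contrast between the object and its surround. A minimal sketch, assuming integer pixel boxes; the mean-colour feature and the expansion factor stand in for the thesis's richer context features:

```python
# Couple an object proposal with its direct context: expand the box, then
# score saliency as the feature contrast between object and surround.
import numpy as np

def context_contrast(image, box, expand=1.5):
    """box = (x0, y0, x1, y1) integer pixel coordinates."""
    x0, y0, x1, y1 = box
    h, w = image.shape[:2]
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    bw, bh = (x1 - x0) * expand / 2, (y1 - y0) * expand / 2
    X0, Y0 = max(int(cx - bw), 0), max(int(cy - bh), 0)
    X1, Y1 = min(int(cx + bw), w), min(int(cy + bh), h)
    # For brevity the object pixels are not masked out of the context region.
    ctx = image[Y0:Y1, X0:X1].reshape(-1, image.shape[2])
    obj = image[y0:y1, x0:x1].reshape(-1, image.shape[2])
    # Mean-colour difference as a stand-in for richer context features.
    return float(np.linalg.norm(obj.mean(0) - ctx.mean(0)))

score = context_contrast(np.random.rand(100, 100, 3), (30, 30, 60, 70))
```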
 
  Address October 2018  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Ali Douik  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-9-4 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ Aza2018 Serial 3218  
 

 
Author Dena Bazazian
  Title Fully Convolutional Networks for Text Understanding in Scene Images Type Book Whole
  Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results in deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved significant performance on semantic segmentation and pixel-level classification tasks; we therefore leverage the strengths of FCN approaches to detect text in natural scenes. In this thesis we focus on two challenging tasks of scene text understanding: Text Detection and Word Spotting. For the task of text detection, we propose an efficient text proposal technique for scene images. We take the Text Proposals method, an approach that reduces the search space of possible text regions in an image, as our baseline, and improve it by combining it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy, thus gaining a significant speed-up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than line-based text localization techniques, while also producing better-quality localizations. We also apply this technique to compressed images, such as videos from wearable egocentric cameras. For the task of word spotting, we introduce a novel mid-level word representation: we create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions; we call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line-based evaluation alongside the classic bounding-box-based approach. We test our method on incidental scene text images, which comprise real-life scenarios such as urban scenes. Incidental scene text is challenging due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context.
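Since Soft-PHOC extends the binary PHOC, a minimal PHOC builder helps fix ideas: at pyramid level L the word is split into L segments, and each segment records which characters fall inside it. The alphabet and levels below are illustrative choices; Soft-PHOC replaces the binary entries with pixel-wise character probability maps:

```python
# Minimal PHOC (Pyramidal Histogram Of Characters) builder for a lowercase
# alphabet: one binary presence vector per segment per pyramid level.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def phoc(word, levels=(1, 2, 3)):
    word = word.lower()
    vec = []
    for L in levels:
        for region in range(L):
            seg = np.zeros(len(ALPHABET))
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                centre = (i + 0.5) / len(word)   # character's relative position
                if region / L <= centre < (region + 1) / L:
                    seg[ALPHABET.index(ch)] = 1.0
            vec.append(seg)
    return np.concatenate(vec)

print(phoc("text").shape)  # (1 + 2 + 3) * 26 = 156 dimensions
```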
 
  Address November 2018  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Andrew Bagdanov  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-948531-1-1 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Baz2018 Serial 3220  
 

 
Author Cesar de Souza
  Title Action Recognition in Videos: Data-efficient approaches for supervised learning of human action classification models for video Type Book Whole
  Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this dissertation, we explore different ways to perform human action recognition in video clips. We focus on data efficiency, proposing new approaches that alleviate the need for laborious and time-consuming manual data annotation. In the first part of this dissertation, we start by analyzing previous state-of-the-art models, comparing their differences and similarities in order to pinpoint where their real strengths come from. Leveraging this information, we then proceed to boost the classification accuracy of shallow models to levels that rival deep neural networks. We introduce hybrid video classification architectures based on carefully designed unsupervised representations of handcrafted spatiotemporal features classified by supervised deep networks. We show in our experiments that our hybrid model combines the best of both worlds: it is data-efficient (trained on 150 to 10,000 short clips) and yet improves significantly on the state of the art, including deep models trained on millions of manually labeled images and videos. In the second part of this research, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for “Procedural Human Action Videos”. It contains a total of 39,982 videos, with more than 1,000 examples for each of 35 action categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We then introduce deep multi-task representation learning architectures to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF-101 and HMDB-51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, outperforming fine-tuning of state-of-the-art unsupervised generative models of videos.  
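The mixing of synthetic and real videos can be sketched as a shared backbone with one classification head per label space, so clips from either domain update the common representation. A minimal sketch, assuming PyTorch; the backbone, shapes and head sizes (35 PHAV categories, 101 UCF-101 classes) are illustrative placeholders, not the thesis's architecture:

```python
# Two-head multi-task sketch: synthetic and real clips share a backbone
# but are classified by separate heads over their own label spaces.
import torch
import torch.nn as nn

class TwoHeadActionNet(nn.Module):
    def __init__(self, feat_dim=128, n_synthetic=35, n_real=101):
        super().__init__()
        self.backbone = nn.Sequential(     # stand-in for a video CNN
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        self.synthetic_head = nn.Linear(feat_dim, n_synthetic)
        self.real_head = nn.Linear(feat_dim, n_real)

    def forward(self, clip, domain):
        feat = self.backbone(clip)
        return self.synthetic_head(feat) if domain == "synthetic" else self.real_head(feat)

net = TwoHeadActionNet()
clip = torch.rand(2, 3, 8, 32, 32)         # batch, channels, frames, H, W
logits = net(clip, domain="synthetic")     # gradients reach the shared backbone
```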
  Address April 2018  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;Naila Murray  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ Sou2018 Serial 3127  
 

 
Author Alicia Fornes; Bart Lamiroy
  Title Graphics Recognition, Current Trends and Evolutions Type Book Whole
  Year 2018 Publication Graphics Recognition, Current Trends and Evolutions Abbreviated Journal  
  Volume 11009 Issue Pages  
  Keywords  
  Abstract This book constitutes the thoroughly refereed post-conference proceedings of the 12th International Workshop on Graphics Recognition, GREC 2017, held in Kyoto, Japan, in November 2017.
The 10 revised full papers presented were carefully reviewed and selected from 14 initial submissions. They cover both classical and emerging topics of graphics recognition, namely analysis and detection of diagrams, search and classification, optical music recognition, and interpretation of engineering drawings and maps.
 
  Address  
  Corporate Author Thesis  
  Publisher Springer International Publishing Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-030-02283-9 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ FoL2018 Serial 3171  
 

 
Author Suman Ghosh
  Title Word Spotting and Recognition in Images from Heterogeneous Sources Type Book Whole
  Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Text has been the most common way of sharing information for ages. With the recent development of personal image databases and handwritten historic manuscripts, the demand for algorithms that make these databases accessible for browsing and indexing is rising. Enabling search in, or understanding of, large collections of manuscripts or image databases requires fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which work well when words are already segmented; however, there is no trivial way to extend these methods to non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two ways of representation exist in the literature: one uses a fixed-length representation learned from cropped words, and the other a variable-length sequence of features. Throughout this thesis, we study both representations for their suitability in segmentation-free understanding of text. In the first part we focus on segmentation-free word spotting using a fixed-length representation, extending the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation-free retrieval. In the second part of the thesis, we explore sequence-based features and, finally, we propose a unified solution in which the same framework can generate both kinds of representations.  
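With a fixed-length representation, segmentation-free word spotting reduces at query time to a nearest-neighbour search over candidate-region embeddings. A minimal sketch with random placeholders standing in for real PHOC-like embeddings:

```python
# Word spotting as retrieval: embed the query string and every candidate
# region in a common space, then rank regions by cosine similarity.
import numpy as np

def spot(query_vec, region_vecs, top_k=5):
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    R = region_vecs / (np.linalg.norm(region_vecs, axis=1, keepdims=True) + 1e-12)
    scores = R @ q                       # cosine similarity per region
    order = np.argsort(-scores)[:top_k]
    return list(zip(order.tolist(), scores[order].tolist()))

rng = np.random.default_rng(0)
query = rng.random(156)                  # e.g. a PHOC of the query word
regions = rng.random((1000, 156))        # embeddings of candidate regions
print(spot(query, regions, top_k=3))     # (region index, similarity) pairs
```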
  Address November 2018  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-948531-0-4 Medium  
  Area Expedition Conference  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ Gho2018 Serial 3217  
 

 
Author Gholamreza Anbarjafari; Sergio Escalera
  Title Human-Robot Interaction: Theory and Application Type Book Whole
  Year 2018 Publication Human-Robot Interaction: Theory and Application Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-78923-316-2 Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ AnE2018 Serial 3216  
 

 
Author Hugo Jair Escalante; Sergio Escalera; Isabelle Guyon; Xavier Baro; Yagmur Gucluturk; Umut Guçlu; Marcel van Gerven
  Title Explainable and Interpretable Models in Computer Vision and Machine Learning Type Book Whole
  Year 2018 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This book compiles leading research on the development of explainable and interpretable machine learning methods in the context of computer vision and machine learning.
Research progress in computer vision and pattern recognition has led to a variety of modeling techniques with almost human-like performance. Although these models have obtained astounding results, they are limited in their explainability and interpretability: What is the rationale behind a given decision? What in the model structure explains its functioning? Hence, while good performance is a critical characteristic of learning machines, explainability and interpretability capabilities are needed to take learning machines to the next step and include them in decision support systems involving human supervision.
This book, written by leading international researchers, addresses key topics of explainability and interpretability, including the following:

·Evaluation and Generalization in Interpretable Machine Learning
·Explanation Methods in Deep Learning
·Learning Functional Causal Models with Generative Neural Networks
·Learning Interpretable Rules for Multi-Label Classification
·Structuring Neural Networks for More Explainable Predictions
·Generating Post Hoc Rationales of Deep Visual Classification Decisions
·Ensembling Visual Explanations
·Explainable Deep Driving by Visualizing Causal Attention
·Interdisciplinary Perspective on Algorithmic Job Candidate Search
·Multimodal Personality Trait Analysis for Explainable Modeling of Job Interview Decisions
·Inherent Explainability Pattern Theory-based Video Event Interpretations
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HuPBA; not mentioned Approved no  
  Call Number Admin @ si @ EEG2018 Serial 3399  
 

 
Author Antonio Lopez; Atsushi Imiya; Tomas Pajdla; Jose Manuel Alvarez
  Title Computer Vision in Vehicle Technology: Land, Sea & Air Type Book Whole
  Year 2017 Publication Abbreviated Journal  
  Volume Issue Pages 161-163  
  Keywords  
  Abstract This chapter examines different vision-based commercial solutions for real-life problems related to vehicles. It is worth mentioning the recent astonishing performance of deep convolutional neural networks (DCNNs) in difficult visual tasks such as image classification, object recognition/localization/detection, and semantic segmentation. In fact, different DCNN architectures are already being explored for low-level tasks such as optical flow and disparity computation, and for higher-level ones such as place recognition.
 
  Address  
  Corporate Author Thesis  
  Publisher John Wiley & Sons, Ltd Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-118-86807-2 Medium  
  Area Expedition Conference  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ LIP2017a Serial 2937  
 

 
Author Meysam Madadi
  Title Human Segmentation, Pose Estimation and Applications Type Book Whole
  Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Automatically analyzing humans in photographs or videos has great potential for applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. The body, face and hand are the most studied components of humans. The body has high variability in shape and clothing, along with high degrees of freedom in pose. The face has many muscles, causing many visible deformations, besides variable shape and hair style. The hand is a small object that moves fast and has high degrees of freedom. Adding human characteristics to all the aforementioned variabilities makes human analysis quite a challenging task.
In this thesis, we developed human segmentation methods for different modalities. In a first scenario, we segmented the human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors, and then compared rigid affine alignment with non-rigid iterative shape warping. In a second scenario, we segmented faces in RGB images using convolutional neural networks (CNNs). We modeled a conditional random field with recurrent neural networks; in our model, pair-wise kernels are not fixed but learned during training. We trained the network end-to-end using adversarial networks, which improved hair segmentation by a large margin.
We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation, minimizing an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on the error of visible joints and the trajectory smoothness of invisible joints. In a CNN-based approach, we developed a tree-structured network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions.
Finally, we developed a number of applications consisting of human soft-biometrics measurement and garment retexturing. We also generated several datasets in this thesis, covering human segmentation, synthetic hand pose, garment retexturing and Italian gestures.
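The generative fitting objective has a simple overall shape: a weighted sum of an overlap term, a depth-discrepancy term and a collision penalty that a black-box optimizer can minimise per finger. A minimal sketch; the weights and the sphere-based collision test are assumptions, not the thesis's exact formulation:

```python
# Illustrative fitting energy for model-based hand pose: overlap between
# rendered and observed depth masks, depth discrepancy, finger collisions.
import numpy as np

def fitting_energy(rendered_depth, observed_depth, finger_segments,
                   w_overlap=1.0, w_depth=1.0, w_collision=10.0):
    rendered_mask = rendered_depth > 0
    observed_mask = observed_depth > 0
    union = np.logical_or(rendered_mask, observed_mask).sum() + 1e-6
    inter = np.logical_and(rendered_mask, observed_mask).sum()
    e_overlap = 1.0 - inter / union                  # penalise poor overlap
    both = np.logical_and(rendered_mask, observed_mask)
    e_depth = np.abs(rendered_depth - observed_depth)[both].mean() if both.any() else 0.0
    # count pairs of finger segments (centre, radius) that interpenetrate
    e_coll = sum(
        np.linalg.norm(c1 - c2) < r1 + r2
        for i, (c1, r1) in enumerate(finger_segments)
        for (c2, r2) in finger_segments[i + 1:]
    )
    return w_overlap * e_overlap + w_depth * e_depth + w_collision * e_coll
```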
 
  Address October 2017  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Jordi Gonzalez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-3-2 Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ Mad2017 Serial 3017  
 

 
Author Onur Ferhat
  Title Analysis of Head-Pose Invariant, Natural Light Gaze Estimation Methods Type Book Whole
  Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Eye tracker devices have traditionally been used only inside laboratories, requiring trained professionals and elaborate setups. In recent years, however, scientific work on easier-to-use eye trackers, which require no special hardware other than the omnipresent front-facing cameras in computers, tablets, and mobiles, has aimed at making this technology commonplace. These types of trackers face several extra challenges that make the problem harder, such as the low-resolution images provided by a regular webcam, changing ambient lighting conditions, personal appearance differences, changes in head pose, and so on. Recent research in the field has focused on all of these challenges in order to provide better gaze estimation performance in a real-world setup.

In this work, we tackle the gaze tracking problem in a single-camera setup. We first analyze all the previous work in the field, identifying the strengths and weaknesses of each idea. We start our work on the gaze tracker with an appearance-based gaze estimation method, the simplest idea, which creates a direct mapping between a rectangular image patch extracted around the eye in a camera image and the gaze point (or gaze direction). Here, we do an extensive analysis of the factors that affect the performance of this tracker in several experimental setups, in order to address these problems in future work. In the second part of our work, we propose a feature-based gaze estimation method, which encodes the eye region image into a compact representation. We argue that this type of representation is better suited to dealing with head pose and lighting condition changes, as it both reduces the dimensionality of the input (i.e. the eye image) and breaks the direct connection between image pixel intensities and the gaze estimate. Lastly, we use a face alignment algorithm to obtain robust face pose estimation, using a 3D model customized to the subject. We combine this with a convolutional neural network trained on a large dataset of images to build a face-pose-invariant gaze tracker.
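The appearance-based starting point, a direct mapping from an eye patch to a gaze point, can be reduced to its simplest form: flatten the patch and fit a linear regressor to screen coordinates. A minimal sketch on synthetic data; ridge regression and the patch size are illustrative stand-ins for the methods analysed in the thesis:

```python
# Simplest appearance-based gaze mapping: flattened eye patches regressed
# directly to normalised screen coordinates.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
eye_patches = rng.random((500, 20 * 40))   # 500 flattened 20x40 eye crops
gaze_xy = rng.random((500, 2))             # normalised screen coordinates

mapper = Ridge(alpha=1.0).fit(eye_patches, gaze_xy)
new_patch = rng.random((1, 20 * 40))
print(mapper.predict(new_patch))           # predicted (x, y) gaze point
```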
 
  Address September 2017  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Fernando Vilariño  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-5-6 Medium  
  Area Expedition Conference  
  Notes MV Approved no  
  Call Number Admin @ si @ Fer2017 Serial 3018  
 

 
Author Arash Akbarinia
  Title Computational Model of Visual Perception: From Colour to Form Type Book Whole
  Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The original idea of this project was to study the role of colour in the challenging task of object recognition. We started by extending previous research on colour naming, showing that it is feasible to capture colour terms through parsimonious ellipsoids. Although the results of our model exceeded the state of the art on two benchmark datasets, we realised that the two phenomena of metameric lights and colour constancy must be addressed prior to any further colour processing. Our investigation of metameric pairs reached the conclusion that they are infrequent in real-world scenarios. Contrary to that, the illumination of a scene often changes dramatically. We addressed this issue by proposing a colour constancy model inspired by the dynamic centre-surround adaptation of neurons in the visual cortex, implemented through two overlapping asymmetric Gaussians whose variances and heights are adjusted according to the local contrast of pixels. We complemented this model with a generic contrast-variant pooling mechanism that inversely relates the percentage of pooled signal to the local contrast of a region. The results of our experiments on four benchmark datasets were indeed promising: the proposed model, although simple, outperformed even learning-based approaches in many cases. Encouraged by the success of our contrast-variant surround modulation, we extended this approach to detect the boundaries of objects. We proposed an edge detection model based on the first derivative of the Gaussian kernel, incorporating four types of surround: full, far, iso- and orthogonal-orientation. Furthermore, we accounted for the pooling mechanism at higher cortical areas and the shape feedback sent to lower areas. Our results on three benchmark datasets showed significant improvement over non-learning algorithms.
To summarise, we demonstrated that biologically-inspired models offer promising solutions to computer vision problems such as colour naming, colour constancy and edge detection. We believe that the greatest contribution of this Ph.D. dissertation is modelling the concept of dynamic surround modulation, which shows the significance of contrast-variant surround integration. The models proposed here are grounded on only a portion of what we know about the human visual system; therefore, it is only natural to complement them accordingly in future work.
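The backbone of the proposed edge model, the first derivative of a Gaussian, is straightforward to sketch: filter the image with the x- and y-derivatives of a Gaussian kernel and combine them into a gradient magnitude. The surround-modulation terms of the thesis (full, far, iso- and orthogonal-orientation) are omitted here, and sigma is an illustrative choice:

```python
# First-derivative-of-Gaussian edge response, without surround modulation.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_edges(gray, sigma=1.5):
    gx = gaussian_filter(gray, sigma, order=(0, 1))  # d/dx of Gaussian
    gy = gaussian_filter(gray, sigma, order=(1, 0))  # d/dy of Gaussian
    return np.hypot(gx, gy)                          # edge strength map

edges = dog_edges(np.random.rand(64, 64))
print(edges.shape, float(edges.max()))
```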
 
  Address October 2017  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor C. Alejandro Parraga  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-4-9 Medium  
  Area Expedition Conference  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ Akb2017 Serial 3019  