Records
Author Albert Berenguel
Title Analysis of background textures in banknotes and identity documents for counterfeit detection Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former are used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries through the inflationary effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents have a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts on detecting counterfeit banknotes and identity documents by analyzing the security features of the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturer's cutting-edge printing equipment. Our objective is to detect the loss of resolution between the genuine security document and the counterfeit version printed with a publicly available commercial printer. We first present the most complete survey to date on identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we present the banknote and identity counterfeit dataset we have built and use throughout this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis.
We study this problem from the point of view of robustness, computational efficiency and applicability to a real and non-controlled industrial scenario, proposing key insights for using these algorithms. Next, within the industrial environment of this thesis, we build a complete service-oriented architecture to detect counterfeit documents. The mobile application and the server framework are intended to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-2-6 Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ Ber2019 Serial 3395
 

 
Author Xialei Liu
Title Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks.
Self-supervised learning leverages unlabeled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of the informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50% for both IQA and crowd counting.
In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However, the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some state-of-the-art methods.
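For orientation, the diagonal-FIM penalty of Elastic Weight Consolidation, which the first approach takes as its starting point, is commonly written as (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2. The sketch below is a minimal pure-Python illustration of that penalty only. The names `ewc_penalty`, `fisher_diag` and `lam` are hypothetical, and the rotation of the parameter space proposed in the thesis is not implemented here.

```python
def ewc_penalty(params, old_params, fisher_diag, lam=1.0):
    """Quadratic EWC-style penalty under a diagonal Fisher approximation.

    params      -- current parameter values (flat sequence of floats)
    old_params  -- parameter values after training on the previous task
    fisher_diag -- per-parameter importance (diagonal Fisher estimates)
    lam         -- hypothetical trade-off weight for this illustration
    """
    if not (len(params) == len(old_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    # Each parameter is pulled back toward its old value,
    # weighted by how important it was for the previous task.
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher_diag)
    )


# Moving an important parameter (F = 2.0) by 1.0 costs 1.0;
# the second parameter did not move, so it adds nothing.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0])
```

In a training loop this term would be added to the loss of the new task, so that parameters with large Fisher values change less.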
Address December 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-4-0 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Liu2019 Serial 3396
 

 
Author Javad Zolfaghari Bengar
Title Reducing Label Effort with Deep Active Learning Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. In this thesis we study several aspects of active learning including video object detection for autonomous driving systems, image classification on balanced and imbalanced datasets and the incorporation of self-supervised learning in active learning. We briefly describe our approach in each of these areas to reduce the labeling effort.
In chapter two we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our criterion is based on the estimated number of errors in terms of false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two outdoor datasets.
In the next chapter we address the well-known problem of overconfidence in neural networks. As an alternative to network confidence, we propose a new informativeness-based active learning method that captures the learning dynamics of the neural network with a metric called label-dispersion. This metric is low when the network consistently assigns the same label to the sample during the course of training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
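The behavior described for label-dispersion (low for a stable label, high for a frequently changing one) can be illustrated with a small sketch. The formula used here, one minus the fraction of recorded predictions that agree with the most frequent label, is an assumption chosen to match that description and is not quoted from the thesis. The function name `label_dispersion` is likewise hypothetical.

```python
from collections import Counter


def label_dispersion(predicted_labels):
    """Return 1 - (count of the most frequent label) / (number of predictions).

    predicted_labels -- labels assigned to ONE sample at successive
                        points during training (assumed bookkeeping).
    """
    if not predicted_labels:
        raise ValueError("need at least one recorded prediction")
    most_common_count = Counter(predicted_labels).most_common(1)[0][1]
    return 1.0 - most_common_count / len(predicted_labels)


stable = label_dispersion([3, 3, 3, 3, 3])    # 0.0: the label never changes
unstable = label_dispersion([3, 1, 4, 1, 5])  # about 0.6: the label keeps changing
```

Under this assumption, an active learning loop would send the unlabeled samples with the highest scores to the annotator first.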
In chapter four, we tackle the problem of sampling bias in active learning methods on imbalanced datasets. Active learning is generally studied on balanced datasets where an equal number of images per class is available. However, real-world datasets suffer from severely imbalanced classes, the so-called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class-balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that our method generally yields a performance gain on balanced datasets as well.
Another paradigm to reduce the annotation effort is self-training, which learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on a few labeled samples. Recent advancements in self-training have achieved very impressive results rivaling supervised learning on some datasets. In the last chapter we focus on whether active learning and self-supervised learning can benefit from each other.
We study object recognition datasets with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-9-2 Medium
Area Expedition Conference
Notes LAMP; Approved no
Call Number Admin @ si @ Zol2021 Serial 3609
 

 
Author Kai Wang
Title Continual learning for hierarchical classification, few-shot recognition, and multi-modal learning Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep learning has drastically changed computer vision in the past decades and achieved great success in many applications, such as image classification, retrieval, detection, and segmentation thanks to the emergence of neural networks. Typically, for most applications, these networks are presented with examples from all tasks they are expected to perform. However, for many applications, this is not a realistic scenario, and an algorithm is required to learn tasks sequentially. Continual learning proposes theory and methods for this scenario.
The main challenge for continual learning systems is called catastrophic forgetting and refers to a significant drop in performance on previous tasks. To tackle this problem, three main branches of methods have been explored to alleviate forgetting in continual learning: regularization-based methods, rehearsal-based methods, and parameter isolation-based methods. However, most of them are focused on image classification tasks. Continual learning in many computer vision fields has not yet been well explored. Thus, in this thesis, we extend continual learning to meta learning, we propose a method for the incremental learning of hierarchical relations for image classification, we explore image recognition in online continual learning, and we study continual learning for cross-modal learning.
In this thesis, we explore the usage of image rehearsal when addressing the incremental meta learning problem. Observing that existing methods fail to improve performance with saved exemplars, we propose to mix exemplars with current task data and episode-level distillation to overcome forgetting in incremental meta learning. Next, we study a more realistic image classification scenario where each class has multiple granularity levels. Only one label is present at any time, which requires the model to infer whether the provided label has a hierarchical relation with any already known label. In experiments, we show that the estimated hierarchy information can be beneficial in both the training and inference stages.
For the online continual learning setting, we investigate the usage of intermediate feature replay. In this case, each training sample is observed by the model only once. Here we fix the memory buffer for feature replay and compare the effectiveness of saving features from different layers. Finally, we investigate multi-modal continual learning, where an image encoder cooperates with a semantic branch. We consider the continual learning of both zero-shot learning and cross-modal retrieval problems.
Address July, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Luis Herranz;Joost Van de Weijer
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-124793-2-4 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Wan2022 Serial 3714
 

 
Author Chenshen Wu
Title Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Type Book Whole
Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep learning has made tremendous progress in the last decade due to the explosion of training data and computational power. Through end-to-end training on a large dataset, image representations are more discriminative than the previously used hand-crafted features. However, for many real-world applications, training and testing on a single dataset is not realistic, as the test distribution may change over time. Continual learning takes this situation into account: the learner must adapt to a sequence of tasks, each with a different distribution. If one naively continued training the model on a new task, its performance would drop dramatically on the previously learned data. This phenomenon is known as catastrophic forgetting.
Many approaches have been proposed to address this problem, which can be divided into three main categories: regularization-based approaches, rehearsal-based approaches, and parameter isolation-based approaches. However, most of the existing works focus on image classification tasks and many other computer vision tasks have not been well-explored in the continual learning setting. Therefore, in this thesis, we study continual learning for image generation, object re-identification, and object counting.
For the image generation problem, since the model can generate images from the previously learned task, it is free to apply rehearsal without any limitation. We developed two methods based on generative replay. The first one uses the generated images for joint training together with the new data. The second one is based on output pixel-wise alignment. We extensively evaluate these methods on several benchmarks.
Next, we study continual learning for object Re-Identification (ReID). Although most state-of-the-art methods of ReID and continual ReID use softmax-triplet loss, we found that it is better to solve the ReID problem from a meta-learning perspective because continual learning of ReID can benefit a lot from the generalization of meta-learning. We also propose a distillation loss and find that the removal of the positive pairs before the distillation loss is critical.
Finally, we study continual learning for the counting problem. We study the mainstream method based on density maps and propose a new approach for density map distillation. We found that fixing the counter head is crucial for the continual learning of object counting. To further improve results, we propose an adaptor to adapt the changing feature extractor for the fixed counter head. Extensive evaluation shows that this results in improved continual learning performance.
Address
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-126409-0-8 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Wu2023 Serial 3960
 

 
Author Diego Alejandro Cheda
Title Monocular Depth Cues in Computer Vision Applications Type Book Whole
Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Depth perception is a key aspect of human vision. It is a routine and essential visual task that humans perform effortlessly in many daily activities. This has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues.

In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently been performed by three-dimensional reconstruction techniques, requiring two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to precisely estimate the scene depth maps by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary to compute a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems.

In this thesis, we have demonstrated how coarse depth information can be integrated in different tasks following alternative strategies to obtain more precise and robust results. In that sense, we have proposed a simple but sufficiently reliable technique, whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation parameter estimation, background estimation and pedestrian candidate generation. In the first case, we have computed the rotation of a camera mounted in a moving vehicle by applying two novel methods based on distant elements in the image, where the translation component of the image flow vectors is negligible. In background estimation, we have proposed a novel method to reconstruct the background by penalizing close regions in a cost function, which integrates color, motion, and depth terms. Finally, we have benefited from the geometric and depth information available in single images for pedestrian candidate generation to significantly reduce the number of generated windows to be further processed by a pedestrian classifier. In all cases, results have shown that our approaches contribute to better performance.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Daniel Ponsa;Antonio Lopez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Che2012 Serial 2210
 

 
Author Albert Gordo
Title Document Image Representation, Classification and Retrieval in Large-Scale Domains Type Book Whole
Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Despite the “paperless office” ideal that emerged in the 1970s, businesses still struggle with an increasing amount of paper documentation. Companies still receive huge amounts of paper documentation that need to be analyzed and processed, mostly in a manual way. A solution for this task consists of, first, automatically scanning the incoming documents. Then, document images can be analyzed and information can be extracted from the data. Documents can also be automatically dispatched to the appropriate workflows, used to retrieve similar documents in the dataset to transfer information, etc.

Due to the nature of this “digital mailroom”, we need document representation methods to be general, i.e., able to cope with very different types of documents. We need the methods to be sound, i.e., able to cope with unexpected types of documents, noise, etc. And we need the methods to be scalable, i.e., able to cope with thousands or millions of documents that need to be processed, stored, and consulted. Unfortunately, current techniques of document representation, classification and retrieval are not apt for this digital mailroom framework, since they do not fulfill some or all of these requirements.

Throughout this thesis we focus on the problem of document representation aimed at classification and retrieval tasks under this digital mailroom framework. We first propose a novel document representation based on runlength histograms, and extend it to cope with more complex documents such as multiple-page documents, or documents that contain more sources of information such as extracted OCR text. Then we focus on the scalability requirements and propose a novel binarization method which we dubbed PCAE, as well as two general asymmetric distances between binary embeddings that can significantly improve the retrieval results at a minimal extra computational cost. Finally, we note the importance of supervised learning when performing large-scale retrieval, and study several approaches that can significantly boost the results at no extra cost at query time.
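As a rough illustration of the runlength-histogram idea, the sketch below counts horizontal runs of foreground pixels in a small binary image and bins their lengths. The function names, the bin edges, and the restriction to horizontal runs of a single pixel value are assumptions made for this example. They are not the representation defined in the thesis.

```python
from itertools import groupby


def run_lengths(row, value):
    """Lengths of maximal runs of `value` in one row of pixels."""
    return [sum(1 for _ in group) for key, group in groupby(row) if key == value]


def runlength_histogram(image, value=1, bin_edges=(1, 2, 4, 8)):
    """Histogram of horizontal run lengths over all rows of `image`.

    Bin i counts runs with bin_edges[i] <= length < bin_edges[i + 1];
    the last bin is open-ended. bin_edges must start at 1 so that
    every run falls into some bin (hypothetical choice of edges).
    """
    if bin_edges[0] != 1:
        raise ValueError("bin_edges must start at 1")
    hist = [0] * len(bin_edges)
    for row in image:
        for length in run_lengths(row, value):
            # Index of the largest edge that does not exceed the run length.
            idx = max(i for i, edge in enumerate(bin_edges) if edge <= length)
            hist[idx] += 1
    return hist


image = [
    [1, 1, 0, 1],  # runs of 1s: lengths 2 and 1
    [0, 1, 1, 1],  # run of length 3
    [1, 1, 1, 1],  # run of length 4
]
histogram = runlength_histogram(image)  # [1, 2, 1, 0]
```

A fixed-length vector of this kind can be compared across documents of very different layouts, which is what makes such histograms convenient for classification and retrieval.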
Address Barcelona
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny;Florent Perronnin
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Gor2013 Serial 2277
 

 
Author Gemma Rotger
Title Lifelike Humans: Detailed Reconstruction of Expressive Human Faces Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Developing human-like digital characters is a challenging task, since humans are used to recognizing their fellow humans and find computer-generated characters inadequately humanized. To meet the standards of videogame and digital film productions, it is necessary to model and animate these characters so that they resemble human beings as closely as possible. However, this is an arduous and expensive task, since many artists and specialists are required to work on a single character. Therefore, to meet these requirements, we found it worthwhile to study the automatic creation of detailed characters through inexpensive setups. In this work, we develop novel techniques to create detailed characters by combining different aspects that stand out when developing realistic characters: skin detail, facial hair, expressions, and microexpressions. We examine each of these areas with the aim of automatically recovering each of the parts without user interaction or training data. We study the problems for their robustness but also for the simplicity of the setup, preferring single images with uncontrolled illumination and methods that can be easily computed on a standard laptop. A detailed face with wrinkles and skin details is vital to develop a realistic character. In this work, we introduce our method to automatically describe facial wrinkles in the image and transfer them to the recovered base face. Then we advance to facial hair recovery by solving a fitting problem with a novel parametrization model. Finally, we develop a mapping function that allows expressions and microexpressions to be transferred between different meshes, which provides realistic animations for our detailed mesh.
We cover all the mentioned points with a focus on key aspects such as (i) how to describe skin wrinkles in a simple and straightforward manner, (ii) how to recover 3D from 2D detections, (iii) how to recover and model facial hair from 2D to 3D, (iv) how to transfer expressions between models holding both skin detail and facial hair, and (v) how to perform all the described actions without training data or user interaction. In this work, we present our proposals to solve these aspects with an efficient and simple setup. We validate our work on several datasets, with both synthetic and real data, showing remarkable results even in challenging cases such as occlusions by glasses or thick beards, and even with different face topologies such as a single-eyed cyclops.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Felipe Lumbreras;Antonio Agudo
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-3-0 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rot2021 Serial 3513
 

 
Author Javier Marin
Title Pedestrian Detection Based on Local Experts Type Book Whole
Year 2013 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract During the last decade, vision-based human detection systems have started to play a key role in multiple applications linked to driver assistance, surveillance, robot sensing and home automation. Detecting humans is by far one of the most challenging tasks in Computer Vision. This is mainly due to the high degree of variability in human appearance associated with clothing, pose, shape and size. Besides, other factors such as cluttered scenarios, partial occlusions, or environmental conditions can make the detection task even harder.
Most promising state-of-the-art methods rely on discriminative learning paradigms which are fed with positive and negative examples. The training data is one of the most relevant elements for building a robust detector, as it has to cope with the large variability of the target. Creating this dataset requires human supervision. The drawback at this point is the arduous effort of annotating as well as of finding examples that cover such variability.
In this PhD thesis we address two recurrent problems in the literature. In the first stage, we aim to reduce the time-consuming task of annotating by using computer graphics. More concretely, we develop a virtual urban scenario for later generating a pedestrian dataset. Then, we train a detector using this dataset, and finally we assess whether this detector can be successfully applied in a real scenario.
In the second stage, we focus on increasing the robustness of our pedestrian detectors under partial occlusions. In particular, we present a novel occlusion handling approach to increase the performance of block-based holistic methods under partial occlusions. For this purpose, we make use of local experts via a Random Subspace Method (RSM) to handle these cases. If the method infers a possible partial occlusion, then the RSM, based on performance statistics obtained from partially occluded data, is applied. The last objective of this thesis is to propose a robust pedestrian detector based on an ensemble of local experts. To achieve this goal, we use the random forest paradigm, where the trees act as ensembles and their nodes are the local experts. In particular, each expert focuses on performing a robust classification of a pedestrian body patch. This approach offers computational efficiency and far less design complexity when compared to other state-of-the-art methods, while reaching better accuracy.
Address Barcelona
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;Jaume Amores
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Mar2013 Serial 2280
 

 
Author Susana Alvarez
Title Revisión de la teoría de los Textons Enfoque computacional en color Type Book Whole
Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Color and texture are two important visual stimuli for image interpretation. Defining computational descriptors that combine these two features is still an open problem. The difficulty stems essentially from the very nature of both: while texture is a property of a region, color is a property of a point.

So far, three types of approaches have been used for the combination: (a) texture is described directly in each of the color channels, (b) texture and color are described separately and combined at the end, and (c) the combination is performed with machine learning techniques. Considering that the human visual system solves this problem at very early levels, this thesis proposes to study the problem starting from the direct implementation of a perceptual theory, the theory of textons, and thus to explore its extension to color.

Since the theory of textons is based on describing texture from the densities of local attributes, it fits perfectly into the framework of holistic (bag-of-words) descriptors. Several descriptors based on different texton spaces and different image representations have been studied. The feasibility of these descriptors in an intermediate-level conceptual representation has also been studied.

The proposed descriptors have proven to be very efficient in image retrieval and classification applications, with advantages in vocabulary generation. The vocabularies are obtained by directly quantizing low-dimensional spaces, and the perceptual nature of these spaces makes it possible to associate low-level semantics with the visual words. The study of the results leads to the conclusion that, although the holistic approach is very efficient, introducing the spatial co-occurrence of the shape and color properties of the image blobs is a key element for their combination, a finding that does not contradict the evidence from perception.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell;Xavier Otazu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Alv2012b Serial 2216
Permanent link to this record
 

 
Author Xavier Baro
Title Probabilistic Darwin Machines: A New Approach to Develop Evolutionary Object Detection Type Book Whole
Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Ever since computers were invented, we have wondered whether they might perform some everyday human tasks. One of the most studied, yet still least understood, problems is the capacity to learn from our experiences and to generalize the knowledge that we acquire. One task that people perform without being aware of it, and that has attracted interest in different scientific areas from the beginning, is known as pattern recognition. Creating models that represent the world around us helps us to recognize objects in our environment, to predict situations, to identify behaviors... All this information allows us to adapt and to interact with our environment. The capacity of individuals to adapt to their environment has been related to the number of patterns they are capable of identifying.

This thesis approaches the pattern recognition problem from a Computer Vision point of view, taking one of the most paradigmatic and widespread approaches to object detection as a starting point. After studying this approach, two weak points are identified: the first concerns the description of the objects, and the second is a limitation of the learning algorithm, which hampers the use of better descriptors.

In order to address the learning limitations, we introduce evolutionary computation techniques to the classical object detection approach.

After testing classical evolutionary approaches, such as genetic algorithms, we develop a new learning algorithm based on Probabilistic Darwin Machines, which is better adapted to the learning problem. Once the learning limitation is overcome, we introduce a new feature set, which maintains the benefits of the classical feature set while adding the ability to describe non-localities. This combination of evolutionary learning algorithm and features is tested on different public data sets, outperforming the results obtained by the classical approach.
Address Barcelona (Spain)
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Vitria
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes OR;HuPBA;MV Approved no
Call Number BCNPCL @ bcnpcl @ Bar2009 Serial 1262
Permanent link to this record
 

 
Author Onur Ferhat
Title Analysis of Head-Pose Invariant, Natural Light Gaze Estimation Methods Type Book Whole
Year 2017 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Eye tracker devices have traditionally been used only inside laboratories, requiring trained professionals and elaborate setup mechanisms. However, in recent years, scientific work on easier-to-use eye trackers that require no special hardware (other than the omnipresent front-facing cameras in computers, tablets, and mobiles) has aimed at making this technology commonplace. These types of trackers face several extra challenges that make the problem harder, such as the low-resolution images provided by a regular webcam, changing ambient lighting conditions, personal appearance differences, changes in head pose, and so on. Recent research in the field has focused on all these challenges in order to provide better gaze estimation performance in a real-world setup.

In this work, we aim at tackling the gaze tracking problem in a single-camera setup. We first analyze all the previous work in the field, identifying the strengths and weaknesses of each idea that has been tried. We start our work on the gaze tracker with an appearance-based gaze estimation method, which is the simplest idea: it creates a direct mapping between a rectangular image patch extracted around the eye in a camera image and the gaze point (or gaze direction). Here, we do an extensive analysis of the factors that affect the performance of this tracker in several experimental setups, in order to address these problems in future work. In the second part of our work, we propose a feature-based gaze estimation method, which encodes the eye region image into a compact representation. We argue that this type of representation is better suited to dealing with head pose and lighting condition changes, as it both reduces the dimensionality of the input (i.e. the eye image) and breaks the direct connection between image pixel intensities and the gaze estimation. Lastly, we use a face alignment algorithm to obtain a robust face pose estimation, using a 3D model customized to the subject using the tracker. We combine this with a convolutional neural network trained on a large dataset of images to build a face-pose-invariant gaze tracker.
Address September 2017
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Fernando Vilariño
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-5-6 Medium
Area Expedition Conference
Notes MV Approved no
Call Number Admin @ si @ Fer2017 Serial 3018
Permanent link to this record
 

 
Author Jose Carlos Rubio
Title Many-to-Many High Order Matching. Applications to Tracking and Object Segmentation Type Book Whole
Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Feature matching is a fundamental problem in Computer Vision, with multiple applications such as tracking, image classification and retrieval, shape recognition and stereo fusion. In numerous domains, it is useful to represent the local structure of the matching features to increase the matching accuracy or to make the correspondence invariant to certain transformations (affine, homography, etc.). However, encoding this knowledge requires complicating the model by establishing high-order relationships between the model elements, thereby increasing the complexity of the optimization problem.

The importance of many-to-many matching is sometimes dismissed in the literature. Most methods are restricted to one-to-one matching, and are usually validated on synthetic or non-realistic datasets. In a real, challenging environment, with scale, pose and illumination variations of the object of interest, as well as the presence of occlusions, clutter, and noisy observations, many-to-many matching is necessary to achieve satisfactory results. As a consequence, finding the most likely many-to-many correspondence often involves a challenging combinatorial optimization process.

In this work, we design and demonstrate matching algorithms that compute many-to-many correspondences, applied to several challenging problems. Our goal is to use high-order representations to improve the expressive power of the matching, while keeping inference or optimization of such models feasible. We use graphical models as our preferred representation because they provide an elegant probabilistic framework to tackle structured prediction problems.

We introduce a matching-based tracking algorithm which performs matching between frames of a video sequence in order to solve the difficult problem of headlight tracking at night-time. We also generalise this algorithm to solve the problem of data association applied to various tracking scenarios. We demonstrate the effectiveness of this approach in real video sequences and we show that our tracking algorithm can be used to improve the accuracy of a headlight classification system.

In the second part of this work, we move from single (point) matching to dense (region) matching and we introduce a new hierarchical image representation. We use this model to develop a high-order many-to-many matching between pairs of images. We show that using high-order models rather than simpler models improves not only the accuracy of the results, but also the convergence speed of the inference algorithm.

Finally, we keep exploiting the idea of region matching to design a fully unsupervised image co-segmentation algorithm that is able to perform competitively with state-of-the-art supervised methods. Our method also overcomes typical drawbacks of some past works, such as the need for varied appearances in the image backgrounds. The region matching in this case is applied to effectively exploit inter-image information. We also extend this work to perform co-segmentation of videos as a way to perform video object segmentation; this is the first time that this problem has been addressed.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joan Serrat
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rub2012 Serial 2206
Permanent link to this record
 

 
Author Pau Rodriguez
Title Towards Robust Neural Models for Fine-Grained Image Recognition Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract (up) Fine-grained recognition, i.e. identifying similar subcategories of the same superclass, is central to human activity. Recognizing a friend, finding bacteria in microscopic imagery, or discovering a new kind of galaxy are but a few examples. However, fine-grained image recognition is still a challenging computer vision task, since the differences between two images of the same category can overwhelm the differences between two images of different fine-grained categories. In this regime, where the difference between two categories resides in subtle input changes, excessively invariant CNNs discard those details that help to discriminate between categories and focus on more obvious changes, yielding poor classification performance.
On the other hand, CNNs with too much capacity tend to memorize instance-specific details, thus causing overfitting. In this thesis, motivated by the potential impact of automatic fine-grained image recognition, we tackle the previous challenges and demonstrate that proper alignment of the inputs, multiple levels of attention, regularization, and explicit modeling of the output space result in more accurate fine-grained recognition models that generalize better and are more robust to intra-class variation. Concretely, we study the different stages of the neural network pipeline: input pre-processing, attention to regions, feature activations, and the label space. In each stage, we address different issues that hinder the recognition performance on various fine-grained tasks, and devise solutions in each chapter: i) We deal with the sensitivity to input alignment on fine-grained human facial motion such as pain. ii) We introduce an attention mechanism to allow CNNs to choose and process in detail the most discriminative regions of the image. iii) We further extend attention mechanisms to act on the network activations, thus allowing them to correct their predictions by looking back at certain regions, at different levels of abstraction. iv) We propose a regularization loss to prevent high-capacity neural networks from memorizing instance details by means of almost-identical feature detectors. v) We finally study the advantages of explicitly modeling the output space within the error-correcting framework. As a result, in this thesis we demonstrate that attention and regularization seem promising directions to overcome the problems of fine-grained image recognition, as well as proper treatment of the input and the output space.
Address March 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Jordi Gonzalez;Josep M. Gonfaus;Xavier Roca
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-3-5 Medium
Area Expedition Conference
Notes ISE; 600.119 Approved no
Call Number Admin @ si @ Rod2019 Serial 3258
Permanent link to this record
 

 
Author Carola Figueroa Flores
Title Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords computer vision; visual saliency; fine-grained object recognition; convolutional neural networks; images classification
Abstract (up) For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize at a simple glance the most important or salient regions of an image. This mechanism can be observed by analyzing on which parts of images subjects place attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images.

Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results.

In this thesis, we identify several problems related to visual saliency estimation. First, to what extent the estimation of saliency can be exploited to improve the training of an object recognition model when scarce training data is available. To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting characteristics to modulate the standard bottom-up visual characteristics of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data.

Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as an input. A side-effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain similar results on object recognition as SMIC but without the requirement of ground truth saliency maps to train the system.

Finally, we evaluated the accuracy of the saliency maps that occur as a side-effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on benchmark saliency maps. On one synthetic saliency dataset this method even obtains the state-of-the-art without the need of ever having seen an actual saliency image for training.
Address March 2021
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-4-7 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Fig2021 Serial 3600
Permanent link to this record