|   | 
Details
   web
Records
Author Antonio Esteban Lansaque
Title An Endoscopic Navigation System for Lung Cancer Biopsy Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Lung cancer is one of the most diagnosed cancers among men and women. Actually,
lung cancer accounts for 13% of the total cases with a 5-year global survival
rate in patients. Although Early detection increases survival rate from 38% to 67%, accurate diagnosis remains a challenge. Pathological confirmation requires extracting a sample of the lesion tissue for its biopsy. The preferred procedure for tissue biopsy is called bronchoscopy. A bronchoscopy is an endoscopic technique for the internal exploration of airways which facilitates the performance of minimal invasive interventions with low risk for the patient. Recent advances in bronchoscopic devices have increased their use for minimal invasive diagnostic and intervention procedures, like lung cancer biopsy sampling. Despite the improvement in bronchoscopic device quality, there is a lack of intelligent computational systems for supporting in-vivo clinical decision during examinations. Existing technologies fail to accurately reach the lesion due to several aspects at intervention off-line planning and poor intra-operative guidance at exploration time. Existing guiding systems radiate patients and clinical staff,might be expensive and achieve a suboptimlal 70% of yield boost. Diagnostic yield could be improved reducing radiation and costs by developing intra-operative support systems able to guide the bronchoscopist to the lesion during the intervention. The goal of this PhD thesis is to develop an image-based navigation systemfor intra-operative guidance of bronchoscopists to a target lesion across a path previously planned on a CT-scan. We propose a 3D navigation system which uses the anatomy of video bronchoscopy frames to locate the bronchoscope within the airways. Once the bronchoscope is located, our navigation system is able to indicate the bifurcation which needs to be followed to reach the lesion. In order to facilitate an off-line validation
as realistic as possible, we also present a method for augmenting simulated virtual bronchoscopies with the appearance of intra-operative videos. Experiments performed on augmented and intra-operative videos, prove that our algorithm can be speeded up for an on-line implementation in the operating room.
Address October 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Carles Sanchez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-0-2 Medium
Area Expedition Conference
Notes IAM; 600.139; 600.145 Approved no
Call Number Admin @ si @ Est2019 Serial 3392
Permanent link to this record
 

 
Author Lichao Zhang
Title Towards end-to-end Networks for Visual Tracking in RGB and TIR Videos Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract In the current work, we identify several problems of current tracking systems. The lack of large-scale labeled datasets hampers the usage of deep learning, especially end-to-end training, for tracking in TIR images. Therefore, many methods for tracking on TIR data are still based on hand-crafted features. This situation also happens in multi-modal tracking, e.g. RGB-T tracking. Another reason, which hampers the development of RGB-T tracking, is that there exists little research on the fusion mechanisms for combining information from RGB and TIR modalities. One of the crucial components of most trackers is the update module. For the currently existing end-to-end tracking architecture, e.g, Siamese trackers, the online model update is still not taken into consideration at the training stage. They use no-update or a linear update strategy during the inference stage. While such a hand-crafted approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update.

To address the data-scarcity for TIR and RGB-T tracking, we use image-to-image translation to generate a large-scale synthetic TIR dataset. This dataset allows us to perform end-to-end training for TIR tracking. Furthermore, we investigate several fusion mechanisms for RGB-T tracking. The multi-modal trackers are also trained in an end-to-end manner on the synthetic data. To improve the standard online update, we pose the updating step as an optimization problem which can be solved by training a neural network. Our approach thereby reduces the hand-crafted components in the tracking pipeline and sets a further step in the direction of a complete end-to-end trained tracking network which also considers updating during optimization.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Fahad Shahbaz Khan
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-1210011-1-9 Medium
Area Expedition Conference
Notes LAMP; 600.141; 600.120 Approved no
Call Number Admin @ si @ Zha2019 Serial 3393
Permanent link to this record
 

 
Author Lu Yu
Title Semantic Representation: From Color to Deep Embeddings Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists.
In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-3-3 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Yu2019 Serial 3394
Permanent link to this record
 

 
Author Albert Berenguel
Title Analysis of background textures in banknotes and identity documents for counterfeit detection Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Counterfeiting and piracy are a form of theft that has been steadily growing in recent years. A counterfeit is an unauthorized reproduction of an authentic/genuine object. Banknotes and identity documents are two common objects of counterfeiting. The former is used by organized criminal groups to finance a variety of illegal activities or even to destabilize entire countries due the inflation effect. Generally, in order to run their illicit businesses, counterfeiters establish companies and bank accounts using fraudulent identity documents. The illegal activities generated by counterfeit banknotes and identity documents has a damaging effect on business, the economy and the general population. To fight against counterfeiters, governments and authorities around the globe cooperate and develop security features to protect their security documents. Many of the security features in identity documents can also be found in banknotes. In this dissertation we focus our efforts in detecting the counterfeit banknotes and identity documents by analyzing the security features at the background printing. Background areas on secure documents contain fine-line patterns and designs that are difficult to reproduce without the manufacturers cutting-edge printing equipment. Our objective is to find the loose of resolution between the genuine security document and the printed counterfeit version with a publicly available commercial printer. We first present the most complete survey to date in identity and banknote security features. The compared algorithms and systems are based on computer vision and machine learning. Then we advance to present the banknote and identity counterfeit dataset we have built and use along all this thesis. Afterwards, we evaluate and adapt algorithms in the literature for the security background texture analysis. We study this problem from the point of view of robustness, computational efficiency and applicability into a real and non-controlled industrial scenario, proposing key insights to use these algorithms. Next, within the industrial environment of this thesis, we build a complete service oriented architecture to detect counterfeit documents. The mobile application and the server framework intends to be used even by non-expert document examiners to spot counterfeits. Later, we re-frame the problem of background texture counterfeit detection as a full-reference game of spotting the differences, by alternating glimpses between a counterfeit and a genuine background using recurrent neural networks. Finally, we deal with the lack of counterfeit samples, studying different approaches based on anomaly detection.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades;Josep Llados
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-2-6 Medium
Area Expedition Conference
Notes DAG; 600.140; 600.121 Approved no
Call Number Admin @ si @ Ber2019 Serial 3395
Permanent link to this record
 

 
Author Xialei Liu
Title Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition application, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks.
Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50\% for both IQA and crowd counting.
In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some start-of-the-art methods.
Address December 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-4-0 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Liu2019 Serial 3396
Permanent link to this record
 

 
Author Yaxing Wang
Title Transferring and Learning Representations for Image Generation and Translation Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.
Address January 2020
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Luis Herranz
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-5-7 Medium
Area Expedition Conference
Notes LAMP; 600.141; 600.120 Approved no
Call Number Admin @ si @ Wan2020 Serial 3397
Permanent link to this record
 

 
Author Sergio Escalera; Stephane Ayache; Jun Wan; Meysam Madadi; Umut Guçlu; Xavier Baro
Title Inpainting and Denoising Challenges Type Book Whole
Year 2019 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract The problem of dealing with missing or incomplete data in machine learning and computer vision arises in many applications. Recent strategies make use of generative models to impute missing or corrupted data. Advances in computer vision using deep generative models have found applications in image/video processing, such as denoising, restoration, super-resolution, or inpainting.
Inpainting and Denoising Challenges comprises recent efforts dealing with image and video inpainting tasks. This includes winning solutions to the ChaLearn Looking at People inpainting and denoising challenges: human pose recovery, video de-captioning and fingerprint restoration.
This volume starts with a wide review on image denoising, retracing and comparing various methods from the pioneer signal processing methods, to machine learning approaches with sparse and low-rank models, and recent deep learning architectures with autoencoders and variants. The following chapters present results from the Challenge, including three competition tasks at WCCI and ECML 2018. The top best approaches submitted by participants are described, showing interesting contributions and innovating methods. The last two chapters propose novel contributions and highlight new applications that benefit from image/video inpainting.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HUPBA; no menciona Approved no
Call Number Admin @ si @ EAW2019 Serial 3398
Permanent link to this record
 

 
Author Hugo Jair Escalante; Sergio Escalera; Isabelle Guyon; Xavier Baro; Yagmur Gucluturk; Umut Guçlu; Marcel van Gerven
Title Explainable and Interpretable Models in Computer Vision and Machine Learning Type Book Whole
Year 2018 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract This book compiles leading research on the development of explainable and interpretable machine learning methods in the context of computer vision and machine learning.
Research progress in computer vision and pattern recognition has led to a variety of modeling techniques with almost human-like performance. Although these models have obtained astounding results, they are limited in their explainability and interpretability: what is the rationale behind the decision made? what in the model structure explains its functioning? Hence, while good performance is a critical required characteristic for learning machines, explainability and interpretability capabilities are needed to take learning machines to the next step to include them in decision support systems involving human supervision.
This book, written by leading international researchers, addresses key topics of explainability and interpretability, including the following:

·Evaluation and Generalization in Interpretable Machine Learning
·Explanation Methods in Deep Learning
·Learning Functional Causal Models with Generative Neural Networks
·Learning Interpreatable Rules for Multi-Label Classification
·Structuring Neural Networks for More Explainable Predictions
·Generating Post Hoc Rationales of Deep Visual Classification Decisions
·Ensembling Visual Explanations
·Explainable Deep Driving by Visualizing Causal Attention
·Interdisciplinary Perspective on Algorithmic Job Candidate Search
·Multimodal Personality Trait Analysis for Explainable Modeling of Job Interview Decisions
·Inherent Explainability Pattern Theory-based Video Event Interpretations
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; no menciona Approved no
Call Number Admin @ si @ EEG2018 Serial 3399
Permanent link to this record
 

 
Author Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z. Li
Title Multi-modal Face Presentation Attach Detection Type Book Whole
Year 2020 Publication Synthesis Lectures on Computer Vision Abbreviated Journal
Volume 13 Issue Pages
Keywords (up)
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA Approved no
Call Number Admin @ si @ WGE2020 Serial 3440
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part III Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12823 Issue Pages
Keywords (up)
Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86333-3 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3727
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds)
Title 16th International Conference, 2021, Proceedings, Part IV Type Book Whole
Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal
Volume 12824 Issue Pages
Keywords (up)
Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
Address Lausanne, Switzerland, September 5-10, 2021
Corporate Author Thesis
Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-86336-4 Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number Admin @ si @ Serial 3728
Permanent link to this record
 

 
Author Fernando Vilariño
Title 3D Scanning of Capitals at Library Living Lab Type Book Whole
Year 2019 Publication “Living Lab Projects 2019”. ENoLL. Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MV; DAG; 600.140; 600.121;SIAI Approved no
Call Number Admin @ si @ Vil2019c Serial 3463
Permanent link to this record
 

 
Author Gabriel Villalonga
Title Leveraging Synthetic Data to Create Autonomous Driving Perception Systems Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Manually annotating images to develop vision models has been a major bottleneck
since computer vision and machine learning started to walk together. This has
been more evident since computer vision falls on the shoulders of data-hungry
deep learning techniques. When addressing on-board perception for autonomous
driving, the curse of data annotation is exacerbated due to the use of additional
sensors such as LiDAR. Therefore, any approach aiming at reducing such a timeconsuming and costly work is of high interest for addressing autonomous driving
and, in fact, for any application requiring some sort of artificial perception. In the
last decade, it has been shown that leveraging from synthetic data is a paradigm
worth to pursue in order to minimizing manual data annotation. The reason is
that the automatic process of generating synthetic data can also produce different
types of associated annotations (e.g. object bounding boxes for synthetic images
and LiDAR pointclouds, pixel/point-wise semantic information, etc.). Directly
using synthetic data for training deep perception models may not be the definitive
solution in all circumstances since it can appear a synth-to-real domain shift. In
this context, this work focuses on leveraging synthetic data to alleviate manual
annotation for three perception tasks related to driving assistance and autonomous
driving. In all cases, we assume the use of deep convolutional neural networks
(CNNs) to develop our perception models.
The first task addresses traffic sign recognition (TSR), a kind of multi-class
classification problem. We assume that the number of sign classes to be recognized
must be suddenly increased without having annotated samples to perform the
corresponding TSR CNN re-training. We show that leveraging synthetic samples of
such new classes and transforming them by a generative adversarial network (GAN)
trained on the known classes (i.e. without using samples from the new classes), it is
possible to re-train the TSR CNN to properly classify all the signs for a ∼ 1/4 ratio of
new/known sign classes. The second task addresses on-board 2D object detection,
focusing on vehicles and pedestrians. In this case, we assume that we receive a set
of images without the annotations required to train an object detector, i.e. without
object bounding boxes. Therefore, our goal is to self-annotate these images so
that they can later be used to train the desired object detector. In order to reach
this goal, we leverage from synthetic data and propose a semi-supervised learning
approach based on the co-training idea. In fact, we use a GAN to reduce the synthto-real domain shift before applying co-training. Our quantitative results show
that co-training and GAN-based image-to-image translation complement each
other up to allow the training of object detectors without manual annotation, and still almost reaching the upper-bound performances of the detectors trained from
human annotations. While in previous tasks we focus on vision-based perception,
the third task we address focuses on LiDAR pointclouds. Our initial goal was to
develop a 3D object detector trained on synthetic LiDAR-style pointclouds. While
for images we may expect synth/real-to-real domain shift due to differences in
their appearance (e.g. when source and target images come from different camera
sensors), we did not expect so for LiDAR pointclouds since these active sensors
factor out appearance and provide sampled shapes. However, in practice, we have
seen that it can be domain shift even among real-world LiDAR pointclouds. Factors
such as the sampling parameters of the LiDARs, the sensor suite configuration onboard the ego-vehicle, and the human annotation of 3D bounding boxes, do induce
a domain shift. We show it through comprehensive experiments with different
publicly available datasets and 3D detectors. This redirected our goal towards the
design of a GAN for pointcloud-to-pointcloud translation, a relatively unexplored
topic.
Finally, it is worth to mention that all the synthetic datasets used for these three
tasks, have been designed and generated in the context of this PhD work and will
be publicly released. Overall, we think this PhD presents several steps forward to
encourage leveraging synthetic data for developing deep perception models in the
field of driving assistance and autonomous driving.
Address February 2021
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Antonio Lopez;German Ros
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-2-3 Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number Admin @ si @ Vil2021 Serial 3599
Permanent link to this record
 

 
Author Pau Riba
Title Distilling Structure from Imagery: Graph-based Models for the Interpretation of Document Images Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in \(\mathbb{R}^n\), is not properly defined for graphs.


In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Alicia Fornes
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-6-4 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Rib20 Serial 3478
Permanent link to this record
 

 
Author Raul Gomez
Title Exploiting the Interplay between Visual and Textual Data for Scene Interpretation Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords (up)
Abstract Machine learning experimentation under controlled scenarios and standard datasets is necessary to compare algorithms performance by evaluating all of them in the same setup. However, experimentation on how those algorithms perform on unconstrained data and applied tasks to solve real world problems is also a must to ascertain how that research can contribute to our society.
In this dissertation we experiment with the latest computer vision and natural language processing algorithms applying them to multimodal scene interpretation. Particularly, we research on how image and text understanding can be jointly exploited to address real world problems, focusing on learning from Social Media data.
We address several tasks that involve image and textual information, discuss their characteristics and offer our experimentation conclusions. First, we work on detection of scene text in images. Then, we work with Social Media posts, exploiting the captions associated to images as supervision to learn visual features, which we apply to multimodal semantic image retrieval. Subsequently, we work with geolocated Social Media images with associated tags, experimenting on how to use the tags as supervision, on location sensitive image retrieval and on exploiting location information for image tagging. Finally, we work on a specific classification problem of Social Media publications consisting on an image and a text: Multimodal hate speech classification.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Lluis Gomez;Jaume Gibert
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-7-1 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Gom20 Serial 3479
Permanent link to this record