toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Adrien Pavao; Isabelle Guyon; Anne-Catherine Letournel; Dinh-Tuan Tran; Xavier Baro; Hugo Jair Escalante; Sergio Escalera; Tyler Thomas; Zhen Xu edit  url
openurl 
  Title CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges Type Journal Article
  Year 2023 Publication Journal of Machine Learning Research Abbreviated Journal JMLR  
  Volume Issue Pages  
  Keywords  
  Abstract CodaLab Competitions is an open source web platform designed to help data scientists and research teams to crowd-source the resolution of machine learning problems through the organization of competitions, also called challenges or contests. CodaLab Competitions provides useful features such as multiple phases, results and code submissions, multi-score leaderboards, and jobs running
inside Docker containers. The platform is very flexible and can handle large scale experiments, by allowing organizers to upload large datasets and provide their own CPU or GPU compute workers.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ PGL2023 Serial (down) 3973  
Permanent link to this record
 

 
Author Mateusz Pyla; Kamil Deja; Bartłomiej Twardowski; Tomasz Trzcinski edit   pdf
url  openurl
  Title Bayesian Flow Networks in Continual Learning Type Miscellaneous
  Year 2023 Publication arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Bayesian Flow Networks (BFNs) has been recently proposed as one of the most promising direction to universal generative modelling, having ability to learn any of the data type. Their power comes from the expressiveness of neural networks and Bayesian inference which make them suitable in the context of continual learning. We delve into the mechanics behind BFNs and conduct the experiments to empirically verify the generative capabilities on non-stationary data.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ PDT2023 Serial (down) 3972  
Permanent link to this record
 

 
Author Marcin Przewiezlikowski; Mateusz Pyla; Bartosz Zielinski; Bartłomiej Twardowski; Jacek Tabor; Marek Smieja edit   pdf
url  openurl
  Title Augmentation-aware Self-supervised Learning with Guided Projector Type Miscellaneous
  Year 2023 Publication arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Self-supervised learning (SSL) is a powerful technique for learning robust representations from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo are able to reach quality on par with supervised approaches. However, this invariance may be harmful to solving some downstream tasks which depend on traits affected by augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about augmentations applied to images. In order for the projector to take advantage of this auxiliary conditioning when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods regardless of their objective functions. Moreover, it does not require major changes in the network architecture or prior knowledge of downstream tasks. In addition to an analysis of sensitivity towards different data augmentations, we conduct a series of experiments, which show that CASSLE improves over various SSL methods, reaching state-of-the-art performance in multiple downstream tasks.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ PPZ2023 Serial (down) 3971  
Permanent link to this record
 

 
Author Artur Xarles; Sergio Escalera; Thomas B. Moeslund; Albert Clapes edit  url
openurl 
  Title ASTRA: An Action Spotting TRAnsformer for Soccer Videos Type Conference Article
  Year 2023 Publication Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports Abbreviated Journal  
  Volume Issue Pages 93–102  
  Keywords  
  Abstract In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.  
  Address Otawa; Canada; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference MMSports  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ XEM2023 Serial (down) 3970  
Permanent link to this record
 

 
Author Cristhian A. Aguilera-Carrasco; Luis Felipe Gonzalez-Böhme; Francisco Valdes; Francisco Javier Quitral Zapata; Bogdan Raducanu edit  doi
openurl 
  Title A Hand-Drawn Language for Human–Robot Collaboration in Wood Stereotomy Type Journal Article
  Year 2023 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 11 Issue Pages 100975 - 100985  
  Keywords  
  Abstract This study introduces a novel, hand-drawn language designed to foster human-robot collaboration in wood stereotomy, central to carpentry and joinery professions. Based on skilled carpenters’ line and symbol etchings on timber, this language signifies the location, geometry of woodworking joints, and timber placement within a framework. A proof-of-concept prototype has been developed, integrating object detectors, keypoint regression, and traditional computer vision techniques to interpret this language and enable an extensive repertoire of actions. Empirical data attests to the language’s efficacy, with the successful identification of a specific set of symbols on various wood species’ sawn surfaces, achieving a mean average precision (mAP) exceeding 90%. Concurrently, the system can accurately pinpoint critical positions that facilitate robotic comprehension of carpenter-indicated woodworking joint geometry. The positioning error, approximately 3 pixels, meets industry standards.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ AGV2023 Serial (down) 3969  
Permanent link to this record
 

 
Author Patricia Suarez; Dario Carpio; Angel Sappa edit  url
openurl 
  Title A Deep Learning Based Approach for Synthesizing Realistic Depth Maps Type Conference Article
  Year 2023 Publication 22nd International Conference on Image Analysis and Processing Abbreviated Journal  
  Volume 14234 Issue Pages 369–380  
  Keywords  
  Abstract This paper presents a novel cycle generative adversarial network (CycleGAN) architecture for synthesizing high-quality depth maps from a given monocular image. The proposed architecture uses multiple loss functions, including cycle consistency, contrastive, identity, and least square losses, to enable the generation of realistic and high-fidelity depth maps. The proposed approach addresses this challenge by synthesizing depth maps from RGB images without requiring paired training data. Comparisons with several state-of-the-art approaches are provided showing the proposed approach overcome other approaches both in terms of quantitative metrics and visual quality.  
  Address Udine; Italia; Setember 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICIAP  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SCS2023 Serial (down) 3968  
Permanent link to this record
 

 
Author Ruben Perez Tito edit  isbn
openurl 
  Title Exploring the role of Text in Visual Question Answering on Natural Scenes and Documents Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Visual Question Answering (VQA) is the task where given an image and a natural language question, the objective is to generate a natural language answer. At the intersection between computer vision and natural language processing, this task can be seen as a measure of image understanding capabilities, as it requires to reason about objects, actions, colors, positions, the relations between the different elements as well as commonsense reasoning, world knowledge, arithmetic skills and natural language understanding. However, even though the text present in the images conveys important semantically rich information that is explicit and not available in any other form, most VQA methods remained illiterate, largely
ignoring the text despite its potential significance. In this thesis, we set out on a journey to bring reading capabilities to computer vision models applied to the VQA task, creating new datasets and methods that can read, reason and integrate the text with other visual cues in natural scene images and documents.
In Chapter 3, we address the combination of scene text with visual information to fully understand all the nuances of natural scene images. To achieve this objective, we define a new sub-task of VQA that requires reading the text in the image, and highlight the limitations of the current methods. In addition, we propose a new architecture that integrates both modalities and jointly reasons about textual and visual features. In Chapter 5, we shift the domain of VQA with reading capabilities and apply it on scanned industry document images, providing a high-level end-purpose perspective to Document Understanding, which has been
primarily focused on digitizing the document’s contents and extracting key values without considering the ultimate purpose of the extracted information. For this, we create a dataset which requires methods to reason about the unique and challenging elements of documents, such as text, images, tables, graphs and complex layouts, to provide accurate answers in natural language. However, we observed that explicit visual features provide a slight contribution in the overall performance, since the main information is usually conveyed within the text and its position. In consequence, in Chapter 6, we propose VQA on infographic images, seeking for document images with more visually rich elements that require to fully exploit visual information in order to answer the questions. We show the performance gap of
different methods when used over industry scanned and infographic images, and propose a new method that integrates the visual features in early stages, which allows the transformer architecture to exploit the visual features during the self-attention operation. Instead, in Chapter 7, we apply VQA on a big collection of single-page documents, where the methods must find which documents are relevant to answer the question, and provide the answer itself. Finally, in Chapter 8, mimicking real-world application problems where systems must process documents with multiple pages, we address the multipage document visual question answering task. We demonstrate the limitations of existing methods, including models specifically designed to process long sequences. To overcome these limitations, we propose
a hierarchical architecture that can process long documents, answer questions, and provide the index of the page where the information to answer the question is located as an explainability measure.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Ernest Valveny  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-124793-5-5 Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ Per2023 Serial (down) 3967  
Permanent link to this record
 

 
Author Bonifaz Stuhr edit  isbn
openurl 
  Title Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may – and to some extent already does – result in advantages regarding the representation’s structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts due to the reduction of human intervention and the inherently more general setup that does not bias the optimization towards an objective originating from specific annotation-based signals. While major advantages of unsupervised representation learning have been recently observed in natural language processing, supervised methods still dominate in vision domains for most tasks. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that utilize self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks to achieve deeper backpropagation-free models. Thereby, we observe that backpropagation-based and -free methods can suffer from an objective function mismatch between the unsupervised pretext task and the target task. This mismatch can lead to performance decreases for the target task. (ii) Evaluating representations: We build upon the widely used (non-)linear evaluation protocol to define pretext- and target-objective-independent metrics for measuring the objective function mismatch. With these metrics, we evaluate various pretext and target tasks and disclose dependencies of the objective function mismatch concerning different parts of the training and model setup. (iii) Transferring representations: We contribute CARLANE, the first 3-way sim-to-real domain adaptation benchmark for 2D lane detection. We adopt several well-known unsupervised domain adaptation methods as baselines and propose a method based on prototypical cross-domain self-supervised learning. Finally, we focus on pixel-based unsupervised domain adaptation and contribute a content-consistent unpaired image-to-image translation method that utilizes masks, global and local discriminators, and similarity sampling to mitigate content inconsistencies, as well as feature-attentive denormalization to fuse content-based statistics into the generator stream. In addition, we propose the cKVD metric to incorporate class-specific content inconsistencies into perceptual metrics for measuring translation quality.  
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIA Place of Publication Editor Jordi Gonzalez;Jurgen Brauer  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-6-0 Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Stu2023 Serial (down) 3966  
Permanent link to this record
 

 
Author Diego Velazquez edit  isbn
openurl 
  Title Towards Robustness in Computer-based Image Understanding Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This thesis embarks on an exploratory journey into robustness in deep learning,
with a keen focus on the intertwining facets of generalization, explainability, and
edge cases within the realm of computer vision. In deep learning, robustness
epitomizes a model’s resilience and flexibility, grounded on its capacity to generalize across diverse data distributions, explain its predictions transparently, and navigate the intricacies of edge cases effectively. The challenges associated with robust generalization are multifaceted, encompassing the model’s performance on unseen data and its defense against out-of-distribution data and adversarial attacks. Bridging this gap, the potential of Embedding Propagation (EP) for improving out-of-distribution generalization is explored. EP is depicted as a powerful tool facilitating manifold smoothing, which in turn fortifies the model’s robustness against adversarial onslaughts and bolsters performance in few-shot and self-/semi-supervised learning scenarios. In the labyrinth of deep learning models, the path to robustness often intersects with explainability. As model complexity increases, so does the urgency to decipher their decision-making
processes. Acknowledging this, the thesis introduces a robust framework for
evaluating and comparing various counterfactual explanation methods, echoing
the imperative of explanation quality over quantity and spotlighting the intricacies of diversifying explanations. Simultaneously, the deep learning landscape is fraught with edge cases – anomalies in the form of small objects or rare instances in object detection tasks that defy the norm. Confronting this, the
thesis presents an extension of the DETR (DEtection TRansformer) model to enhance small object detection. The devised DETR-FP, embedding the Feature Pyramid technique, demonstrating improvement in small objects detection accuracy, albeit facing challenges like high computational costs. With emergence of foundation models in mind, the thesis unveils EarthView, the largest scale remote sensing dataset to date, built for the self-supervised learning of a robust foundational model for remote sensing. Collectively, these studies contribute to the grand narrative of robustness in deep learning, weaving together the strands of generalization, explainability, and edge case performance. Through these methodological advancements and novel datasets, the thesis calls for continued exploration, innovation, and refinement to fortify the bastion of robust computer vision.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Jordi Gonzalez;Josep M. Gonfaus;Pau Rodriguez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-81-126409-5-3 Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ Vel2023 Serial (down) 3965  
Permanent link to this record
 

 
Author Yi Xiao edit  isbn
openurl 
  Title Advancing Vision-based End-to-End Autonomous Driving Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In autonomous driving, artificial intelligence (AI) processes the traffic environment to drive the vehicle to a desired destination. Currently, there are different paradigms that address the development of AI-enabled drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that attempt to learn the direct mapping of raw data from input sensors to vehicle control signals. The latter are relatively less studied but are gaining popularity as they are less demanding in terms of data labeling. Therefore, in this thesis, our goal is to investigate end-to-end autonomous driving.
We propose to evaluate three approaches to tackle the challenge of end-to-end
autonomous driving. First, we focus on the input, considering adding depth information as complementary to RGB data, in order to mimic the human being’s
ability to estimate the distance to obstacles. Notice that, in the real world, these depth maps can be obtained either from a LiDAR sensor, or a trained monocular
depth estimation module, where human labeling is not needed. Then, based on
the intuition that the latent space of end-to-end driving models encodes relevant
information for driving, we use it as prior knowledge for training an affordancebased driving model. In this case, the trained affordance-based model can achieve good performance while requiring less human-labeled data, and it can provide interpretability regarding driving actions. Finally, we present a new pure vision-based end-to-end driving model termed CIL++, which is trained by imitation learning.
CIL++ leverages modern best practices, such as a large horizontal field of view and
a self-attention mechanism, which are contributing to the agent’s understanding of
the driving scene and bringing a better imitation of human drivers. Using training
data without any human labeling, our model yields almost expert performance in
the CARLA NoCrash benchmark and could rival SOTA models that require large amounts of human-labeled data.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-4-6 Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Xia2023 Serial (down) 3964  
Permanent link to this record
 

 
Author Shiqi Yang edit  isbn
openurl 
  Title Towards Source-Free Domain Adaption of Neural Networks in an Open World Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Though they achieve great success, deep neural networks typically require a huge
amount of labeled data for training. However, collecting labeled data is often laborious and expensive. It would, therefore, be ideal if the knowledge obtained from label-rich datasets could be transferred to unlabeled data. However, deep networks are weak at generalizing to unseen domains, even when the differences are only subtle between the datasets. In real-world situations, a typical factor impairing the model generalization ability is the distribution shift between data from different domains, which is a long-standing problem usually termed as (unsupervised) domain adaptation.
A crucial requirement in the methodology of these domain adaptation methods is that they require access to source domain data during the adaptation process to the target domain. Accessibility to the source data of a trained source model is often impossible in real-world applications, for example, when deploying domain adaptation algorithms on mobile devices where the computational capacity is limited or in situations where data privacy rules limit access to the source domain data. Without access to the source domain data, existing methods suffer from inferior performance. Thus, in this thesis, we investigate domain adaptation without source data (termed as source-free domain adaptation) in multiple different scenarios that focus on image classification tasks.
We first study the source-free domain adaptation problem in a closed-set setting,
where the label space of different domains is identical. Only accessing the pretrained source model, we propose to address source-free domain adaptation from the perspective of unsupervised clustering. We achieve this based on nearest neighborhood clustering. In this way, we can transfer the challenging source-free domain adaptation task to a type of clustering problem. The final optimization objective is an upper bound containing only two simple terms, which can be explained as discriminability and diversity. We show that this allows us to relate several other methods in domain adaptation, unsupervised clustering and contrastive learning via the perspective of discriminability and diversity.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Joost  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-3-9 Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Yan2023 Serial (down) 3963  
Permanent link to this record
 

 
Author Jose Elias Yauri edit  openurl
  Title Deep Learning Based Data Fusion Approaches for the Assessment of Cognitive States on EEG Signals Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract For millennia, the study of the couple brain-mind has fascinated the humanity in order to understand the complex nature of cognitive states. A cognitive state is the state of the mind at a specific time and involves cognition activities to acquire and process information for making a decision, solving a problem, or achieving a goal.
While normal cognitive states assist in the successful accomplishment of tasks; on the contrary, abnormal states of the mind can lead to task failures due to a reduced cognition capability. In this thesis, we focus on the assessment of cognitive states by means of the analysis of ElectroEncephaloGrams (EEG) signals using deep learning methods. EEG records the electrical activity of the brain using a set of electrodes placed on the scalp that output a set of spatiotemporal signals that are expected to be correlated to a specific mental process.
From the point of view of artificial intelligence, any method for the assessment of cognitive states using EEG signals as input should face several challenges. On the one hand, one should determine which is the most suitable approach for the optimal combination of the multiple signals recorded by EEG electrodes. On the other hand, one should have a protocol for the collection of good quality unambiguous annotated data, and an experimental design for the assessment of the generalization and transfer of models. In order to tackle them, first, we propose several convolutional neural architectures to perform data fusion of the signals recorded by EEG electrodes, at raw signal and feature levels. Four channel fusion methods, easy to incorporate into any neural network architecture, are proposed and assessed. Second, we present a method to create an unambiguous dataset for the prediction of cognitive mental workload using serious games and an Airbus-320 flight simulator. Third, we present a validation protocol that takes into account the levels of generalization of models based on the source and amount of test data.
Finally, the approaches for the assessment of cognitive states are applied to two use cases of high social impact: the assessment of mental workload for personalized support systems in the cockpit and the detection of epileptic seizures. The results obtained from the first use case show the feasibility of task transfer of models trained to detect workload in serious games to real flight scenarios. The results from the second use case show the generalization capability of our EEG channel fusion methods at k-fold cross-validation, patient-specific, and population levels.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Aura Hernandez;Debora Gil  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number Admin @ si @ Yau2023 Serial (down) 3962  
Permanent link to this record
 

 
Author Jose Luis Gomez Zurita edit  openurl
  Title Synth-to-real semi-supervised learning for visual tasks Type Book Whole
  Year 2023 Publication Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The curse of data labeling is a costly bottleneck in supervised deep learning, where large amounts of labeled data are needed to train intelligent systems. In onboard perception for autonomous driving, this cost corresponds to the labeling of raw data from sensors such as cameras, LiDARs, RADARs, etc. Therefore, synthetic data with automatically generated ground truth (labels) has aroused as a reliable alternative for training onboard perception models.
However, synthetic data commonly suffers from synth-to-real domain shift, i.e., models trained on the synthetic domain do not show their achievable accuracy when performing in the real world. This shift needs to be addressed by techniques falling in the realm of domain adaptation (DA).
The semi-supervised learning (SSL) paradigm can be followed to address DA. In this case, a model is trained using source data with labels (here synthetic) and leverages minimal knowledge from target data (here the real world) to generate pseudo-labels. These pseudo-labels help the training process to reduce the gap between the source and the target domains. In general, we can assume accessing both, pseudo-labels and a few amounts of human-provided labels for the target-domain data. However, the most interesting and challenging setting consists in assuming that we do not have human-provided labels at all. This setting is known as unsupervised domain adaptation (UDA). This PhD focuses on applying SSL to the UDA setting, for onboard visual tasks related to autonomous driving. We start by addressing the synth-to-real UDA problem on onboard vision-based object detection (pedestrians and cars), a critical task for autonomous driving and driving assistance. In particular, we propose to apply an SSL technique known as co-training, which we adapt to work with deep models that process a multi-modal input. The multi-modality consists of the visual appearance of the images (RGB) and their monocular depth estimation. The synthetic data we use as the source domain contains both, object bounding boxes and depth information. This prior knowledge is the
starting point for the co-training technique, which iteratively labels unlabeled real-world data and uses such pseudolabels (here bounding boxes with an assigned object class) to progressively improve the labeling results. Along this
process, two models collaborate to automatically label the images, in a way that one model compensates for the errors of the other, so avoiding error drift. While this automatic labeling process is done offline, the resulting pseudolabels can be used to train object detection models that must perform in real-time onboard a vehicle. We show that multi-modal co-training improves the labeling results compared to single-modal co-training, remaining competitive compared to human labeling.
Given the success of co-training in the context of object detection, we have also adapted this technique to a more crucial and challenging visual task, namely, onboard semantic segmentation. In fact, providing labels for a single image
can take from 30 to 90 minutes for a human labeler, depending on the content of the image. Thus, developing automatic labeling techniques for this visual task is of great interest to the automotive industry. In particular, the new co-training framework addresses synth-to-real UDA by an initial stage of self-training. Intermediate models arising from this stage are used to start the co-training procedure, for which we have elaborated an accurate collaboration policy between the two models performing the automatic labeling. Moreover, our co-training seamlessly leverages datasets from different synthetic domains. In addition, the co-training procedure is agnostic to the loss function used to train the semantic segmentation models which perform the automatic labeling. We achieve state-of-the-art results on publicly available benchmark datasets, again, remaining competitive compared to human labeling.
Finally, on the ground of our previous experience, we have designed and implemented a new SSL technique for UDA in the context of visual semantic segmentation. In this case, we mimic the labeling methodology followed by human labelers. In particular, rather than labeling full images at a time, categories of semantic classes are defined and only those are labeled in a labeling pass. In fact, different human labelers can become specialists in labeling different categories. Afterward, these per-category-labeled layers are combined to provide fully labeled images. Our technique is inspired by this methodology since we perform synth-to-real UDA per category, using the self-training stage previously developed as part of our co-training framework. The pseudo-labels obtained for each category are finally
fused to obtain fully automatically labeled images. In this context, we have also contributed to the development of a new photo-realistic synthetic dataset based on path-tracing rendering. Our new SSL technique seamlessly leverages publicly available synthetic datasets as well as this new one to obtain state-of-the-art results on synth-to-real UDA for semantic segmentation. We show that the new dataset allows us to reach better labeling accuracy than previously existing datasets, at the same time that it complements well them when combined. Moreover, we also show that the new human-inspired SSL technique outperforms co-training.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Gom2023 Serial (down) 3961  
Permanent link to this record
 

 
Author Chenshen Wu edit  isbn
openurl 
  Title Going beyond Classification Problems for the Continual Learning of Deep Neural Networks Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Deep learning has made tremendous progress in the last decade due to the explosion of training data and computational power. Through end-to-end training on a
large dataset, image representations are more discriminative than the previously
used hand-crafted features. However, for many real-world applications, training
and testing on a single dataset is not realistic, as the test distribution may change over time. Continuous learning takes this situation into account, where the learner must adapt to a sequence of tasks, each with a different distribution. If you would naively continue training the model with a new task, the performance of the model would drop dramatically for the previously learned data. This phenomenon is known as catastrophic forgetting.
Many approaches have been proposed to address this problem, which can be divided into three main categories: regularization-based approaches, rehearsal-based
approaches, and parameter isolation-based approaches. However, most of the existing works focus on image classification tasks and many other computer vision tasks
have not been well-explored in the continual learning setting. Therefore, in this
thesis, we study continual learning for image generation, object re-identification,
and object counting.
For the image generation problem, since the model can generate images from the previously learned task, it is free to apply rehearsal without any limitation. We developed two methods based on generative replay. The first one uses the generated image for joint training together with the new data. The second one is based on
output pixel-wise alignment. We extensively evaluate these methods on several
benchmarks.
Next, we study continual learning for object Re-Identification (ReID). Although
most state-of-the-art methods of ReID and continual ReID use softmax-triplet loss,
we found that it is better to solve the ReID problem from a meta-learning perspective because continual learning of reID can benefit a lot from the generalization of metalearning. We also propose a distillation loss and found that the removal of the positive pairs before the distillation loss is critical.
Finally, we study continual learning for the counting problem. We study the mainstream method based on density maps and propose a new approach for density
map distillation. We found that fixing the counter head is crucial for the continual learning of object counting. To further improve results, we propose an adaptor to adapt the changing feature extractor for the fixed counter head. Extensive evaluation shows that this results in improved continual learning performance.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-0-8 Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ Wu2023 Serial (down) 3960  
Permanent link to this record
 

 
Author Armin Mehri edit  isbn
openurl 
  Title Deep learning based architectures for cross-domain image processing Type Book Whole
  Year 2023 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Human vision is restricted to the visual-optical spectrum. Machine vision is not.
Cameras sensitive to diverse infrared spectral bands can improve the capacities of
autonomous systems and provide a comprehensive view. Relevant scene content
can be made visible, particularly in situations when sensors of other modalities,
such as a visual-optical camera, require a source of illumination. As a result, increasing the level of automation not only avoids human errors but also reduces
machine-induced errors. Furthermore, multi-spectral sensor systems with infrared
imagery as one modality are a rich source of information and can conceivably
increase the robustness of many autonomous systems. Robotics, automobiles,
biometrics, security, surveillance, and the military are some examples of fields
that can profit from the use of infrared imagery in their respective applications.
Although multimodal spectral sensors have come a long way, there are still several
bottlenecks that prevent us from combining their output information and using
them as comprehensive images. The primary issue with infrared imaging is the lack
of potential benefits due to their cost influence on sensor resolution, which grows
exponentially with greater resolution. Due to the more costly sensor technology
required for their development, their resolutions are substantially lower than thoseof regular digital cameras.
This thesis aims to improve beyond-visible-spectrum machine vision by integrating multi-modal spectral sensors. The emphasis is on transforming the produced images to enhance their resolution to match expected human perception, bring the color representation close to human understanding of natural color, and improve machine vision application performance. This research focuses mainly on two tasks, image Colorization and Image Super resolution for both single- and cross-domain problems. We first start with an extensive review of the state of the art in both tasks, point out the shortcomings of existing approaches, and then present our solutions to address their limitations. Our solutions demonstrate that low-cost channel information (i.e., visible image) can be used to improve expensive channel
information (i.e., infrared image), resulting in images with higher quality and closer to human perception at a lower cost than a high-cost infrared camera.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Angel Sappa  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-126409-1-5 Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ Meh2023 Serial (down) 3959  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: