Author Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz
Title Unsupervised Domain Adaptation without Source Data by Casting a BAIT Type Miscellaneous
Year 2020 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract arXiv:2010.12427
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learned from a labeled source domain to an unlabeled target domain. Existing UDA methods require access to source data during adaptation, which may not be feasible in some real-world applications. In this paper, we address the source-free unsupervised domain adaptation (SFUDA) problem, where only the source model is available during the adaptation. We propose a method named BAIT to address SFUDA. Specifically, given only the source model, with the source classifier head fixed, we introduce a new learnable classifier. When adapting to the target domain, class prototypes of the newly added classifier act as a bait. They first approach the target features which deviate from prototypes of the source classifier due to domain shift. Then those target features are pulled towards the corresponding prototypes of the source classifier, thus achieving feature alignment with the source classifier in the absence of source data. Experimental results show that the proposed method achieves state-of-the-art performance on several benchmark datasets compared with existing UDA and SFUDA methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ YWW2020 Serial 3539
Permanent link to this record
 

 
Author Carola Figueroa Flores; Bogdan Raducanu; David Berga; Joost Van de Weijer
Title Hallucinating Saliency Maps for Fine-Grained Image Classification for Limited Data Domains Type Conference Article
Year 2021 Publication 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications Abbreviated Journal
Volume 4 Issue Pages 163-171
Keywords
Abstract arXiv:2007.12562
Most saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline such as image classification. In the current paper, we propose an approach that does not require explicit saliency maps to improve image classification; instead, they are learned implicitly during the training of an end-to-end image classification task. We show that our approach obtains results similar to those obtained when the saliency maps are provided explicitly. Combining RGB data with saliency maps represents a significant advantage for object recognition, especially when training data is limited. We validate our method on several datasets for fine-grained classification tasks (Flowers, Birds and Cars). In addition, we show that our saliency estimation method, which is trained without any saliency ground-truth data, obtains competitive results on a real-image saliency benchmark (Toronto), and outperforms deep saliency models on synthetic images (SID4VAM).
Address Virtual; February 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISAPP
Notes LAMP Approved no
Call Number Admin @ si @ FRB2021c Serial 3540
Permanent link to this record
 

 
Author Shiqi Yang; Kai Wang; Luis Herranz; Joost Van de Weijer
Title Simple and effective localized attribute representations for zero-shot learning Type Miscellaneous
Year 2020 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract arXiv:2006.05938
Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their semantic descriptions. Some recent papers have shown the importance of localized features together with fine-tuning the feature extractor to obtain discriminative and transferable features. However, these methods require complex attention or part detection modules to perform explicit localization in the visual space. In contrast, in this paper we propose localizing representations in the semantic/attribute space, with a simple but effective pipeline where localization is implicit. Focusing on attribute representations, we show that our method obtains state-of-the-art performance on the CUB and SUN datasets, and also achieves competitive results on the AWA2 dataset, outperforming generally more complex methods with explicit localization in the visual space. Our method is easy to implement and can be used as a new baseline for zero-shot learning. In addition, our localized representations are highly interpretable as attribute-specific heatmaps.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ YWH2020 Serial 3542
Permanent link to this record
 

 
Author Mikel Menta; Adriana Romero; Joost Van de Weijer
Title Learning to adapt class-specific features across domains for semantic segmentation Type Miscellaneous
Year 2020 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract arXiv:2001.08311
Recent advances in unsupervised domain adaptation have shown the effectiveness of adversarial training to adapt features across domains, endowing neural networks with the capability of being tested on a target domain without requiring any training annotations in this domain. The great majority of existing domain adaptation models rely on image translation networks, which often contain a huge amount of domain-specific parameters. Additionally, the feature adaptation step often happens globally, at a coarse level, hindering its applicability to tasks such as semantic segmentation, where details are of crucial importance to provide sharp results. In this thesis, we present a novel architecture, which learns to adapt features across domains by taking into account per class information. To that aim, we design a conditional pixel-wise discriminator network, whose output is conditioned on the segmentation masks. Moreover, following recent advances in image translation, we adopt the recently introduced StarGAN architecture as image translation backbone, since it is able to perform translations across multiple domains by means of a single generator network. Preliminary results on a segmentation task designed to assess the effectiveness of the proposed approach highlight the potential of the model, improving upon strong baselines and alternative designs.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ MRW2020 Serial 3545
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz
Title Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications Type Book Whole
Year 2020 Publication Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications; VISIGRAPP 2020 Abbreviated Journal
Volume 4 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ FRB2020a Serial 3546
Permanent link to this record
 

 
Author Giovanni Maria Farinella; Petia Radeva; Jose Braz
Title Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications Type Book Whole
Year 2020 Publication Proceedings of the 15th International Joint Conference on Computer Vision; Imaging and Computer Graphics Theory and Applications; VISIGRAPP 2020 Abbreviated Journal
Volume 5 Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ FRB2020b Serial 3547
Permanent link to this record
 

 
Author Idoia Ruiz; Lorenzo Porzi; Samuel Rota Bulo; Peter Kontschieder; Joan Serrat
Title Weakly Supervised Multi-Object Tracking and Segmentation Type Conference Article
Year 2021 Publication IEEE Winter Conference on Applications of Computer Vision Workshops Abbreviated Journal
Volume Issue Pages 125-133
Keywords
Abstract We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7% for cars and pedestrians, respectively.
Address Virtual; January 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WACVW
Notes ADAS; 600.118; 600.124 Approved no
Call Number Admin @ si @ RPR2021 Serial 3548
Permanent link to this record
 

 
Author Guillem Cucurull; Pau Rodriguez; Vacit Oguz Yazici; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez
Title Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract arXiv:1802.06757
Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of 2.3 billion users about a vast spectrum of topics. To sense the whys of certain social users’ demands and culturally driven interests, however, the knowledge embedded in the 1.8 billion pictures which are uploaded daily in public profiles has only just started to be exploited, since this process has typically been text-based. Following this trend on visual-based social analysis, we present a novel methodology based on Deep Learning to build a combined image-and-text based personality trait model, trained with images posted together with words found highly correlated to specific personality traits. So the key contribution here is to explore whether OCEAN personality trait modeling can be addressed based on images, here called MindPics, appearing with certain tags with psychological insights. We found that there is a correlation between those posted images and their accompanying texts, which can be successfully modeled using deep neural networks for personality estimation. The experimental results are consistent with previous cyber-psychology results based on texts or images.
In addition, classification results on some traits show that some patterns emerge in the set of images corresponding to a specific text, in essence to those representing an abstract concept. These results open new avenues of research for further refining the proposed personality model under the supervision of psychology experts.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE; 600.098; 600.119 Approved no
Call Number Admin @ si @ CRY2018 Serial 3550
Permanent link to this record
 

 
Author Pau Rodriguez; Jordi Gonzalez; Josep M. Gonfaus; Xavier Roca
Title Towards Visual Personality Questionnaires based on Deep Learning and Social Media Type Conference Article
Year 2019 Publication 21st International Conference on Social Influence and Social Psychology Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address April 2019; Tokyo; Japan
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICSISP
Notes ISE; 600.119 Approved no
Call Number Admin @ si @ RGG2020 Serial 3554
Permanent link to this record
 

 
Author Ozge Mercanoglu Sincan; Julio C. S. Jacques Junior; Sergio Escalera; Hacer Yalim Keles
Title ChaLearn LAP Large Scale Signer Independent Isolated Sign Language Recognition Challenge: Design, Results and Future Research Type Conference Article
Year 2021 Publication Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages 3467-3476
Keywords
Abstract The performances of Sign Language Recognition (SLR) systems have improved considerably in recent years. However, several open challenges still need to be solved to allow SLR to be useful in practice. The research in the field is in its infancy in regards to the robustness of the models to a large diversity of signs and signers, and to fairness of the models to performers from different demographics. This work summarises the ChaLearn LAP Large Scale Signer Independent Isolated SLR Challenge, organised at CVPR 2021 with the goal of overcoming some of the aforementioned challenges. We analyse and discuss the challenge design, top winning solutions and suggestions for future research. The challenge attracted 132 participants in the RGB track and 59 in the RGB+Depth track, receiving more than 1.5K submissions in total. Participants were evaluated using a new large-scale multi-modal Turkish Sign Language (AUTSL) dataset, consisting of 226 sign labels and 36,302 isolated sign video samples performed by 43 different signers. Winning teams achieved more than 96% recognition rate, and their approaches benefited from pose/hand/face estimation, transfer learning, external data, fusion/ensemble of modalities and different strategies to model spatio-temporal information. However, methods still fail to distinguish among very similar signs, in particular those sharing similar hand trajectories.
Address Virtual; June 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes HuPBA; no proj Approved no
Call Number Admin @ si @ MJE2021 Serial 3560
Permanent link to this record
 

 
Author Albin Soutif; Marc Masana; Joost Van de Weijer; Bartlomiej Twardowski
Title On the importance of cross-task features for class-incremental learning Type Conference Article
Year 2021 Publication Theory and Foundation of continual learning workshop of ICML Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In class-incremental learning, an agent with limited resources needs to learn a sequence of classification tasks, forming an ever-growing classification problem, with the constraint of not being able to access data from previous tasks. The main difference with task-incremental learning, where a task-ID is available at inference time, is that the learner also needs to perform cross-task discrimination, i.e. distinguish between classes that have not been seen together. Approaches to tackle this problem are numerous and mostly make use of an external memory (buffer) of non-negligible size. In this paper, we ablate the learning of cross-task features and study its influence on the performance of basic replay strategies used for class-IL. We also define a new forgetting measure for class-incremental learning, and see that forgetting is not the principal cause of low performance. Our experimental results show that future algorithms for class-incremental learning should not only prevent forgetting, but also aim to improve the quality of the cross-task features. This is especially important when the number of classes per task is small.
Address Virtual; July 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICMLW
Notes LAMP Approved no
Call Number Admin @ si @ SMW2021 Serial 3588
Permanent link to this record
 

 
Author Guillermo Torres; Debora Gil
Title A multi-shape loss function with adaptive class balancing for the segmentation of lung structures Type Journal Article
Year 2020 Publication International Journal of Computer Assisted Radiology and Surgery Abbreviated Journal IJCAR
Volume 15 Issue 1 Pages S154-55
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM Approved no
Call Number Admin @ si @ ToG2020 Serial 3590
Permanent link to this record
 

 
Author Hassan Ahmed Sial
Title Estimating Light Effects from a Single Image: Deep Architectures and Ground-Truth Generation Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this thesis, we explore how to estimate the effects of the light interacting with the scene objects from a single image. To achieve this goal, we focus on recovering intrinsic components like reflectance, shading, or light properties such as color and position using deep architectures. The success of these approaches relies on training on large and diversified image datasets. Therefore, we present several contributions in this direction, such as: (a) a data-augmentation technique; (b) a ground-truth for an existing multi-illuminant dataset; (c) a family of synthetic datasets, SID for Surreal Intrinsic Datasets, with diversified backgrounds and coherent light conditions; and (d) a practical pipeline to create hybrid ground-truths to overcome the complexity of acquiring realistic light conditions in a massive way. In parallel with the creation of datasets, we trained different flexible encoder-decoder deep architectures incorporating physical constraints from the image formation models.

In the last part of the thesis, we apply all the previous experience to two different problems. Firstly, we create a large hybrid Doc3DShade dataset with real shading and synthetic reflectance under complex illumination conditions, which is used to train a two-stage architecture that improves the character recognition task in complex lighting conditions of unwrapped documents. Secondly, we tackle the problem of single image scene relighting by extending both the SID dataset, to present stronger shading and shadow effects, and the deep architectures, to use intrinsic components to estimate new relit images.
Address September 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Maria Vanrell;Ramon Baldrich
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-8-5 Medium
Area Expedition Conference
Notes CIC; Approved no
Call Number Admin @ si @ Sia2021 Serial 3607
Permanent link to this record
 

 
Author Fei Yang
Title Towards Practical Neural Image Compression Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Images and videos are pervasive in our life and communication. With advances in smart and portable devices, high capacity communication networks and high definition cinema, image and video compression are more relevant than ever. Traditional block-based linear transform codecs such as JPEG, H.264/AVC or the recent H.266/VVC are carefully designed to meet not only the rate-distortion criteria, but also the practical requirements of applications.
Recently, a new paradigm based on deep neural networks (i.e., neural image/video compression) has become increasingly popular due to its ability to learn powerful nonlinear transforms and other coding tools directly from data instead of being crafted by humans, as was usual in previous coding formats. While achieving excellent rate-distortion performance, these approaches are still limited mostly to research environments due to heavy models and other practical limitations, such as operating only at a particular rate and incurring high memory and computational cost. In this thesis, we study these practical limitations and design more practical neural image compression approaches.
After analyzing the differences between traditional and neural image compression, our first contribution is the modulated autoencoder (MAE), a framework that includes a mechanism to provide multiple rate-distortion options within a single model with comparable performance to independent models. In a second contribution, we propose the slimmable compressive autoencoder (SlimCAE), which in addition to variable rate, can optimize the complexity of the model and thus reduce significantly the memory and computational burden.
Modern generative models can learn custom image transformations directly from suitable datasets following encoder-decoder architectures, a task known as image-to-image (I2I) translation. Building on our previous work, we study the problem of distributed I2I translation, where the latent representation is transmitted through a binary channel and decoded on a remote receiving side. We also propose a variant that can perform both translation and the usual autoencoding functionality.
Finally, we also consider neural video compression, where the autoencoder is typically augmented with temporal prediction via motion compensation. One of the main bottlenecks of that framework is the optical flow module that estimates the displacement to predict the next frame. Focusing on this module, we propose a method that improves the accuracy of the optical flow estimation and a simplified variant that reduces the computational cost.
Key words: neural image compression, neural video compression, optical flow, practical neural image compression, compressive autoencoders, image-to-image translation, deep learning.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Luis Herranz;Mikhail Mozerov;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-7-8 Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Yan2021 Serial 3608
Permanent link to this record
 

 
Author Javad Zolfaghari Bengar
Title Reducing Label Effort with Deep Active Learning Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition applications, such as image classification, detection and segmentation. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Active learning is a paradigm aimed at reducing the annotation effort by training the model on actively selected informative and/or representative samples. In this thesis we study several aspects of active learning including video object detection for autonomous driving systems, image classification on balanced and imbalanced datasets and the incorporation of self-supervised learning in active learning. We briefly describe our approach in each of these areas to reduce the labeling effort.
In chapter two we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our criterion is based on the estimated number of errors in terms of false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two outdoor datasets.
In the next chapter we address the well-known problem of overconfidence in neural networks. As an alternative to network confidence, we propose a new informativeness-based active learning method that captures the learning dynamics of the neural network with a metric called label-dispersion. This metric is low when the network consistently assigns the same label to the sample during the course of training and high when the assigned label changes frequently. We show that label-dispersion is a promising predictor of the uncertainty of the network, and show on two benchmark datasets that an active learning algorithm based on label-dispersion obtains excellent results.
In chapter four, we tackle the problem of sampling bias in active learning methods on imbalanced datasets. Active learning is generally studied on balanced datasets where an equal number of images per class is available. However, real-world datasets suffer from severely imbalanced classes, the so-called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class-balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that our method generally results in a performance gain on balanced datasets as well.
Another paradigm to reduce the annotation effort is self-training, which learns from a large amount of unlabeled data in an unsupervised way and fine-tunes on a few labeled samples. Recent advancements in self-training have achieved very impressive results rivaling supervised learning on some datasets. In the last chapter we focus on whether active learning and self-supervised learning can benefit from each other.
We study object recognition datasets with several labeling budgets for the evaluations. Our experiments reveal that self-training is remarkably more efficient than active learning at reducing the labeling effort, that for a low labeling budget, active learning offers no benefit to self-training, and finally that the combination of active learning and self-training is fruitful when the labeling budget is high.
Address December 2021
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-9-2 Medium
Area Expedition Conference
Notes LAMP; Approved no
Call Number Admin @ si @ Zol2021 Serial 3609
Permanent link to this record