|   | 
Details
   web
Records
Author Ana Garcia Rodriguez; Yael Tudela; Henry Cordova; S. Carballal; I. Ordas; L. Moreira; E. Vaquero; O. Ortiz; L. Rivero; F. Javier Sanchez; Miriam Cuatrecasas; Maria Pellise; Jorge Bernal; Gloria Fernandez Esparrach
Title First in Vivo Computer-Aided Diagnosis of Colorectal Polyps using White Light Endoscopy Type Journal Article
Year 2022 Publication Endoscopy Abbreviated Journal END
Volume 54 Issue Pages
Keywords
Abstract
Address 2022/04/14
Corporate Author Thesis
Publisher Georg Thieme Verlag KG Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE Approved no
Call Number Admin @ si @ GTC2022a Serial 3746
Permanent link to this record
 

 
Author Diego Velazquez; Pau Rodriguez; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez
Title A Closer Look at Embedding Propagation for Manifold Smoothing Type Journal Article
Year 2022 Publication Journal of Machine Learning Research Abbreviated Journal JMLR
Volume 23 Issue 252 Pages 1-27
Keywords Regularization; emi-supervised learning; self-supervised learning; adversarial robustness; few-shot classification
Abstract Supervised training of neural networks requires a large amount of manually annotated data and the resulting networks tend to be sensitive to out-of-distribution (OOD) data.
Self- and semi-supervised training schemes reduce the amount of annotated data required during the training process. However, OOD generalization remains a major challenge for most methods. Strategies that promote smoother decision boundaries play an important role in out-of-distribution generalization. For example, embedding propagation (EP) for manifold smoothing has recently shown to considerably improve the OOD performance for few-shot classification. EP achieves smoother class manifolds by building a graph from sample embeddings and propagating information through the nodes in an unsupervised manner. In this work, we extend the original EP paper providing additional evidence and experiments showing that it attains smoother class embedding manifolds and improves results in settings beyond few-shot classification. Concretely, we show that EP improves the robustness of neural networks against multiple adversarial attacks as well as semi- and
self-supervised learning performance.
Address 9/2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE Approved no
Call Number Admin @ si @ VRG2022 Serial 3762
Permanent link to this record
 

 
Author Y. Mori; M.Misawa; Jorge Bernal; M. Bretthauer; S.Kudo; A. Rastogi; Gloria Fernandez Esparrach
Title Artificial Intelligence for Disease Diagnosis-the Gold Standard Challenge Type Journal Article
Year 2022 Publication Gastrointestinal Endoscopy Abbreviated Journal
Volume 96 Issue 2 Pages 370-372
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE Approved no
Call Number Admin @ si @ MMB2022 Serial 3701
Permanent link to this record
 

 
Author Parichehr Behjati Ardakani
Title Towards Efficient and Robust Convolutional Neural Networks for Single Image Super-Resolution Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Single image super-resolution (SISR) is an important task in image processing which aims to enhance the resolution of imaging systems. Recently, SISR has witnessed great strides with the rapid development of deep learning. Recent advances in SISR are mostly devoted to designing deeper and wider networks to enhance their representation learning capacity. However, as the depth of networks increases, deep learning-based methods are faced with the challenge of computational complexity in practice. Moreover, most existing methods rarely leverage the intermediate features and also do not discriminate the computation of features by their frequencial components, thereby achieving relatively low performance. Aside from the aforementioned problems, another desired ability is to upsample images to arbitrary scales using a single model. Most current SISR methods train a dedicated model for each target resolution, losing generality and increasing memory requirements. In this thesis, we address the aforementioned issues and propose solutions to them: i) We present a novel frequency-based enhancement block which treats different frequencies in a heterogeneous way and also models inter-channel dependencies, which consequently enrich the output feature. Thus it helps the network generate more discriminative representations by explicitly recovering finer details. ii) We introduce OverNet which contains two main parts: a lightweight feature extractor that follows a novel recursive framework of skip and dense connections to reduce low-level feature degradation, and an overscaling module that generates an accurate SR image by internally constructing an overscaled intermediate representation of the output features. Then, to solve the problem of reconstruction at arbitrary scale factors, we introduce a novel multi-scale loss, that allows the simultaneous training of all scale factors using a single model. iii) We propose a directional variance attention network which leverages a novel attention mechanism to enhance features in different channels and spatial regions. Moreover, we introduce a novel procedure for using attention mechanisms together with residual blocks to facilitate the preservation of finer details. Finally, we demonstrate that our approaches achieve considerably better performance than previous state-of-the-art methods, in terms of both quantitative and visual quality.
Address April, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Jordi Gonzalez;Xavier Roca;Pau Rodriguez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-124793-1-7 Medium
Area Expedition Conference
Notes (up) ISE Approved no
Call Number Admin @ si @ Beh2022 Serial 3713
Permanent link to this record
 

 
Author Wenjuan Gong; Zhang Yue; Wei Wang; Cheng Peng; Jordi Gonzalez
Title Meta-MMFNet: Meta-Learning Based Multi-Model Fusion Network for Micro-Expression Recognition Type Journal Article
Year 2022 Publication ACM Transactions on Multimedia Computing, Communications, and Applications Abbreviated Journal ACMTMC
Volume Issue Pages
Keywords Feature Fusion; Model Fusion; Meta-Learning; Micro-Expression Recognition
Abstract Despite its wide applications in criminal investigations and clinical communications with patients suffering from autism, automatic micro-expression recognition remains a challenging problem because of the lack of training data and imbalanced classes problems. In this study, we proposed a meta-learning based multi-model fusion network (Meta-MMFNet) to solve the existing problems. The proposed method is based on the metric-based meta-learning pipeline, which is specifically designed for few-shot learning and is suitable for model-level fusion. The frame difference and optical flow features were fused, deep features were extracted from the fused feature, and finally in the meta-learning-based framework, weighted sum model fusion method was applied for micro-expression classification. Meta-MMFNet achieved better results than state-of-the-art methods on four datasets. The code is available at https://github.com/wenjgong/meta-fusion-based-method.
Address May 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE; 600.157 Approved no
Call Number Admin @ si @ GYW2022 Serial 3692
Permanent link to this record
 

 
Author Ana Garcia Rodriguez; Yael Tudela; Henry Cordova; S. Carballal; I. Ordas; L. Moreira; E. Vaquero; O. Ortiz; L. Rivero; F. Javier Sanchez; Miriam Cuatrecasas; Maria Pellise; Jorge Bernal; Gloria Fernandez Esparrach
Title In vivo computer-aided diagnosis of colorectal polyps using white light endoscopy Type Journal Article
Year 2022 Publication Endoscopy International Open Abbreviated Journal ENDIO
Volume 10 Issue 9 Pages E1201-E1207
Keywords
Abstract Background and study aims Artificial intelligence is currently able to accurately predict the histology of colorectal polyps. However, systems developed to date use complex optical technologies and have not been tested in vivo. The objective of this study was to evaluate the efficacy of a new deep learning-based optical diagnosis system, ATENEA, in a real clinical setting using only high-definition white light endoscopy (WLE) and to compare its performance with endoscopists. Methods ATENEA was prospectively tested in real life on consecutive polyps detected in colorectal cancer screening colonoscopies at Hospital Clínic. No images were discarded, and only WLE was used. The in vivo ATENEA's prediction (adenoma vs non-adenoma) was compared with the prediction of four staff endoscopists without specific training in optical diagnosis for the study purposes. Endoscopists were blind to the ATENEA output. Histology was the gold standard. Results Ninety polyps (median size: 5 mm, range: 2-25) from 31 patients were included of which 69 (76.7 %) were adenomas. ATENEA correctly predicted the histology in 63 of 69 (91.3 %, 95 % CI: 82 %-97 %) adenomas and 12 of 21 (57.1 %, 95 % CI: 34 %-78 %) non-adenomas while endoscopists made correct predictions in 52 of 69 (75.4 %, 95 % CI: 60 %-85 %) and 20 of 21 (95.2 %, 95 % CI: 76 %-100 %), respectively. The global accuracy was 83.3 % (95 % CI: 74%-90 %) and 80 % (95 % CI: 70 %-88 %) for ATENEA and endoscopists, respectively. Conclusion ATENEA can accurately be used for in vivo characterization of colorectal polyps, enabling the endoscopist to make direct decisions. ATENEA showed a global accuracy similar to that of endoscopists despite an unsatisfactory performance for non-adenomatous lesions.
Address 2022 Sep 14
Corporate Author Thesis
Publisher PMID Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE; 600.157 Approved no
Call Number Admin @ si @ GTC2022b Serial 3752
Permanent link to this record
 

 
Author Yecong Wan; Yuanshuo Cheng; Miingwen Shao; Jordi Gonzalez
Title Image rain removal and illumination enhancement done in one go Type Journal Article
Year 2022 Publication Knowledge-Based Systems Abbreviated Journal KBS
Volume 252 Issue Pages 109244
Keywords
Abstract Rain removal plays an important role in the restoration of degraded images. Recently, CNN-based methods have achieved remarkable success. However, these approaches neglect that the appearance of real-world rain is often accompanied by low light conditions, which will further degrade the image quality, thereby hindering the restoration mission. Therefore, it is very indispensable to jointly remove the rain and enhance illumination for real-world rain image restoration. To this end, we proposed a novel spatially-adaptive network, dubbed SANet, which can remove the rain and enhance illumination in one go with the guidance of degradation mask. Meanwhile, to fully utilize negative samples, a contrastive loss is proposed to preserve more natural textures and consistent illumination. In addition, we present a new synthetic dataset, named DarkRain, to boost the development of rain image restoration algorithms in practical scenarios. DarkRain not only contains different degrees of rain, but also considers different lighting conditions, and more realistically simulates real-world rainfall scenarios. SANet is extensively evaluated on the proposed dataset and attains new state-of-the-art performance against other combining methods. Moreover, after a simple transformation, our SANet surpasses existing the state-of-the-art algorithms in both rain removal and low-light image enhancement.
Address Sept 2022
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) ISE; 600.157; 600.168 Approved no
Call Number Admin @ si @ WCS2022 Serial 3744
Permanent link to this record
 

 
Author Vacit Oguz Yazici
Title Towards Smart Fashion: Visual Recognition of Products and Attributes Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Artificial intelligence is innovating the fashion industry by proposing new applications and solutions to the problems encountered by researchers and engineers working in the industry. In this thesis, we address three of these problems. In the first part of the thesis, we tackle the problem of multi-label image classification which is very related to fashion attribute recognition. In the second part of the thesis, we address two problems that are specific to fashion. Firstly, we address the problem of main product detection which is the task of associating correct image parts (e.g. bounding boxes) with the fashion product being sold. Secondly, we address the problem of color naming for multicolored fashion items. The task of multi-label image classification consists in assigning various concepts such as objects or attributes to images. Usually, there are dependencies that can be learned between the concepts to capture label correlations (chair and table classes are more likely to co-exist than chair and giraffe).
If we treat the multi-label image classification problem as an orderless set prediction problem, we can exploit recurrent neural networks (RNN) to capture label correlations. However, RNNs are trained to predict ordered sequences of tokens, so if the order of the predicted sequence is different than the order of the ground truth sequence, there will be penalization although the predictions are correct. Therefore, in the first part of the thesis, we propose an orderless loss function which will order the labels in the ground truth sequence dynamically in a way that the minimum loss is achieved. This results in a significant improvement of RNN models on multi-label image classification over the previous methods.
However, RNNs suffer from long term dependencies when the cardinality of set grows bigger. The decoding process might stop early if the current hidden state cannot find any object and outputs the termination token. This would cause the remaining classes not to be predicted and lower recall metric. Transformers can be used to avoid the long term dependency problem exploiting their selfattention modules that process sequential data simultaneously. Consequently, we propose a novel transformer model for multi-label image classification which surpasses the state-of-the-art results by a large margin.
In the second part of thesis, we focus on two fashion-specific problems. Main product detection is the task of associating image parts with the fashion product that is being sold, generally using associated textual metadata (product title or description). Normally, in fashion e-commerces, products are represented by multiple images where a person wears the product along with other fashion items. If all the fashion items in the images are marked with bounding boxes, we can use the textual metadata to decide which item is the main product. The initial work treated each of these images independently, discarding the fact that they all belong to the same product. In this thesis, we represent the bounding boxes from all the images as nodes in a fully connected graph. This allows the algorithm to learn relations between the nodes during training and take the entire context into account for the final decision. Our algorithm results in a significant improvement of the state-ofthe-art.
Moreover, we address the problem of color naming for multicolored fashion items, which is a challenging task due to the external factors such as illumination changes or objects that act as clutter. In the context of multi-label classification, the vaguely defined lines between the classes in the color space cause ambiguity. For example, a shade of blue which is very close to green might cause the model to incorrectly predict the color blue and green at the same time. Based on this, models trained for color naming are expected to recognize the colors and their quantities in both single colored and multicolored fashion items. Therefore, in this thesis, we propose a novel architecture with an additional head that explicitly estimates the number of colors in fashion items. This removes the ambiguity problem and results in better color naming performance.
Address January 2022
Corporate Author Thesis Ph.D. thesis
Publisher IMPRIMA Place of Publication Editor Joost Van de Weijer;Arnau Ramisa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-6-1 Medium
Area Expedition Conference
Notes (up) LAMP Approved no
Call Number Admin @ si @ Ogu2022 Serial 3631
Permanent link to this record
 

 
Author Kai Wang
Title Continual learning for hierarchical classification, few-shot recognition, and multi-modal learning Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep learning has drastically changed computer vision in the past decades and achieved great success in many applications, such as image classification, retrieval, detection, and segmentation thanks to the emergence of neural networks. Typically, for most applications, these networks are presented with examples from all tasks they are expected to perform. However, for many applications, this is not a realistic
scenario, and an algorithm is required to learn tasks sequentially. Continual learning proposes theory and methods for this scenario.
The main challenge for continual learning systems is called catastrophic forgetting and refers to a significant drop in performance on previous tasks. To tackle this problem, three main branches of methods have been explored to alleviate the forgetting in continual learning. They are regularization-based methods, rehearsalbased methods, and parameter isolation-based methods. However, most of them are focused on image classification tasks. Continual learning of many computer vision fields has still not been well-explored. Thus, in this thesis, we extend the continual learning knowledge to meta learning, we propose a method for the incremental learning of hierarchical relations for image classification, we explore image recognition in online continual learning, and study continual learning for cross-modal learning.
In this thesis, we explore the usage of image rehearsal when addressing the incremental meta learning problem. Observing that existingmethods fail to improve performance with saved exemplars, we propose to mix exemplars with current task data and episode-level distillation to overcome forgetting in incremental meta learning. Next, we study a more realistic image classification scenario where each class has multiple granularity levels. Only one label is present at any time, which requires the model to infer if the provided label has a hierarchical relation with any already known label. In experiments, we show that the estimated hierarchy information can be beneficial in both the training and inference stage.
For the online continual learning setting, we investigate the usage of intermediate feature replay. In this case, the training samples are only observed by the model only one time. Here we fix thememory buffer for feature replay and compare the effectiveness of saving features from different layers. Finally, we investigate multi-modal continual learning, where an image encoder is cooperating with a semantic branch. We consider the continual learning of both zero-shot learning and cross-modal retrieval problems.
Address July, 2022
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Luis Herranz;Joost Van de Weijer
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-124793-2-4 Medium
Area Expedition Conference
Notes (up) LAMP Approved no
Call Number Admin @ si @ Wan2022 Serial 3714
Permanent link to this record
 

 
Author Aitor Alvarez-Gila
Title Self-supervised learning for image-to-image translation in the small data regime Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords Computer vision; Neural networks; Self-supervised learning; Image-to-image mapping; Probabilistic programming
Abstract The mass irruption of Deep Convolutional Neural Networks (CNNs) in computer vision since 2012 led to a dominance of the image understanding paradigm consisting in an end-to-end fully supervised learning workflow over large-scale annotated datasets. This approach proved to be extremely useful at solving a myriad of classic and new computer vision tasks with unprecedented performance —often, surpassing that of humans—, at the expense of vast amounts of human-labeled data, extensive computational resources and the disposal of all of our prior knowledge on the task at hand. Even though simple transfer learning methods, such as fine-tuning, have achieved remarkable impact, their success when the amount of labeled data in the target domain is small is limited. Furthermore, the non-static nature of data generation sources will often derive in data distribution shifts that degrade the performance of deployed models. As a consequence, there is a growing demand for methods that can exploit elements of prior knowledge and sources of information other than the manually generated ground truth annotations of the images during the network training process, so that they can adapt to new domains that constitute, if not a small data regime, at least a small labeled data regime. This thesis targets such few or no labeled data scenario in three distinct image-to-image mapping learning problems. It contributes with various approaches that leverage our previous knowledge of different elements of the image formation process: We first present a data-efficient framework for both defocus and motion blur detection, based on a model able to produce realistic synthetic local degradations. The framework comprises a self-supervised, a weakly-supervised and a semi-supervised instantiation, depending on the absence or availability and the nature of human annotations, and outperforms fully-supervised counterparts in a variety of settings. Our knowledge on color image formation is then used to gather input and target ground truth image pairs for the RGB to hyperspectral image reconstruction task. We make use of a CNN to tackle this problem, which, for the first time, allows us to exploit spatial context and achieve state-of-the-art results given a limited hyperspectral image set. In our last contribution to the subfield of data-efficient image-to-image transformation problems, we present the novel semi-supervised task of zero-pair cross-view semantic segmentation: we consider the case of relocation of the camera in an end-to-end trained and deployed monocular, fixed-view semantic segmentation system often found in industry. Under the assumption that we are allowed to obtain an additional set of synchronized but unlabeled image pairs of new scenes from both original and new camera poses, we present ZPCVNet, a model and training procedure that enables the production of dense semantic predictions in either source or target views at inference time. The lack of existing suitable public datasets to develop this approach led us to the creation of MVMO, a large-scale Multi-View, Multi-Object path-traced dataset with per-view semantic segmentation annotations. We expect MVMO to propel future research in the exciting under-developed fields of cross-view and multi-view semantic segmentation. Last, in a piece of applied research of direct application in the context of process monitoring of an Electric Arc Furnace (EAF) in a steelmaking plant, we also consider the problem of simultaneously estimating the temperature and spectral emissivity of distant hot emissive samples. To that end, we design our own capturing device, which integrates three point spectrometers covering a wide range of the Ultra-Violet, visible, and Infra-Red spectra and is capable of registering the radiance signal incoming from an 8cm diameter spot located up to 20m away. We then define a physically accurate radiative transfer model that comprises the effects of atmospheric absorbance, of the optical system transfer function, and of the sample temperature and spectral emissivity themselves. We solve this inverse problem without the need for annotated data using a probabilistic programming-based Bayesian approach, which yields full posterior distribution estimates of the involved variables that are consistent with laboratory-grade measurements.
Address Julu, 2019
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Joost Van de Weijer; Estibaliz Garrote
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP Approved no
Call Number Admin @ si @ Alv2022 Serial 3716
Permanent link to this record
 

 
Author Alex Gomez-Villa; Bartlomiej Twardowski; Lu Yu; Andrew Bagdanov; Joost Van de Weijer
Title Continually Learning Self-Supervised Representations With Projected Functional Regularization Type Conference Article
Year 2022 Publication CVPR 2022 Workshop on Continual Learning (CLVision, 3rd Edition) Abbreviated Journal
Volume Issue Pages 3866-3876
Keywords Computer vision; Conferences; Self-supervised learning; Image representation; Pattern recognition
Abstract Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised approaches. However, these methods are unable to acquire new knowledge incrementally – they are, in fact, mostly used only as a pre-training phase over IID data. In this work we investigate self-supervised methods in continual learning regimes without any replay
mechanism. We show that naive functional regularization,also known as feature distillation, leads to lower plasticity and limits continual learning performance. Instead, we propose Projected Functional Regularization in which a separate temporal projection network ensures that the newly learned feature space preserves information of the previous one, while at the same time allowing for the learning of new features. This prevents forgetting while maintaining the plasticity of the learner. Comparison with other incremental learning approaches applied to self-supervision demonstrates that our method obtains competitive performance in
different scenarios and on multiple datasets.
Address New Orleans, USA; 20 June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes (up) LAMP: 600.147; 600.120;CIC Approved no
Call Number Admin @ si @ GTY2022 Serial 3704
Permanent link to this record
 

 
Author Marc Masana; Xialei Liu; Bartlomiej Twardowski; Mikel Menta; Andrew Bagdanov; Joost Van de Weijer
Title Class-incremental learning: survey and performance evaluation Type Journal Article
Year 2022 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume Issue Pages
Keywords
Abstract For future learning systems incremental learning is desirable, because it allows for: efficient resource usage by eliminating the need to retrain from scratch at the arrival of new data; reduced memory usage by preventing or limiting the amount of data required to be stored -- also important when privacy limitations are imposed; and learning that more closely resembles human learning. The main challenge for incremental learning is catastrophic forgetting, which refers to the precipitous drop in performance on previously learned tasks after learning a new one. Incremental learning of deep neural networks has seen explosive growth in recent years. Initial work focused on task incremental learning, where a task-ID is provided at inference time. Recently we have seen a shift towards class-incremental learning where the learner must classify at inference time between all classes seen in previous tasks without recourse to a task-ID. In this paper, we provide a complete survey of existing methods for incremental learning, and in particular we perform an extensive experimental evaluation on twelve class-incremental methods. We consider several new experimental scenarios, including a comparison of class-incremental methods on multiple large-scale datasets, investigation into small and large domain shifts, and comparison on various network architectures.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120;CIC Approved no
Call Number Admin @ si @ MLT2022 Serial 3538
Permanent link to this record
 

 
Author Vacit Oguz Yazici; Joost Van de Weijer; Longlong Yu
Title Visual Transformers with Primal Object Queries for Multi-Label Image Classification Type Conference Article
Year 2022 Publication 26th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Multi-label image classification is about predicting a set of class labels that can be considered as orderless sequential data. Transformers process the sequential data as a whole, therefore they are inherently good at set prediction. The first vision-based transformer model, which was proposed for the object detection task introduced the concept of object queries. Object queries are learnable positional encodings that are used by attention modules in decoder layers to decode the object classes or bounding boxes using the region of interests in an image. However, inputting the same set of object queries to different decoder layers hinders the training: it results in lower performance and delays convergence. In this paper, we propose the usage of primal object queries that are only provided at the start of the transformer decoder stack. In addition, we improve the mixup technique proposed for multi-label classification. The proposed transformer model with primal object queries improves the state-of-the-art class wise F1 metric by 2.1% and 1.8%; and speeds up the convergence by 79.0% and 38.6% on MS-COCO and NUS-WIDE datasets respectively.
Address Montreal; Quebec; Canada; August 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes (up) LAMP; 600.147; 601.309;CIC Approved no
Call Number Admin @ si @ YWY2022 Serial 3786
Permanent link to this record
 

 
Author Hector Laria Mantecon; Yaxing Wang; Joost Van de Weijer; Bogdan Raducanu
Title Transferring Unconditional to Conditional GANs With Hyper-Modulation Type Conference Article
Year 2022 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often suffering more training difficulties. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork to overfit, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which effectively works on very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks.
Address New Orleans; USA; June 2022
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes (up) LAMP; 600.147; 602.200;MV Approved no
Call Number LWW2022a Serial 3785
Permanent link to this record
 

 
Author Lu Yu; Xialei Liu; Joost Van de Weijer
Title Self-Training for Class-Incremental Semantic Segmentation Type Journal Article
Year 2022 Publication IEEE Transactions on Neural Networks and Learning Systems Abbreviated Journal TNNLS
Volume Issue Pages
Keywords Class-incremental learning; Self-training; Semantic segmentation.
Abstract In class-incremental semantic segmentation, we have no access to the labeled data of previous tasks. Therefore, when incrementally learning new classes, deep neural networks suffer from catastrophic forgetting of previously learned knowledge. To address this problem, we propose to apply a self-training approach that leverages unlabeled data, which is used for rehearsal of previous knowledge. Specifically, we first learn a temporary model for the current task, and then, pseudo labels for the unlabeled data are computed by fusing information from the old model of the previous task and the current temporary model. In addition, conflict reduction is proposed to resolve the conflicts of pseudo labels generated from both the old and temporary models. We show that maximizing self-entropy can further improve results by smoothing the overconfident predictions. Interestingly, in the experiments, we show that the auxiliary data can be different from the training data and that even general-purpose, but diverse auxiliary data can lead to large performance gains. The experiments demonstrate the state-of-the-art results: obtaining a relative gain of up to 114% on Pascal-VOC 2012 and 8.5% on the more challenging ADE20K compared to previous state-of-the-art methods.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.147; 611.008;;CIC Approved no
Call Number Admin @ si @ YLW2022 Serial 3745
Permanent link to this record