|   | 
Details
   web
Records
Author Hugo Prol; Vincent Dumoulin; Luis Herranz
Title Cross-Modulation Networks for Few-Shot Learning Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract A family of recent successful approaches to few-shot learning relies on learning an embedding space in which predictions are made by computing similarities between examples. This corresponds to combining information between support and query examples at a very late stage of the prediction pipeline. Inspired by this observation, we hypothesize that there may be benefits to combining the information at various levels of abstraction along the pipeline. We present an architecture called Cross-Modulation Networks which allows support and query examples to interact throughout the feature extraction process via a feature-wise modulation mechanism. We adapt the Matching Networks architecture to take advantage of these interactions and show encouraging initial results on miniImageNet in the 5-way, 1-shot setting, where we close the gap with state-of-the-art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ PDH2018 Serial 3248
Permanent link to this record
 

 
Author Luis Herranz; Weiqing Min; Shuqiang Jiang
Title Food recognition and recipe analysis: integrating visual content, context and external knowledge Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related information. We review how visual content, context and external knowledge can be integrated effectively into food-oriented applications, with special focus on recipe analysis and retrieval, food recommendation and restaurant context as emerging directions.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ HMJ2018 Serial 3250
Permanent link to this record
 

 
Author Mikhail Mozerov; Fei Yang; Joost Van de Weijer
Title Sparse Data Interpolation Using the Geodesic Distance Affinity Space Type Journal Article
Year 2019 Publication IEEE Signal Processing Letters Abbreviated Journal SPL
Volume 26 Issue 6 Pages 943 - 947
Keywords
Abstract In this letter, we adapt the geodesic distance-based recursive filter to the sparse data interpolation problem. The proposed technique is general and can be easily applied to any kind of sparse data. We demonstrate its superiority over other interpolation techniques in three experiments for qualitative and quantitative evaluation. In addition, we compare our method with the popular interpolation algorithm presented in the paper on EpicFlow optical flow, which is intuitively motivated by a similar geodesic distance principle. The comparison shows that our algorithm is more accurate and considerably faster than the EpicFlow interpolation technique.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ MYW2019 Serial 3261
Permanent link to this record
 

 
Author Victoria Ruiz; Angel Sanchez; Jose F. Velez; Bogdan Raducanu
Title Automatic Image-Based Waste Classification Type Conference Article
Year 2019 Publication International Work-Conference on the Interplay Between Natural and Artificial Computation. From Bioinspired Systems and Biomedical Applications to Machine Learning Abbreviated Journal
Volume 11487 Issue Pages 422–431
Keywords Computer Vision; Deep learning; Convolutional neural networks; Waste classification
Abstract The management of solid waste in large urban environments has become a complex problem due to increasing amount of waste generated every day by citizens and companies. Current Computer Vision and Deep Learning techniques can help in the automatic detection and classification of waste types for further recycling tasks. In this work, we use the TrashNet dataset to train and compare different deep learning architectures for automatic classification of garbage types. In particular, several Convolutional Neural Networks (CNN) architectures were compared: VGG, Inception and ResNet. The best classification results were obtained using a combined Inception-ResNet model that achieved 88.6% of accuracy. These are the best results obtained with the considered dataset.
Address Almeria; June 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference IWINAC
Notes (up) LAMP; 600.120 Approved no
Call Number RSV2019 Serial 3273
Permanent link to this record
 

 
Author Corina Krauter; Ursula Reiter; Albrecht Schmidt; Marc Masana; Rudolf Stollberger; Michael Fuchsjager; Gert Reiter
Title Objective extraction of the temporal evolution of the mitral valve vortex ring from 4D flow MRI Type Conference Article
Year 2019 Publication 27th Annual Meeting & Exhibition of the International Society for Magnetic Resonance in Medicine Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The mitral valve vortex ring is a promising flow structure for analysis of diastolic function, however, methods for objective extraction of its formation to dissolution are lacking. We present a novel algorithm for objective extraction of the temporal evolution of the mitral valve vortex ring from magnetic resonance 4D flow data and validated the method against visual analysis. The algorithm successfully extracted mitral valve vortex rings during both early- and late-diastolic filling and agreed substantially with visual assessment. Early-diastolic mitral valve vortex ring properties differed between healthy subjects and patients with ischemic heart disease.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ISMRM
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ KRS2019 Serial 3300
Permanent link to this record
 

 
Author Fei Yang; Yongmei Cheng; Joost Van de Weijer; Mikhail Mozerov
Title Improved Discrete Optical Flow Estimation With Triple Image Matching Cost Type Journal Article
Year 2020 Publication IEEE Access Abbreviated Journal ACCESS
Volume 8 Issue Pages 17093 - 17102
Keywords
Abstract Approaches that use more than two consecutive video frames in the optical flow estimation have a long research history. However, almost all such methods utilize extra information for a pre-processing flow prediction or for a post-processing flow correction and filtering. In contrast, this paper differs from previously developed techniques. We propose a new algorithm for the likelihood function calculation (alternatively the matching cost volume) that is used in the maximum a posteriori estimation. We exploit the fact that in general, optical flow is locally constant in the sense of time and the likelihood function depends on both the previous and the future frame. Implementation of our idea increases the robustness of optical flow estimation. As a result, our method outperforms 9% over the DCFlow technique, which we use as prototype for our CNN based computation architecture, on the most challenging MPI-Sintel dataset for the non-occluded mask metric. Furthermore, our approach considerably increases the accuracy of the flow estimation for the matching cost processing, consequently outperforming the original DCFlow algorithm results up to 50% in occluded regions and up to 9% in non-occluded regions on the MPI-Sintel dataset. The experimental section shows that the proposed method achieves state-of-the-arts results especially on the MPI-Sintel dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ YCW2020 Serial 3345
Permanent link to this record
 

 
Author Rada Deeb; Joost Van de Weijer; Damien Muselet; Mathieu Hebert; Alain Tremeau
Title Deep spectral reflectance and illuminant estimation from self-interreflections Type Journal Article
Year 2019 Publication Journal of the Optical Society of America A Abbreviated Journal JOSA A
Volume 31 Issue 1 Pages 105-114
Keywords
Abstract In this work, we propose a convolutional neural network based approach to estimate the spectral reflectance of a surface and spectral power distribution of light from a single RGB image of a V-shaped surface. Interreflections happening in a concave surface lead to gradients of RGB values over its area. These gradients carry a lot of information concerning the physical properties of the surface and the illuminant. Our network is trained with only simulated data constructed using a physics-based interreflection model. Coupling interreflection effects with deep learning helps to retrieve the spectral reflectance under an unknown light and to estimate spectral power distribution of this light as well. In addition, it is more robust to the presence of image noise than classical approaches. Our results show that the proposed approach outperforms state-of-the-art learning-based approaches on simulated data. In addition, it gives better results on real data compared to other interreflection-based approaches.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ DWM2019 Serial 3362
Permanent link to this record
 

 
Author Lu Yu
Title Semantic Representation: From Color to Deep Embeddings Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists.
In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-3-3 Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ Yu2019 Serial 3394
Permanent link to this record
 

 
Author Xialei Liu
Title Visual recognition in the wild: learning from rankings in small domains and continual learning in new domains Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Deep convolutional neural networks (CNNs) have achieved superior performance in many visual recognition application, such as image classification, detection and segmentation. In this thesis we address two limitations of CNNs. Training deep CNNs requires huge amounts of labeled data, which is expensive and labor intensive to collect. Another limitation is that training CNNs in a continual learning setting is still an open research question. Catastrophic forgetting is very likely when adapting trained models to new environments or new tasks. Therefore, in this thesis, we aim to improve CNNs for applications with limited data and to adapt CNNs continually to new tasks.
Self-supervised learning leverages unlabelled data by introducing an auxiliary task for which data is abundantly available. In the first part of the thesis, we show how rankings can be used as a proxy self-supervised task for regression problems. Then we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning. We then apply our framework on two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both, we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results. We further show that active learning using rankings can reduce labeling effort by up to 50\% for both IQA and crowd counting.
In the second part of the thesis, we propose two approaches to avoiding catastrophic forgetting in sequential task learning scenarios. The first approach is derived from Elastic Weight Consolidation, which uses a diagonal Fisher Information Matrix (FIM) to measure the importance of the parameters of the network. However the diagonal assumption is unrealistic. Therefore, we approximately diagonalize the FIM using a set of factorized rotation parameters. This leads to significantly better performance on continual learning of sequential tasks. For the second approach, we show that forgetting manifests differently at different layers in the network and propose a hybrid approach where distillation is used in the feature extractor and replay in the classifier via feature generation. Our method addresses the limitations of generative image replay and probability distillation (i.e. learning without forgetting) and can naturally aggregate new tasks in a single, well-calibrated classifier. Experiments confirm that our proposed approach outperforms the baselines and some start-of-the-art methods.
Address December 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-4-0 Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ Liu2019 Serial 3396
Permanent link to this record
 

 
Author Carola Figueroa Flores
Title Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency Type Book Whole
Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords computer vision; visual saliency; fine-grained object recognition; convolutional neural networks; images classification
Abstract For humans, the recognition of objects is an almost instantaneous, precise and
extremely adaptable process. Furthermore, we have the innate capability to learn
new object classes from only few examples. The human brain lowers the complexity
of the incoming data by filtering out part of the information and only processing
those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize in a simple
glance the most important or salient regions from an image. This mechanism can
be observed by analyzing on which parts of images subjects place attention; where
they fix their eyes when an image is shown to them. The most accurate way to
record this behavior is to track eye movements while displaying images.
Computational saliency estimation aims to identify to what extent regions or
objects stand out with respect to their surroundings to human observers. Saliency
maps can be used in a wide range of applications including object detection, image
and video compression, and visual tracking. The majority of research in the field has
focused on automatically estimating saliency maps given an input image. Instead, in
this thesis, we set out to incorporate saliency maps in an object recognition pipeline:
we want to investigate whether saliency maps can improve object recognition
results.
In this thesis, we identify several problems related to visual saliency estimation.
First, to what extent the estimation of saliency can be exploited to improve the
training of an object recognition model when scarce training data is available. To
solve this problem, we design an image classification network that incorporates
saliency information as input. This network processes the saliency map through a
dedicated network branch and uses the resulting characteristics to modulate the
standard bottom-up visual characteristics of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive
experiments on standard benchmark datasets for fine-grained object recognition,
we show that our proposed architecture can significantly improve performance,
especially on dataset with scarce training data.
Next, we address the main drawback of the above pipeline: SMIC requires an
explicit saliency algorithm that must be trained on a saliency dataset. To solve this,
we implement a hallucination mechanism that allows us to incorporate the saliency
estimation branch in an end-to-end trained neural network architecture that only
needs the RGB image as an input. A side-effect of this architecture is the estimation
of saliency maps. In experiments, we show that this architecture can obtain similar
results on object recognition as SMIC but without the requirement of ground truth
saliency maps to train the system.
Finally, we evaluated the accuracy of the saliency maps that occur as a sideeffect of object recognition. For this purpose, we use a set of benchmark datasets
for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human
eye-tracking experiments. Our results show that these saliency maps can obtain
competitive results on benchmark saliency maps. On one synthetic saliency dataset
this method even obtains the state-of-the-art without the need of ever having seen
an actual saliency image for training.
Address March 2021
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Bogdan Raducanu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-4-7 Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ Fig2021 Serial 3600
Permanent link to this record
 

 
Author Marc Masana
Title Lifelong Learning of Neural Networks: Detecting Novelty and Adapting to New Domains without Forgetting Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Computer vision has gone through considerable changes in the last decade as neural networks have come into common use. As available computational capabilities have grown, neural networks have achieved breakthroughs in many computer vision tasks, and have even surpassed human performance in others. With accuracy being so high, focus has shifted to other issues and challenges. One research direction that saw a notable increase in interest is on lifelong learning systems. Such systems should be capable of efficiently performing tasks, identifying and learning new ones, and should moreover be able to deploy smaller versions of themselves which are experts on specific tasks. In this thesis, we contribute to research on lifelong learning and address the compression and adaptation of networks to small target domains, the incremental learning of networks faced with a variety of tasks, and finally the detection of out-of-distribution samples at inference time.

We explore how knowledge can be transferred from large pretrained models to more task-specific networks capable of running on smaller devices by extracting the most relevant information. Using a pretrained model provides more robust representations and a more stable initialization when learning a smaller task, which leads to higher performance and is known as domain adaptation. However, those models are too large for certain applications that need to be deployed on devices with limited memory and computational capacity. In this thesis we show that, after performing domain adaptation, some learned activations barely contribute to the predictions of the model. Therefore, we propose to apply network compression based on low-rank matrix decomposition using the activation statistics. This results in a significant reduction of the model size and the computational cost.

Like human intelligence, machine intelligence aims to have the ability to learn and remember knowledge. However, when a trained neural network is presented with learning a new task, it ends up forgetting previous ones. This is known as catastrophic forgetting and its avoidance is studied in continual learning. The work presented in this thesis extensively surveys continual learning techniques and presents an approach to avoid catastrophic forgetting in sequential task learning scenarios. Our technique is based on using ternary masks in order to update a network to new tasks, reusing the knowledge of previous ones while not forgetting anything about them. In contrast to earlier work, our masks are applied to the activations of each layer instead of the weights. This considerably reduces the number of parameters to be added for each new task. Furthermore, the analysis on a wide range of work on incremental learning without access to the task-ID, provides insight on current state-of-the-art approaches that focus on avoiding catastrophic forgetting by using regularization, rehearsal of previous tasks from a small memory, or compensating the task-recency bias.

Neural networks trained with a cross-entropy loss force the outputs of the model to tend toward a one-hot encoded vector. This leads to models being too overly confident when presented with images or classes that were not present in the training distribution. The capacity of a system to be aware of the boundaries of the learned tasks and identify anomalies or classes which have not been learned yet is key to lifelong learning and autonomous systems. In this thesis, we present a metric learning approach to out-of-distribution detection that learns the task at hand on an embedding space.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-9-5 Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ Mas20 Serial 3481
Permanent link to this record
 

 
Author Riccardo Del Chiaro; Bartlomiej Twardowski; Andrew Bagdanov; Joost Van de Weijer
Title Recurrent attention to transient tasks for continual image captioning Type Conference Article
Year 2020 Publication 34th Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Research on continual learning has led to a variety of approaches to mitigating catastrophic forgetting in feed-forward classification networks. Until now surprisingly little attention has been focused on continual learning of recurrent models applied to problems like image captioning. In this paper we take a systematic look at continual learning of LSTM-based models for image captioning. We propose an attention-based approach that explicitly accommodates the transient nature of vocabularies in continual image captioning tasks -- i.e. that task vocabularies are not disjoint. We call our method Recurrent Attention to Transient Tasks (RATT), and also show how to adapt continual learning approaches based on weight egularization and knowledge distillation to recurrent continual learning problems. We apply our approaches to incremental image captioning problem on two new continual learning benchmarks we define using the MS-COCO and Flickr30 datasets. Our results demonstrate that RATT is able to sequentially learn five captioning tasks while incurring no forgetting of previously learned ones.
Address virtual; December 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ CTB2020 Serial 3484
Permanent link to this record
 

 
Author Yaxing Wang; Lu Yu; Joost Van de Weijer
Title DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs Type Conference Article
Year 2020 Publication 34th Conference on Neural Information Processing Systems Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Image-to-image translation has recently achieved remarkable results. But despite current success, it suffers from inferior performance when translations between classes require large shape changes. We attribute this to the high-resolution bottlenecks which are used by current state-of-the-art image-to-image methods. Therefore, in this work, we propose a novel deep hierarchical Image-to-Image Translation method, called DeepI2I. We learn a model by leveraging hierarchical features: (a) structural information contained in the shallow layers and (b) semantic information extracted from the deep layers. To enable the training of deep I2I models on small datasets, we propose a novel transfer learning method, that transfers knowledge from pre-trained GANs. Specifically, we leverage the discriminator of a pre-trained GANs (i.e. BigGAN or StyleGAN) to initialize both the encoder and the discriminator and the pre-trained generator to initialize the generator of our model. Applying knowledge transfer leads to an alignment problem between the encoder and generator. We introduce an adaptor network to address this. On many-class image-to-image translation on three datasets (Animal faces, Birds, and Foods) we decrease mFID by at least 35% when compared to the state-of-the-art. Furthermore, we qualitatively and quantitatively demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets. Finally, we are the first to perform I2I translations for domains with over 100 classes.
Address virtual; December 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference NEURIPS
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ WYW2020 Serial 3485
Permanent link to this record
 

 
Author Yaxing Wang; Salman Khan; Abel Gonzalez-Garcia; Joost Van de Weijer; Fahad Shahbaz Khan
Title Semi-supervised Learning for Few-shot Image-to-Image Translation Type Conference Article
Year 2020 Publication 33rd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled data requirements for the target domain during inference. In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training. To do so, we propose applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure. We also apply a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or external. Additionally, we propose several structural modifications to facilitate the image translation task under these circumstances. Our semi-supervised method for few-shot image translation, called SEMIT, achieves excellent results on four different datasets using as little as 10% of the source labels, and matches the performance of the main fully-supervised competitor using only 20% labeled data. Our code and models are made public at: this https URL.
Address Virtual; June 2020
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes (up) LAMP; 600.120 Approved no
Call Number Admin @ si @ WKG2020 Serial 3486
Permanent link to this record
 

 
Author Rahma Kalboussi; Aymen Azaza; Joost Van de Weijer; Mehrez Abdellaoui; Ali Douik
Title Object proposals for salient object segmentation in videos Type Journal Article
Year 2020 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 79 Issue 13 Pages 8677-8693
Keywords
Abstract Salient object segmentation in videos is generally broken up in a video segmentation part and a saliency assignment part. Recently, object proposals, which are used to segment the image, have had significant impact on many computer vision applications, including image segmentation, object detection, and recently saliency detection in still images. However, their usage has not yet been evaluated for salient object segmentation in videos. Therefore, in this paper, we investigate the application of object proposals to salient object segmentation in videos. In addition, we propose a new motion feature derived from the optical flow structure tensor for video saliency detection. Experiments on two standard benchmark datasets for video saliency show that the proposed motion feature improves saliency estimation results, and that object proposals are an efficient method for salient object segmentation. Results on the challenging SegTrack v2 and Fukuchi benchmark data sets show that we significantly outperform the state-of-the-art.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) LAMP; 600.120 Approved no
Call Number KAW2020 Serial 3504
Permanent link to this record