Records |
Author |
Yaxing Wang; Abel Gonzalez-Garcia; David Berga; Luis Herranz; Fahad Shahbaz Khan; Joost Van de Weijer |
Title |
MineGAN: effective knowledge transfer from GANs to target domains with few images |
Type |
Conference Article |
Year |
2020 |
Publication |
33rd IEEE Conference on Computer Vision and Pattern Recognition |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
One of the attractive characteristics of deep neural networks is their ability to transfer knowledge obtained in one domain to other related domains. As a result, high-quality networks can be trained in domains with relatively little training data. This property has been extensively studied for discriminative networks but has received significantly less attention for generative models. Given the often enormous effort required to train GANs, both computationally as well as in the dataset collection, the re-use of pretrained GANs is a desirable objective. We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods such as mode collapse and lack of flexibility. We perform experiments on several complex datasets using various GAN architectures (BigGAN, Progressive GAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs. |
Address |
Virtual CVPR |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPR |
Notes |
LAMP; 600.109; 600.141; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ WGB2020 |
Serial |
3421 |
Permanent link to this record |
|
|
|
Author |
Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z. Li |
Title |
Multi-modal Face Presentation Attack Detection |
Type |
Book Whole |
Year |
2020 |
Publication |
Synthesis Lectures on Computer Vision |
Abbreviated Journal |
|
Volume |
13 |
Issue |
|
Pages |
|
Keywords |
|
Abstract |
|
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HuPBA |
Approved |
no |
Call Number |
Admin @ si @ WGE2020 |
Serial |
3440 |
Permanent link to this record |
|
|
|
Author |
Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li |
Title |
Advances in Face Presentation Attack Detection |
Type |
Book Whole |
Year |
2023 |
Publication |
Advances in Face Presentation Attack Detection |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
|
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGE2023a |
Serial |
3955 |
Permanent link to this record |
|
|
|
Author |
Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li |
Title |
Face Presentation Attack Detection (PAD) Challenges |
Type |
Book Chapter |
Year |
2023 |
Publication |
Advances in Face Presentation Attack Detection |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
17–35 |
Keywords |
|
Abstract |
In recent years, the security of face recognition systems has been increasingly threatened. Face Anti-spoofing (FAS) is essential to secure face recognition systems primarily from various attacks. In order to attract researchers and push forward the state of the art in Face Presentation Attack Detection (PAD), we organized three editions of Face Anti-spoofing Workshop and Competition at CVPR 2019, CVPR 2020, and ICCV 2021, which have attracted more than 800 teams from academia and industry, and greatly promoted the algorithms to overcome many challenging problems. In this chapter, we introduce the detailed competition process, including the challenge phases, timeline and evaluation metrics. Along with the workshop, we will introduce the corresponding dataset for each competition including data acquisition details, data processing, statistics, and evaluation protocol. Finally, we provide the available link to download the datasets used in the challenges. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
SLCV |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGE2023b |
Serial |
3956 |
Permanent link to this record |
|
|
|
Author |
Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li |
Title |
Face Anti-spoofing Progress Driven by Academic Challenges |
Type |
Book Chapter |
Year |
2023 |
Publication |
Advances in Face Presentation Attack Detection |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
1–15 |
Keywords |
|
Abstract |
With the ubiquity of facial authentication systems and the prevalence of security cameras around the world, the impact that facial presentation attack techniques may have is huge. However, research progress in this field has been slowed by a number of factors, including the lack of appropriate and realistic datasets, ethical and privacy issues that prevent the recording and distribution of facial images, and the little attention that the community has given to potential ethnic biases, among other factors. This chapter provides an overview of contributions derived from the organization of academic challenges in the context of face anti-spoofing detection. Specifically, we discuss the limitations of benchmarks and summarize our efforts in trying to boost research by the community via participation in academic challenges. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
SLCV |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGE2023c |
Serial |
3957 |
Permanent link to this record |
|
|
|
Author |
Jun Wan; Guodong Guo; Sergio Escalera; Hugo Jair Escalante; Stan Z Li |
Title |
Best Solutions Proposed in the Context of the Face Anti-spoofing Challenge Series |
Type |
Book Chapter |
Year |
2023 |
Publication |
Advances in Face Presentation Attack Detection |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
37–78 |
Keywords |
|
Abstract |
The PAD competitions we organized attracted more than 835 teams from around the world, most of them from industry, which shows that the topic of face anti-spoofing is closely related to daily life, and there is an urgent need for advanced algorithms to meet its application needs. Specifically, the Chalearn LAP multi-modal face anti-spoofing attack detection challenge attracted more than 300 teams for the development phase, with a total of 13 teams qualifying for the final round; the Chalearn Face Anti-spoofing Attack Detection Challenge attracted 340 teams in the development stage, and finally, 11 and 8 teams submitted their code in the single-modal and multi-modal face anti-spoofing recognition challenges, respectively; the 3D High-Fidelity Mask Face Presentation Attack Detection Challenge attracted 195 teams for the development phase, with a total of 18 teams qualifying for the final round. All the results were verified and re-run by the organizing team, and the results were used for the final ranking. In this chapter, we briefly review the methods developed by the teams participating in each competition, and describe the algorithms of the top-three ranked teams in detail. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGE2023d |
Serial |
3958 |
Permanent link to this record |
|
|
|
Author |
Yaxing Wang; Abel Gonzalez-Garcia; Luis Herranz; Joost Van de Weijer |
Title |
Controlling biases and diversity in diverse image-to-image translation |
Type |
Journal Article |
Year |
2021 |
Publication |
Computer Vision and Image Understanding |
Abbreviated Journal |
CVIU |
Volume |
202 |
Issue |
|
Pages |
103082 |
Keywords |
|
Abstract |
JCR 2019 Q2, IF=3.121
The task of unpaired image-to-image translation is highly challenging due to the lack of explicit cross-domain pairs of instances. We consider here diverse image translation (DIT), an even more challenging setting in which an image can have multiple plausible translations. This is normally achieved by explicitly disentangling content and style in the latent representation and sampling different styles codes while maintaining the image content. Despite the success of current DIT models, they are prone to suffer from bias. In this paper, we study the problem of bias in image-to-image translation. Biased datasets may add undesired changes (e.g. change gender or race in face images) to the output translations as a consequence of the particular underlying visual distribution in the target domain. In order to alleviate the effects of this problem we propose the use of semantic constraints that enforce the preservation of desired image properties. Our proposed model is a step towards unbiased diverse image-to-image translation (UDIT), and results in less unwanted changes in the translated images while still performing the wanted transformation. Experiments on several heavily biased datasets show the effectiveness of the proposed techniques in different domains such as faces, objects, and scenes. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.141; 600.109; 600.147 |
Approved |
no |
Call Number |
Admin @ si @ WGH2021 |
Serial |
3464 |
Permanent link to this record |
|
|
|
Author |
Dong Wang; Jia Guo; Qiqi Shao; Haochi He; Zhian Chen; Chuanbao Xiao; Ajian Liu; Sergio Escalera; Hugo Jair Escalante; Zhen Lei; Jun Wan; Jiankang Deng |
Title |
Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results |
Type |
Conference Article |
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
6379-6390 |
Keywords |
|
Abstract |
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems. Despite substantial advancements, the generalization of existing approaches to real-world applications remains challenging. This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets, which often leads to overfitting during training or saturation during testing. In terms of quantity, the number of spoof subjects is a critical determinant. Most datasets comprise fewer than 2,000 subjects. With regard to diversity, the majority of datasets consist of spoof samples collected in controlled environments using repetitive, mechanical processes. This data collection methodology results in homogenized samples and a dearth of scenario diversity. To address these shortcomings, we introduce the Wild Face Anti-Spoofing (WFAS) dataset, a large-scale, diverse FAS dataset collected in unconstrained settings. Our dataset encompasses 853,729 images of 321,751 spoof subjects and 529,571 images of 148,169 live subjects, representing a substantial increase in quantity. Moreover, our dataset incorporates spoof data obtained from the internet, spanning a wide array of scenarios and various commercial sensors, including 17 presentation attacks (PAs) that encompass both 2D and 3D forms. This novel data collection strategy markedly enhances FAS data diversity. Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR 2023 workshop. Additionally, we meticulously evaluate representative methods using Protocol 1 and Protocol 2 (Unknown-Type). Through an in-depth examination of the challenge outcomes and benchmark baselines, we provide insightful analyses and propose potential avenues for future research. The dataset is released under Insightface. |
Address |
Vancouver; Canada; June 2023 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPRW |
Notes |
HUPBA |
Approved |
no |
Call Number |
Admin @ si @ WGS2023 |
Serial |
3919 |
Permanent link to this record |
|
|
|
Author |
Yaxing Wang; Abel Gonzalez-Garcia; Joost Van de Weijer; Luis Herranz |
Title |
SDIT: Scalable and Diverse Cross-domain Image Translation |
Type |
Conference Article |
Year |
2019 |
Publication |
27th ACM International Conference on Multimedia |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
1267–1276 |
Keywords |
|
Abstract |
Recently, image-to-image translation research has witnessed remarkable progress. Although current approaches successfully generate diverse outputs or perform scalable image transfer, these properties have not been combined into a single method. To address this limitation, we propose SDIT: Scalable and Diverse image-to-image translation. These properties are combined into a single generator. The diversity is determined by a latent variable which is randomly sampled from a normal distribution. The scalability is obtained by conditioning the network on the domain attributes. Additionally, we also exploit an attention mechanism that permits the generator to focus on the domain-specific attribute. We empirically demonstrate the performance of the proposed method on face mapping and other datasets beyond faces. |
Address |
Nice; France; October 2019 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ACM-MM |
Notes |
LAMP; 600.106; 600.109; 600.141; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ WGW2019 |
Serial |
3363 |
Permanent link to this record |
|
|
|
Author |
Yaxing Wang; Abel Gonzalez-Garcia; Chenshen Wu; Luis Herranz; Fahad Shahbaz Khan; Shangling Jui; Jian Yang; Joost Van de Weijer |
Title |
MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains |
Type |
Journal Article |
Year |
2024 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
132 |
Issue |
|
Pages |
490–514 |
Keywords |
|
Abstract |
Given the often enormous effort required to train GANs, both computationally as well as in dataset collection, the re-use of pretrained GANs largely increases the potential impact of generative models. Therefore, we propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, that restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; MACO |
Approved |
no |
Call Number |
Admin @ si @ WGW2024 |
Serial |
3888 |
Permanent link to this record |
|
|
|
Author |
Kai Wang; Luis Herranz; Anjan Dutta; Joost Van de Weijer |
Title |
Bookworm continual learning: beyond zero-shot learning and continual learning |
Type |
Conference Article |
Year |
2020 |
Publication |
Workshop TASK-CV 2020 |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
We propose bookworm continual learning (BCL), a flexible setting where unseen classes can be inferred via a semantic model, and the visual model can be updated continually. Thus BCL generalizes both continual learning (CL) and zero-shot learning (ZSL). We also propose the bidirectional imagination (BImag) framework to address BCL, where features of both past and future classes are generated. We observe that conditioning the feature generator on attributes can actually harm the continual learning ability, and propose two variants (joint class-attribute conditioning and asymmetric generation) to alleviate this problem. |
Address |
Virtual; August 2020 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCVW |
Notes |
LAMP; 600.141; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ WHD2020 |
Serial |
3466 |
Permanent link to this record |
|
|
|
Author |
Chenshen Wu; Luis Herranz; Xialei Liu; Joost Van de Weijer; Bogdan Raducanu |
Title |
Memory Replay GANs: Learning to Generate New Categories without Forgetting |
Type |
Conference Article |
Year |
2018 |
Publication |
32nd Annual Conference on Neural Information Processing Systems |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
5966-5976 |
Keywords |
|
Abstract |
Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine-tuning renders the network unable to properly generate images from previous categories (i.e., forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results on the MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories. |
Address |
Montreal; Canada; December 2018 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
NIPS |
Notes |
LAMP; 600.106; 600.109; 602.200; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ WHL2018 |
Serial |
3249 |
Permanent link to this record |
|
|
|
Author |
Yaxing Wang; Luis Herranz; Joost Van de Weijer |
Title |
Mix and match networks: multi-domain alignment for unpaired image-to-image translation |
Type |
Journal Article |
Year |
2020 |
Publication |
International Journal of Computer Vision |
Abbreviated Journal |
IJCV |
Volume |
128 |
Issue |
|
Pages |
2849–2872 |
Keywords |
|
Abstract |
This paper addresses the problem of inferring unseen cross-modal image-to-image translations between multiple modalities. We assume that only some of the pairwise translations have been seen (i.e. trained) and infer the remaining unseen translations (where training pairs are not available). We propose mix and match networks, an approach where multiple encoders and decoders are aligned in such a way that the desired translation can be obtained by simply cascading the source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). The main challenge lies in the alignment of the latent representations at the bottlenecks of encoder-decoder pairs. We propose an architecture with several tools to encourage alignment, including autoencoders and robust side information and latent consistency losses. We show the benefits of our approach in terms of effectiveness and scalability compared with other pairwise image-to-image translation approaches. We also propose zero-pair cross-modal image translation, a challenging setting where the objective is inferring semantic segmentation from depth (and vice-versa) without explicit segmentation-depth pairs, and only from two (disjoint) segmentation-RGB and depth-RGB training sets. We observe that a certain part of the shared information between unseen modalities might not be reachable, so we further propose a variant that leverages pseudo-pairs, which allows us to exploit this shared information between the unseen modalities. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.109; 600.106; 600.141; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ WHW2020 |
Serial |
3424 |
Permanent link to this record |
|
|
|
Author |
Kai Wang; Luis Herranz; Joost Van de Weijer |
Title |
Continual learning in cross-modal retrieval |
Type |
Conference Article |
Year |
2021 |
Publication |
2nd CLVISION workshop |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
3628-3638 |
Keywords |
|
Abstract |
Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and integrated (we focus on cross-modal retrieval between language and visual representations). The latter studies how to prevent forgetting a previously learned task when learning a new one. While humans excel in these two aspects, deep neural networks are still quite limited. In this paper, we propose a combination of both problems into a continual cross-modal retrieval setting, where we study how the catastrophic interference caused by new tasks impacts the embedding spaces and their cross-modal alignment required for effective retrieval. We propose a general framework that decouples the training, indexing and querying stages. We also identify and study different factors that may lead to forgetting, and propose tools to alleviate it. We found that the indexing stage plays an important role and that simply avoiding reindexing the database with updated embedding networks can lead to significant gains. We evaluated our methods on two image-text retrieval datasets, obtaining significant gains with respect to the fine-tuning baseline. |
Address |
Virtual; June 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
CVPRW |
Notes |
LAMP; 600.120; 600.141; 600.147; 601.379 |
Approved |
no |
Call Number |
Admin @ si @ WHW2021 |
Serial |
3566 |
Permanent link to this record |
|
|
|
Author |
Joost Van de Weijer; Fahad Shahbaz Khan; Marc Masana |
Title |
Interactive Visual and Semantic Image Retrieval |
Type |
Book Chapter |
Year |
2013 |
Publication |
Multimodal Interaction in Image and Video Applications |
Abbreviated Journal |
|
Volume |
48 |
Issue |
|
Pages |
31-35 |
Keywords |
|
Abstract |
One direct consequence of recent advances in digital visual data generation and the direct availability of this information through the World-Wide Web is an urgent demand for efficient image retrieval systems. The objective of image retrieval is to allow users to efficiently browse through this abundance of images. Due to the non-expert nature of the majority of internet users, such systems should be user friendly, and therefore avoid complex user interfaces. In this chapter we investigate how high-level information provided by recently developed object recognition techniques can improve interactive image retrieval. We apply a bag-of-words based image representation method to automatically classify images in a number of categories. These additional labels are then applied to improve the image retrieval system. Next to these high-level semantic labels, we also apply a low-level image description to describe the composition and color scheme of the scene. Both descriptions are incorporated in a user feedback image retrieval setting. The main objective is to show that automatic labeling of images with semantic labels can improve image retrieval results. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
Angel Sappa; Jordi Vitria |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
1868-4394 |
ISBN |
978-3-642-35931-6 |
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
CIC; 605.203; 600.048 |
Approved |
no |
Call Number |
Admin @ si @ WKC2013 |
Serial |
2284 |
Permanent link to this record |