|
Marc Masana, Tinne Tuytelaars, & Joost Van de Weijer. (2021). Ternary Feature Masks: zero-forgetting for task-incremental learning. In 34th IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 3565–3574).
Abstract: We propose an approach without any forgetting to continual learning for the task-aware regime, where at inference the task-label is known. By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them. Using masks prevents both catastrophic forgetting and backward transfer. We argue -- and show experimentally -- that avoiding the former largely compensates for the lack of the latter, which is rarely observed in practice. In contrast to earlier works, our masks are applied to the features (activations) of each layer instead of the weights. This considerably reduces the number of mask parameters for each new task; with more than three orders of magnitude for most networks. The encoding of the ternary masks into two bits per feature creates very little overhead to the network, avoiding scalability issues. To allow already learned features to adapt to the current task without changing the behavior of these features for previous tasks, we introduce task-specific feature normalization. Extensive experiments on several finegrained datasets and ImageNet show that our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
|
|
|
Sudeep Katakol, Luis Herranz, Fei Yang, & Marta Mrak. (2021). DANICE: Domain adaptation without forgetting in neural image compression. In Conference on Computer Vision and Pattern Recognition Workshops (pp. 1921–1925).
Abstract: Neural image compression (NIC) is a new coding paradigm where coding capabilities are captured by deep models learned from data. This data-driven nature enables new potential functionalities. In this paper, we study the adaptability of codecs to custom domains of interest. We show that NIC codecs are transferable and that they can be adapted with relatively few target domain images. However, naive adaptation interferes with the solution optimized for the original source domain, resulting in forgetting the original coding capabilities in that domain, and may even break the compatibility with previously encoded bitstreams. Addressing these problems, we propose Codec Adaptation without Forgetting (CAwF), a framework that can avoid these problems by adding a small amount of custom parameters, where the source codec remains embedded and unchanged during the adaptation process. Experiments demonstrate its effectiveness and provide useful insights on the characteristics of catastrophic interference in NIC.
|
|
|
Fei Yang, Luis Herranz, Yongmei Cheng, & Mikhail Mozerov. (2021). Slimmable compressive autoencoders for practical neural image compression. In 34th IEEE Conference on Computer Vision and Pattern Recognition (pp. 4996–5005).
Abstract: Neural image compression leverages deep neural networks to outperform traditional image codecs in rate-distortion performance. However, the resulting models are also heavy, computationally demanding and generally optimized for a single rate, limiting their practical use. Focusing on practical image compression, we propose slimmable compressive autoencoders (SlimCAEs), where rate (R) and distortion (D) are jointly optimized for different capacities. Once trained, encoders and decoders can be executed at different capacities, leading to different rates and complexities. We show that a successful implementation of SlimCAEs requires suitable capacity-specific RD tradeoffs. Our experiments show that SlimCAEs are highly flexible models that provide excellent rate-distortion performance, variable rate, and dynamic adjustment of memory, computational cost and latency, thus addressing the main requirements of practical image compression.
|
|
|
Arturo Fuentes, F. Javier Sanchez, Thomas Voncina, & Jorge Bernal. (2021). LAMV: Learning to Predict Where Spectators Look in Live Music Performances. In 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 500–507).
Abstract: The advent of artificial intelligence has supposed an evolution on how different daily work tasks are performed. The analysis of cultural content has seen a huge boost by the development of computer-assisted methods that allows easy and transparent data access. In our case, we deal with the automation of the production of live shows, like music concerts, aiming to develop a system that can indicate the producer which camera to show based on what each of them is showing. In this context, we consider that is essential to understand where spectators look and what they are interested in so the computational method can learn from this information. The work that we present here shows the results of a first preliminary study in which we compare areas of interest defined by human beings and those indicated by an automatic system. Our system is based on the extraction of motion textures from dynamic Spatio-Temporal Volumes (STV) and then analyzing the patterns by means of texture analysis techniques. We validate our approach over several video sequences that have been labeled by 16 different experts. Our method is able to match those relevant areas identified by the experts, achieving recall scores higher than 80% when a distance of 80 pixels between method and ground truth is considered. Current performance shows promise when detecting abnormal peaks and movement trends.
|
|
|
Adria Molina, Pau Riba, Lluis Gomez, Oriol Ramos Terrades, & Josep Llados. (2021). Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, pp. 306–320). LNCS.
Abstract: This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate the date estimation as a retrieval task, where given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer are their embedded representations the closer are their dates. Contrary to the traditional models that design a neural network that learns a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method in two different tasks: date estimation and date-sensitive image retrieval, using the DEW public database, overcoming the baseline methods.
|
|
|
Pau Riba, Adria Molina, Lluis Gomez, Oriol Ramos Terrades, & Josep Llados. (2021). Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting. In 16th International Conference on Document Analysis and Recognition (Vol. 12822, 381–395).
Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both, handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work.
|
|
|
Sanket Biswas, Pau Riba, Josep Llados, & Umapada Pal. (2021). DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis. In 16th International Conference on Document Analysis and Recognition (Vol. 12823, 555–568). LNCS.
Abstract: Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have been also used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind.
|
|
|
Patricia Suarez, Angel Sappa, Boris X. Vintimilla, & Riad I. Hammoud. (2021). Cycle Generative Adversarial Network: Towards A Low-Cost Vegetation Index Estimation. In 28th IEEE International Conference on Image Processing (pp. 19–22).
Abstract: This paper presents a novel unsupervised approach to estimate the Normalized Difference Vegetation Index (NDVI). The NDVI is obtained as the ratio between information from the visible and near infrared spectral bands; in the current work, the NDVI is estimated just from an image of the visible spectrum through a Cyclic Generative Adversarial Network (CyclicGAN). This unsupervised architecture learns to estimate the NDVI index by means of an image translation between the red channel of a given RGB image and the NDVI unpaired index’s image. The translation is obtained by means of a ResNET architecture and a multiple loss function. Experimental results obtained with this unsupervised scheme show the validity of the implemented model. Additionally, comparisons with the state of the art approaches are provided showing improvements with the proposed approach.
|
|
|
Rafael E. Rivadeneira, Angel Sappa, Boris X. Vintimilla, Sabari Nathan, Priya Kansal, Armin Mehri, et al. (2021). Thermal Image Super-Resolution Challenge – PBVS 2021. In Conference on Computer Vision and Pattern Recognition Workshops (pp. 4359–4367).
Abstract: This paper presents results from the second Thermal Image Super-Resolution (TISR) challenge organized in the framework of the Perception Beyond the Visible Spectrum (PBVS) 2021 workshop. For this second edition, the same thermal image dataset considered during the first challenge has been used; only mid-resolution (MR) and high-resolution (HR) sets have been considered. The dataset consists of 951 training images and 50 testing images for each resolution. A set of 20 images for each resolution is kept aside for evaluation. The two evaluation methodologies proposed for the first challenge are also considered in this opportunity. The first evaluation task consists of measuring the PSNR and SSIM between the obtained SR image and the corresponding ground truth (i.e., the HR thermal image downsampled by four). The second evaluation also consists of measuring the PSNR and SSIM, but in this case, considers the x2 SR obtained from the given MR thermal image; this evaluation is performed between the SR image with respect to the semi-registered HR image, which has been acquired with another camera. The results outperformed those from the first challenge, thus showing an improvement in both evaluation metrics.
|
|
|
Armin Mehri, Parichehr Behjati Ardakani, & Angel Sappa. (2021). MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution. In IEEE Winter Conference on Applications of Computer Vision (pp. 2703–2712).
Abstract: Lightweight super resolution networks have extremely importance for real-world applications. In recent years several SR deep learning approaches with outstanding achievement have been introduced by sacrificing memory and computational cost. To overcome this problem, a novel lightweight super resolution network is proposed, which improves the SOTA performance in lightweight SR and performs roughly similar to computationally expensive networks. Multi-Path Residual Network designs with a set of Residual concatenation Blocks stacked with Adaptive Residual Blocks: ($i$) to adaptively extract informative features and learn more expressive spatial context information; ($ii$) to better leverage multi-level representations before up-sampling stage; and ($iii$) to allow an efficient information and gradient flow within the network. The proposed architecture also contains a new attention mechanism, Two-Fold Attention Module, to maximize the representation ability of the model. Extensive experiments show the superiority of our model against other SOTA SR approaches.
|
|
|
Armin Mehri, Parichehr Behjati Ardakani, & Angel Sappa. (2021). LiNet: A Lightweight Network for Image Super Resolution. In 25th International Conference on Pattern Recognition (pp. 7196–7202).
Abstract: This paper proposes a new lightweight network, LiNet, that enhancing technical efficiency in lightweight super resolution and operating approximately like very large and costly networks in terms of number of network parameters and operations. The proposed architecture allows the network to learn more abstract properties by avoiding low-level information via multiple links. LiNet introduces a Compact Dense Module, which contains set of inner and outer blocks, to efficiently extract meaningful information, to better leverage multi-level representations before upsampling stage, and to allow an efficient information and gradient flow within the network. Experiments on benchmark datasets show that the proposed LiNet achieves favorable performance against lightweight state-of-the-art methods.
|
|
|
Bartlomiej Twardowski, Pawel Zawistowski, & Szymon Zaborowski. (2021). Metric Learning for Session-Based Recommendations. In 43rd edition of the annual BCS-IRSG European Conference on Information Retrieval (Vol. 12656, pp. 650–665). LNCS.
Abstract: Session-based recommenders, used for making predictions out of users’ uninterrupted sequences of actions, are attractive for many applications. Here, for this task we propose using metric learning, where a common embedding space for sessions and items is created, and distance measures dissimilarity between the provided sequence of users’ events and the next action. We discuss and compare metric learning approaches to commonly used learning-to-rank methods, where some synergies exist. We propose a simple architecture for problem analysis and demonstrate that neither extensively big nor deep architectures are necessary in order to outperform existing methods. The experimental results against strong baselines on four datasets are provided with an ablation study.
Keywords: Session-based recommendations; Deep metric learning; Learning to rank
|
|
|
Albin Soutif, Marc Masana, Joost Van de Weijer, & Bartlomiej Twardowski. (2021). On the importance of cross-task features for class-incremental learning. In Theory and Foundation of continual learning workshop of ICML.
Abstract: In class-incremental learning, an agent with limited resources needs to learn a sequence of classification tasks, forming an ever growing classification problem, with the constraint of not being able to access data from previous tasks. The main difference with task-incremental learning, where a task-ID is available at inference time, is that the learner also needs to perform crosstask discrimination, i.e. distinguish between classes that have not been seen together. Approaches to tackle this problem are numerous and mostly make use of an external memory (buffer) of non-negligible size. In this paper, we ablate the learning of crosstask features and study its influence on the performance of basic replay strategies used for class-IL. We also define a new forgetting measure for class-incremental learning, and see that forgetting is not the principal cause of low performance. Our experimental results show that future algorithms for class-incremental learning should not only prevent forgetting, but also aim to improve the quality of the cross-task features. This is especially important when the number of classes per task is small.
|
|
|
Graham D. Finlayson, Javier Vazquez, & Fufu Fang. (2021). The Discrete Cosine Maximum Ignorance Assumption. In 29th Color and Imaging Conference (pp. 13–18).
Abstract: the performance of colour correction algorithms are dependent on the reflectance sets used. Sometimes, when the testing reflectance set is changed the ranking of colour correction algorithms also changes. To remove dependence on dataset we can
make assumptions about the set of all possible reflectances. In the Maximum Ignorance with Positivity (MIP) assumption we assume that all reflectances with per wavelength values between 0 and 1 are equally likely. A weakness in the MIP is that it fails to take into account the correlation of reflectance functions between
wavelengths (many of the assumed reflectances are, in reality, not possible).
In this paper, we take the view that the maximum ignorance assumption has merit but, hitherto it has been calculated with respect to the wrong coordinate basis. Here, we propose the Discrete Cosine Maximum Ignorance assumption (DCMI), where
all reflectances that have coordinates between max and min bounds in the Discrete Cosine Basis coordinate system are equally likely.
Here, the correlation between wavelengths is encoded and this results in the set of all plausible reflectances ’looking like’ typical reflectances that occur in nature. This said the DCMI model is also a superset of all measured reflectance sets.
Experiments show that, in colour correction, adopting the DCMI results in similar colour correction performance as using a particular reflectance set.
|
|
|
Razieh Rastgoo, Kourosh Kiani, Sergio Escalera, & Mohammad Sabokrou. (2021). Sign Language Production: A Review. In Conference on Computer Vision and Pattern Recognition Workshops (pp. 3472–3481).
Abstract: Sign Language is the dominant yet non-primary form of communication language used in the deaf and hearing-impaired community. To make an easy and mutual communication between the hearing-impaired and the hearing communities, building a robust system capable of translating the spoken language into sign language and vice versa is fundamental. To this end, sign language recognition and production are two necessary parts for making such a two-way system. Sign language recognition and production need to cope with some critical challenges. In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning. This survey aims to briefly summarize recent achievements in SLP, discussing their advantages, limitations, and future directions of research.
|
|