Records |
Author |
Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas |
Title |
Real-time Lexicon-free Scene Text Retrieval |
Type |
Journal Article |
Year |
2021 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
Volume |
110 |
Issue |
|
Pages |
107656 |
Keywords |
|
Abstract |
In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single-shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, the problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms the previous state of the art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted on a multilingual dataset, as well as an application of real-time text spotting in videos. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.129; 601.338 |
Approved |
no |
Call Number |
Admin @ si @ MTD2021 |
Serial |
3493 |
Permanent link to this record |
|
|
|
Author |
Gemma Rotger |
Title |
Lifelike Humans: Detailed Reconstruction of Expressive Human Faces |
Type |
Book Whole |
Year |
2021 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
|
Keywords |
|
Abstract |
Developing human-like digital characters is a challenging task, since humans are used to recognizing our fellows and find computer-generated characters inadequately humanized. To meet the standards of videogame and digital film productions, it is necessary to model and animate these characters as closely as possible to human beings. However, this is an arduous and expensive task, since many artists and specialists are required to work on a single character. Therefore, to fulfill these requirements, we found it an interesting option to study the automatic creation of detailed characters through inexpensive setups. In this work, we develop novel techniques to produce detailed characters by combining different aspects that stand out when developing realistic characters: skin detail, facial hair, expressions, and microexpressions. We examine each of the mentioned areas with the aim of automatically recovering each of the parts without user interaction or training data. We study the problems for their robustness but also for the simplicity of the setup, preferring single-image input with uncontrolled illumination and methods that can be easily computed on a standard laptop. A detailed face with wrinkles and skin details is vital to develop a realistic character. In this work, we introduce our method to automatically describe facial wrinkles in the image and transfer them to the recovered base face. Then we advance to facial hair recovery by resolving a fitting problem with a novel parametrization model. Finally, we develop a mapping function that allows transferring expressions and microexpressions between different meshes, which provides realistic animations to our detailed mesh. 
We cover all the mentioned points with a focus on key aspects such as (i) how to describe skin wrinkles in a simple and straightforward manner, (ii) how to recover 3D from 2D detections, (iii) how to recover and model facial hair from 2D to 3D, (iv) how to transfer expressions between models holding both skin detail and facial hair, and (v) how to perform all the described actions without training data or user interaction. In this work, we present our proposals to solve these aspects with an efficient and simple setup. We validate our work with several datasets of both synthetic and real data, showing remarkable results even in challenging cases such as occlusions like glasses and thick beards, and even when working with different face topologies like single-eyed cyclops. |
Address |
|
Corporate Author |
|
Thesis |
Ph.D. thesis |
Publisher |
Ediciones Graficas Rey |
Place of Publication |
|
Editor |
Felipe Lumbreras;Antonio Agudo |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-84-122714-3-0 |
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS |
Approved |
no |
Call Number |
Admin @ si @ Rot2021 |
Serial |
3513 |
Permanent link to this record |
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
Title |
Sign Language Recognition: A Deep Survey |
Type |
Journal Article |
Year |
2021 |
Publication |
Expert Systems With Applications |
Abbreviated Journal |
ESWA |
Volume |
164 |
Issue |
|
Pages |
113794 |
Keywords |
|
Abstract |
Sign language, as a distinct form of communication, is important to large groups of people in society. There are different signs in each sign language, with variability in hand shape, motion profile, and position of the hand, face, and body parts contributing to each sign. Visual sign language recognition is therefore a complex research area in computer vision. Many models have been proposed by different researchers, with significant improvement from deep learning approaches in recent years. In this survey, we review the proposed vision-based models of sign language recognition using deep learning approaches from the last five years. While the overall trend of the proposed models indicates a significant improvement in recognition accuracy, some challenges still need to be solved. We present a taxonomy to categorize the proposed models for isolated and continuous sign language recognition, discussing applications, datasets, hybrid models, complexity, and future lines of research in the field. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
HUPBA; no proj |
Approved |
no |
Call Number |
Admin @ si @ RKE2021a |
Serial |
3521 |
Permanent link to this record |
|
|
|
Author |
Giuseppe Pezzano; Vicent Ribas Ripoll; Petia Radeva |
Title |
CoLe-CNN: Context-learning convolutional neural network with adaptive loss function for lung nodule segmentation |
Type |
Journal Article |
Year |
2021 |
Publication |
Computer Methods and Programs in Biomedicine |
Abbreviated Journal |
CMPB |
Volume |
198 |
Issue |
|
Pages |
105792 |
Keywords |
|
Abstract |
Background and objective: An accurate segmentation of lung nodules in computed tomography images is a crucial step for the physical characterization of the tumour. Being often accomplished entirely manually, nodule segmentation turns out to be a tedious and time-consuming procedure, and this represents a major obstacle in clinical practice. In this paper, we propose a novel Convolutional Neural Network for nodule segmentation that combines a light and efficient architecture with an innovative loss function and segmentation strategy. Methods: In contrast to most of the standard end-to-end architectures for nodule segmentation, our network learns the context of the nodules by producing two masks representing all the background and secondary-important elements in the Computed Tomography scan. The nodule is detected by subtracting the context from the original scan image. Additionally, we introduce an asymmetric loss function that automatically compensates for potential errors in the nodule annotations. We trained and tested our Neural Network on the public LIDC-IDRI database, compared it with the state of the art, and ran a pseudo-Turing test between four radiologists and the network. Results: The results proved that the behaviour of the algorithm is very near to human performance and its segmentation masks are almost indistinguishable from the ones made by the radiologists. Our method clearly outperforms the state of the art on CT nodule segmentation in terms of F1 score and IoU of and respectively. Conclusions: The main structure of the network ensures all the properties of the UNet architecture, while the Multi Convolutional Layers give a more accurate pattern recognition. The newly adopted solutions also increase the details on the border of the nodule, even under the noisiest conditions. This method can be applied now for single CT slice nodule segmentation and it represents a starting point for the future development of a fully automatic 3D segmentation software. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MILAB; no proj |
Approved |
no |
Call Number |
Admin @ si @ PRR2021 |
Serial |
3530 |
Permanent link to this record |
|
|
|
Author |
Sudeep Katakol; Basem Elbarashy; Luis Herranz; Joost Van de Weijer; Antonio Lopez |
Title |
Distributed Learning and Inference with Compressed Images |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Transactions on Image Processing |
Abbreviated Journal |
TIP |
Volume |
30 |
Issue |
|
Pages |
3069-3083 |
Keywords |
|
Abstract |
Modern computer vision requires processing large amounts of data, both while training the model and/or during inference, once the model is deployed. Scenarios where images are captured and processed in physically separated locations are increasingly common (e.g. autonomous vehicles, cloud computing). In addition, many devices suffer from limited resources to store or transmit data (e.g. storage space, channel capacity). In these scenarios, lossy image compression plays a crucial role to effectively increase the number of images collected under such constraints. However, lossy compression entails some undesired degradation of the data that may harm the performance of the downstream analysis task at hand, since important semantic information may be lost in the process. Moreover, we may only have compressed images at training time but are able to use original images at inference time, or vice versa, and in such a case, the downstream model suffers from covariate shift. In this paper, we analyze this phenomenon, with a special focus on vision-based perception for autonomous driving as a paradigmatic scenario. We see that loss of semantic information and covariate shift do indeed exist, resulting in a drop in performance that depends on the compression rate. In order to address the problem, we propose dataset restoration, based on image restoration with generative adversarial networks (GANs). Our method is agnostic to both the particular image compression method and the downstream task; and has the advantage of not adding additional cost to the deployed models, which is particularly important in resource-limited devices. The presented experiments focus on semantic segmentation as a challenging use case, cover a broad range of compression rates and diverse datasets, and show how our method is able to significantly alleviate the negative effects of compression on the downstream visual task. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; ADAS; 600.120; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ KEH2021 |
Serial |
3543 |
Permanent link to this record |
|
|
|
Author |
Carola Figueroa Flores; David Berga; Joost Van de Weijer; Bogdan Raducanu |
Title |
Saliency for free: Saliency prediction as a side-effect of object recognition |
Type |
Journal Article |
Year |
2021 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
Volume |
150 |
Issue |
|
Pages |
1-7 |
Keywords |
Saliency maps; Unsupervised learning; Object recognition |
Abstract |
Saliency is the perceptual capacity of our visual system to focus our attention (i.e. gaze) on relevant objects instead of the background. So far, computational methods for saliency estimation have required the explicit generation of a saliency map, a process which is usually achieved via eyetracking experiments on still images. This is a tedious process that needs to be repeated for each new dataset. In the current paper, we demonstrate that it is possible to automatically generate saliency maps without ground truth. In our approach, saliency maps are learned as a side effect of object recognition. Extensive experiments carried out on both real and synthetic datasets demonstrate that our approach is able to generate accurate saliency maps, achieving competitive results when compared with supervised methods. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.147; 600.120 |
Approved |
no |
Call Number |
Admin @ si @ FBW2021 |
Serial |
3559 |
Permanent link to this record |
|
|
|
Author |
Henry Velesaca; Patricia Suarez; Raul Mira; Angel Sappa |
Title |
Computer Vision based Food Grain Classification: a Comprehensive Survey |
Type |
Journal Article |
Year |
2021 |
Publication |
Computers and Electronics in Agriculture |
Abbreviated Journal |
CEA |
Volume |
187 |
Issue |
|
Pages |
106287 |
Keywords |
|
Abstract |
This manuscript presents a comprehensive survey on recent computer vision based food grain classification techniques. It includes state-of-the-art approaches intended for different grain varieties. The approaches proposed in the literature are analyzed according to the processing stages considered in the classification pipeline, making it easier to identify common techniques and comparisons. Additionally, the type of images considered by each approach (i.e., images from the visible, infrared, multispectral, and hyperspectral bands) together with the strategy used to generate ground truth data (i.e., real and synthetic images) are reviewed. Finally, conclusions highlighting future needs and challenges are presented. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU; 600.130; 600.122 |
Approved |
no |
Call Number |
Admin @ si @ VSM2021 |
Serial |
3576 |
Permanent link to this record |
|
|
|
Author |
Daniel Hernandez; Antonio Espinosa; David Vazquez; Antonio Lopez; Juan C. Moure |
Title |
3D Perception With Slanted Stixels on GPU |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Transactions on Parallel and Distributed Systems |
Abbreviated Journal |
TPDS |
Volume |
32 |
Issue |
10 |
Pages |
2434-2447 |
Keywords |
|
Abstract |
This article presents a GPU-accelerated software design of the recently proposed model of Slanted Stixels, which represents the geometric and semantic information of a scene in a compact and accurate way. We reformulate the measurement depth model to reduce the computational complexity of the algorithm, relying on the confidence of the depth estimation and the identification of invalid values to handle outliers. The proposed massively parallel scheme and data layout for the irregular computation pattern that corresponds to a Dynamic Programming paradigm is described and carefully analyzed in performance terms. Performance is shown to scale gracefully on current generation embedded GPUs. We assess the proposed methods in terms of semantic and geometric accuracy as well as run-time performance on three publicly available benchmark datasets. Our approach achieves real-time performance with high accuracy for 2048 × 1024 image sizes and 4 × 4 Stixel resolution on the low-power embedded GPU of an NVIDIA Tegra Xavier. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.124; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ HEV2021 |
Serial |
3561 |
Permanent link to this record |
|
|
|
Author |
Jose Luis Gomez; Gabriel Villalonga; Antonio Lopez |
Title |
Co-Training for Deep Object Detection: Comparing Single-Modal and Multi-Modal Approaches |
Type |
Journal Article |
Year |
2021 |
Publication |
Sensors |
Abbreviated Journal |
SENS |
Volume |
21 |
Issue |
9 |
Pages |
3185 |
Keywords |
co-training; multi-modality; vision-based object detection; ADAS; self-driving |
Abstract |
Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN highly depends on both the raw sensor data and their associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data-labeling bottleneck may be intensified due to domain shifts among image sensors, which could force per-sensor data labeling. In this paper, we focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep object detectors. In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Moreover, we compare appearance-based single-modal co-training with multi-modal. Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data) multi-modal co-training outperforms single-modal. In the latter case, by performing GAN-based domain translation both co-training modalities are on par, at least when using an off-the-shelf depth estimation model not specifically trained on the translated images. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
ADAS; 600.118 |
Approved |
no |
Call Number |
Admin @ si @ GVL2021 |
Serial |
3562 |
Permanent link to this record |
|
|
|
Author |
Shiqi Yang; Kai Wang; Luis Herranz; Joost Van de Weijer |
Title |
On Implicit Attribute Localization for Generalized Zero-Shot Learning |
Type |
Journal Article |
Year |
2021 |
Publication |
IEEE Signal Processing Letters |
Abbreviated Journal |
|
Volume |
28 |
Issue |
|
Pages |
872-876 |
Keywords |
|
Abstract |
Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their attribute-based descriptions. Since attributes are often related to specific parts of objects, many recent works focus on discovering discriminative regions. However, these methods usually require additional complex part detection modules or attention mechanisms. In this paper, 1) we show that common ZSL backbones (without explicit attention or part detection) can implicitly localize attributes, yet this property is not exploited. 2) Exploiting it, we then propose SELAR, a simple method that further encourages attribute localization, surprisingly achieving very competitive generalized ZSL (GZSL) performance when compared with more complex state-of-the-art methods. Our findings provide useful insight for designing future GZSL methods, and SELAR provides an easy-to-implement yet strong baseline. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.120 |
Approved |
no |
Call Number |
YWH2021 |
Serial |
3563 |
Permanent link to this record |
|
|
|
Author |
Domicele Jonauskaite; Lucia Camenzind; C. Alejandro Parraga; Cecile N Diouf; Mathieu Mercapide Ducommun; Lauriane Müller; Melanie Norberg; Christine Mohr |
Title |
Colour-emotion associations in individuals with red-green colour blindness |
Type |
Journal Article |
Year |
2021 |
Publication |
PeerJ |
Abbreviated Journal |
|
Volume |
9 |
Issue |
|
Pages |
e11180 |
Keywords |
Affect; Chromotherapy; Colour cognition; Colour vision deficiency; Cross-modal correspondences; Daltonism; Deuteranopia; Dichromatic; Emotion; Protanopia |
Abstract |
Colours and emotions are associated in languages and traditions. Some of us may convey sadness by saying "feeling blue" or by wearing black clothes at funerals. The first example is a conceptual experience of colour and the second example is an immediate perceptual experience of colour. To investigate whether one or the other type of experience more strongly drives colour-emotion associations, we tested 64 congenitally red-green colour-blind men and 66 non-colour-blind men. All participants associated 12 colours, presented as terms or patches, with 20 emotion concepts, and rated intensities of the associated emotions. We found that colour-blind and non-colour-blind men associated similar emotions with colours, irrespective of whether colours were conveyed via terms (r = .82) or patches (r = .80). The colour-emotion associations and the emotion intensities were not modulated by participants' severity of colour blindness. Hinting at some additional, although minor, role of actual colour perception, the consistencies in associations for colour terms and patches were higher in non-colour-blind than colour-blind men. Together, these results suggest that colour-emotion associations in adults do not require immediate perceptual colour experiences, as conceptual experiences are sufficient. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
CIC; LAMP; 600.120; 600.128 |
Approved |
no |
Call Number |
Admin @ si @ JCP2021 |
Serial |
3564 |
Permanent link to this record |
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |
Title |
DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis |
Type |
Conference Article |
Year |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
Volume |
12823 |
Issue |
|
Pages |
555–568 |
Keywords |
|
Abstract |
Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have also been used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind. |
Address |
Lausanne; Switzerland; September 2021 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ BRL2021a |
Serial |
3573 |
Permanent link to this record |
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |
Title |
Beyond Document Object Detection: Instance-Level Segmentation of Complex Layouts |
Type |
Journal Article |
Year |
2021 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
Volume |
24 |
Issue |
|
Pages |
269–281 |
Keywords |
|
Abstract |
Information extraction is a fundamental task of many business intelligence services that entail massive document processing. Understanding a document page structure in terms of its layout provides contextual support which is helpful in the semantic interpretation of the document terms. In this paper, inspired by the progress of deep learning methodologies applied to the task of object recognition, we transfer these models to the specific case of document object detection, reformulating the traditional problem of document layout analysis. Moreover, we make an important contribution to prior art by defining the task of instance segmentation on the document image domain. An instance segmentation paradigm is especially important in complex layouts whose contents should interact for the proper rendering of the page, i.e., the proper text wrapping around an image. Finally, we provide an extensive evaluation, both qualitative and quantitative, that demonstrates the superior performance of the proposed methodology over the current state of the art. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ BRL2021b |
Serial |
3574 |
Permanent link to this record |
|
|
|
Author |
Kai Wang; Joost Van de Weijer; Luis Herranz |
Title |
ACAE-REMIND for online continual learning with compressed feature replay |
Type |
Journal Article |
Year |
2021 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
Volume |
150 |
Issue |
|
Pages |
122-129 |
Keywords |
online continual learning; autoencoders; vector quantization |
Abstract |
Online continual learning aims to learn from a non-IID stream of data from a number of different tasks, where the learner is only allowed to consider data once. Methods are typically allowed to use a limited buffer to store some of the images in the stream. Recently, it was found that feature replay, where an intermediate layer representation of the image is stored (or generated), leads to better results than image replay while requiring less memory. Quantized exemplars can further reduce the memory usage. However, a drawback of these methods is that they use a fixed (or very intransigent) backbone network. This significantly limits the learning of representations that can discriminate between all tasks. To address this problem, we propose an auxiliary classifier auto-encoder (ACAE) module for feature replay at intermediate layers with high compression rates. The reduced memory footprint per image allows us to save more exemplars for replay. In our experiments, we conduct task-agnostic evaluation under the online continual learning setting and achieve state-of-the-art performance on the ImageNet-Subset, CIFAR100 and CIFAR10 datasets. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
LAMP; 600.147; 601.379; 600.120; 600.141 |
Approved |
no |
Call Number |
Admin @ si @ WWH2021 |
Serial |
3575 |
Permanent link to this record |
|
|
|
Author |
Patricia Suarez; Angel Sappa; Boris X. Vintimilla |
Title |
Deep learning-based vegetation index estimation |
Type |
Book Chapter |
Year |
2021 |
Publication |
Generative Adversarial Networks for Image-to-Image Translation |
Abbreviated Journal |
|
Volume |
|
Issue |
|
Pages |
205-234 |
Keywords |
|
Abstract |
Chapter 9 |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
Elsevier |
Place of Publication |
|
Editor |
A. Solanki; A. Nayyar; M. Naved |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
|
Notes |
MSIAU; 600.122 |
Approved |
no |
Call Number |
Admin @ si @ SSV2021a |
Serial |
3578 |
Permanent link to this record |