toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Lei Kang; Marçal Rusiñol; Alicia Fornes; Pau Riba; Mauricio Villegas edit   pdf
url  doi
openurl 
  Title Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition Type Conference Article
  Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.  
  Address Aspen; Colorado; USA; March 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.129; 600.140; 601.302; 601.312; 600.121 Approved no  
  Call Number Admin @ si @ KRF2020 Serial 3446  
Permanent link to this record
 

 
Author Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Exploring Hate Speech Detection in Multimodal Publications Type Conference Article
  Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.  
  Address Aspen; March 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GGG2020a Serial 3280  
Permanent link to this record
 

 
Author Edgar Riba; D. Mishkin; Daniel Ponsa; E. Rublee; G. Bradski edit   pdf
url  doi
openurl 
  Title Kornia: an Open Source Differentiable Computer Vision Library for PyTorch Type Conference Article
  Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract  
  Address Aspen; Colorado; USA; March 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes MSIAU; 600.122; 600.130 Approved no  
  Call Number Admin @ si @ RMP2020 Serial 3291  
Permanent link to this record
 

 
Author Sergio Escalera; Ralf Herbrich edit  url
doi  isbn
openurl 
  Title The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations Type Book Whole
  Year 2020 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor Sergio Escalera; Ralf Hebrick  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2520-1328 ISBN 978-3-030-29134-1 Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ HeE2020 Serial 3328  
Permanent link to this record
 

 
Author Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features Type Conference Article
  Year 2020 Publication IEEE Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.  
  Address Aspen; Colorado; USA; March 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ MDB2020 Serial 3334  
Permanent link to this record
 

 
Author Arnau Baro; Alicia Fornes; Carles Badal edit   pdf
openurl 
  Title Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism Type Conference Article
  Year 2020 Publication 17th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks.  
  Address Virtual ICFHR; September 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BFB2020 Serial 3448  
Permanent link to this record
 

 
Author Alicia Fornes; Josep Llados; Joana Maria Pujadas-Mora edit  url
isbn  openurl
  Title Browsing of the Social Network of the Past: Information Extraction from Population Manuscript Images Type Book Chapter
  Year 2020 Publication Handwritten Historical Document Analysis, Recognition, and Retrieval – State of the Art and Future Trends Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher World Scientific Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-981-120-323-7 Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ FLP2020 Serial 3350  
Permanent link to this record
 

 
Author David Berga; Xavier Otazu edit  openurl
  Title Computations of top-down attention by modulating V1 dynamics Type Conference Article
  Year 2020 Publication Computational and Mathematical Models in Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract  
  Address St. Pete Beach; Florida; May 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference MODVIS  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ BeO2020a Serial 3376  
Permanent link to this record
 

 
Author Yaxing Wang edit  isbn
openurl 
  Title Transferring and Learning Representations for Image Generation and Translation Type Book Whole
  Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods which perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first one, called simply image generation takes random noise as an input and synthesizes an image which follows the same distribution as the images in the training set. The second class, which is called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired image-to-image translation based on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image generation.GANs highly rely upon having access to vast quantities of data, and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we aim to transfer knowledge from a model trained on a large dataset (source domain) to the one learned on limited data (target domain). We find that both GANs andconditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than the generator. Using both the generator and discriminator results in the best performance. We found, however, that this method suffers from overfitting, since we update all parameters to adapt to the target data. We propose a novel architecture, which is tailored to address knowledge transfer to very small target domains. Our approach effectively exploreswhich part of the latent space is more related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs. Although image-to-image translation has achieved outstanding performance, it still facesseveral problems. First, for translation between complex domains (such as translations between different modalities) image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen (i.e. during training), we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach where we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascadingthe source encoder and the target decoder, even when they have not interacted during the training stage (i.e. unseen). Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which are dueto the fact that the target dataset has a particular underlying visual distribution. We use carefully designed semantic constraints to reduce the effects of the bias. The semantic constraint aims to enforce the preservation of desired image properties. Finally, current approaches fail to generate diverse outputs or perform scalable image transfer in a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation. We employ random noise to control the diversity. The scalabitlity is determined by conditioning the domain label.computer vision, deep learning, imitation learning, adversarial generative networks, image generation, image-to-image translation.  
  Address January 2020  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Abel Gonzalez;Luis Herranz  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-121011-5-7 Medium  
  Area Expedition Conference  
  Notes LAMP; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ Wan2020 Serial 3397  
Permanent link to this record
 

 
Author Sangeeth Reddy; Minesh Mathew; Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar edit   pdf
openurl 
  Title RoadText-1K: Text Detection and Recognition Dataset for Driving Videos Type Conference Article
  Year 2020 Publication IEEE International Conference on Robotics and Automation Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new ”RoadText-1K” dataset for text in driving videos. The dataset is 20 times larger than the existing largest dataset for text in videos. Our dataset comprises 1000 video clips of driving without any bias towards text and with annotations for text bounding boxes and transcriptions in every frame. State of the art methods for text detection,
recognition and tracking are evaluated on the new dataset and the results signify the challenges in unconstrained driving videos compared to existing datasets. This suggests that RoadText-1K is suited for research and development of reading systems, robust enough to be incorporated into more complex downstream tasks like driver assistance and self-driving. The dataset can be found at http://cvit.iiit.ac.in/research/
projects/cvit-projects/roadtext-1k
 
  Address Paris; Francia; ???  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICRA  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ RMG2020 Serial 3400  
Permanent link to this record
 

 
Author Vacit Oguz Yazici; Abel Gonzalez-Garcia; Arnau Ramisa; Bartlomiej Twardowski; Joost Van de Weijer edit   pdf
url  openurl
  Title Orderless Recurrent Models for Multi-label Classification Type Conference Article
  Year 2020 Publication 33rd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Recurrent neural networks (RNN) are popular for many computer vision tasks, including multi-label classification. Since RNNs produce sequential outputs, labels need to be ordered for the multi-label classification task. Current approaches sort labels according to their frequency, typically ordering them in either rare-first or frequent-first. These imposed orderings do not take into account that the natural order to generate the labels can change for each image, e.g.\ first the dominant object before summing up the smaller objects in the image. Therefore, in this paper, we propose ways to dynamically order the ground truth labels with the predicted label sequence. This allows for the faster training of more optimal LSTM models for multi-label classification. Analysis evidences that our method does not suffer from duplicate generation, something which is common for other models. Furthermore, it outperforms other CNN-RNN models, and we show that a standard architecture of an image encoder and language decoder trained with our proposed loss obtains the state-of-the-art results on the challenging MS-COCO, WIDER Attribute and PA-100K and competitive results on NUS-WIDE.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPR  
  Notes LAMP; 600.109; 601.309; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ YGR2020 Serial 3408  
Permanent link to this record
 

 
Author Xialei Liu; Chenshen Wu; Mikel Menta; Luis Herranz; Bogdan Raducanu; Andrew Bagdanov; Shangling Jui; Joost Van de Weijer edit   pdf
openurl 
  Title Generative Feature Replay for Class-Incremental Learning Type Conference Article
  Year 2020 Publication CLVISION – Workshop on Continual Learning in Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Humans are capable of learning new tasks without forgetting previous ones, while neural networks fail due to catastrophic forgetting between new and previously-learned tasks. We consider a class-incremental setting which means that the task-ID is unknown at inference time. The imbalance between old and new classes typically results in a bias of the network towards the newest ones. This imbalance problem can either be addressed by storing exemplars from previous tasks, or by using image replay methods. However, the latter can only be applied to toy datasets since image generation for complex datasets is a hard problem.
We propose a solution to the imbalance problem based on generative feature replay which does not require any exemplars. To do this, we split the network into two parts: a feature extractor and a classifier. To prevent forgetting, we combine generative feature replay in the classifier with feature distillation in the feature extractor. Through feature generation, our method reduces the complexity of generative replay and prevents the imbalance problem. Our approach is computationally efficient and scalable to large datasets. Experiments confirm that our approach achieves state-of-the-art results on CIFAR-100 and ImageNet, while requiring only a fraction of the storage needed for exemplar-based continual learning
 
  Address Virtual CVPR  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPRW  
  Notes LAMP; 601.309; 602.200; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ LWM2020 Serial 3419  
Permanent link to this record
 

 
Author Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas edit   pdf
openurl 
  Title Location Sensitive Image Retrieval and Tagging Type Conference Article
  Year 2020 Publication 16th European Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that learns to rank triplets of images, tags and coordinates by plausibility, and two training strategies to balance the location influence in the final ranking. LocSens learns to fuse textual and location information of multimodal queries to retrieve related images at different levels of location granularity, and successfully utilizes location information to improve image tagging.  
  Address Virtual; August 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GGG2020b Serial 3420  
Permanent link to this record
 

 
Author Yaxing Wang; Abel Gonzalez-Garcia; David Berga; Luis Herranz; Fahad Shahbaz Khan; Joost Van de Weijer edit   pdf
openurl 
  Title MineGAN: effective knowledge transfer from GANs to target domains with few images Type Conference Article
  Year 2020 Publication 33rd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract One of the attractive characteristics of deep neural networks is their ability to transfer knowledge obtained in one domain to other related domains. As a result, high-quality networks can be trained in domains with relatively little training data. This property has been extensively studied for discriminative networks but has received significantly less attention for generative models. Given the often enormous effort required to train GANs, both computationally as well as in the dataset collection, the re-use of pretrained GANs is a desirable objective. We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods such as mode collapse and lack of flexibility. We perform experiments on several complex datasets using various GAN architectures (BigGAN, Progressive GAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs.  
  Address Virtual CVPR  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPR  
  Notes LAMP; 600.109; 600.141; 600.120 Approved no  
  Call Number Admin @ si @ WGB2020 Serial 3421  
Permanent link to this record
 

 
Author Lu Yu; Bartlomiej Twardowski; Xialei Liu; Luis Herranz; Kai Wang; Yongmai Cheng; Shangling Jui; Joost Van de Weijer edit   pdf
openurl 
  Title Semantic Drift Compensation for Class-Incremental Learning of Embeddings Type Conference Article
  Year 2020 Publication 33rd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages (up)  
  Keywords  
  Abstract Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network has only access to data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC when combined with existing methods to prevent forgetting consistently improves results.  
  Address Virtual CVPR  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CVPR  
  Notes LAMP; 600.141; 601.309; 602.200; 600.120 Approved no  
  Call Number Admin @ si @ YTL2020 Serial 3422  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: