toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Abel Gonzalez-Garcia; Joost Van de Weijer; Yoshua Bengio edit   pdf
openurl 
  Title Image-to-image translation for cross-domain disentanglement Type Conference Article
  Year 2018 Publication 32nd Annual Conference on Neural Information Processing Systems Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address (up) Montreal; Canada; December 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference NIPS  
  Notes LAMP; 600.120 Approved no  
  Call Number Admin @ si @ GWB2018 Serial 3155  
Permanent link to this record
 

 
Author Chenshen Wu; Luis Herranz; Xialei Liu; Joost Van de Weijer; Bogdan Raducanu edit   pdf
openurl 
  Title Memory Replay GANs: Learning to Generate New Categories without Forgetting Type Conference Article
  Year 2018 Publication 32nd Annual Conference on Neural Information Processing Systems Abbreviated Journal  
  Volume Issue Pages 5966-5976  
  Keywords  
  Abstract Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine tuning renders the network unable to properly generate images from previous categories (ie forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results in MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories.  
  Address (up) Montreal; Canada; December 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference NIPS  
  Notes LAMP; 600.106; 600.109; 602.200; 600.120 Approved no  
  Call Number Admin @ si @ WHL2018 Serial 3249  
Permanent link to this record
 

 
Author Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Learning to Learn from Web Data through Deep Semantic Embeddings Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Workshops Abbreviated Journal  
  Volume 11134 Issue Pages 514-529  
  Keywords  
  Abstract In this paper we propose to learn a multimodal image and text embedding from Web and Social Media data, aiming to leverage the semantic knowledge learnt in the text domain and transfer it to a visual model for semantic image retrieval. We demonstrate that the pipeline can learn from images with associated text without supervision and perform a thourough analysis of five different text embeddings in three different benchmarks. We show that the embeddings learnt with Web and Social Media data have competitive performances over supervised methods in the text based image retrieval task, and we clearly outperform state of the art in the MIRFlickr dataset when training in the target data. Further we demonstrate how semantic multimodal image retrieval can be performed using the learnt embeddings, going beyond classical instance-level retrieval problems. Finally, we present a new dataset, InstaCities1M, composed by Instagram images and their associated texts that can be used for fair comparison of image-text embeddings.  
  Address (up) Munich; Alemanya; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCVW  
  Notes DAG; 600.129; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ GGG2018a Serial 3175  
Permanent link to this record
 

 
Author Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov edit   pdf
openurl 
  Title Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images Type Conference Article
  Year 2018 Publication International Workshop on Egocentric Perception, Interaction and Computing at ECCV Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Word spotting in natural scene images has many applications in scene understanding and visual assistance. We propose Soft-PHOC, an intermediate representation of images based on character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We show how to use our descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.  
  Address (up) Munich; Alemanya; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCVW  
  Notes DAG; 600.129; 600.121; Approved no  
  Call Number Admin @ si @ BKB2018b Serial 3174  
Permanent link to this record
 

 
Author Raul Gomez; Lluis Gomez; Jaume Gibert; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Learning from# Barcelona Instagram data what Locals and Tourists post about its Neighbourhoods Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Workshops Abbreviated Journal  
  Volume 11134 Issue Pages 530-544  
  Keywords  
  Abstract Massive tourism is becoming a big problem for some cities, such as Barcelona, due to its concentration in some neighborhoods. In this work we gather Instagram data related to Barcelona consisting on images-captions pairs and, using the text as a supervisory signal, we learn relations between images, words and neighborhoods. Our goal is to learn which visual elements appear in photos when people is posting about each neighborhood. We perform a language separate treatment of the data and show that it can be extrapolated to a tourists and locals separate analysis, and that tourism is reflected in Social Media at a neighborhood level. The presented pipeline allows analyzing the differences between the images that tourists and locals associate to the different neighborhoods. The proposed method, which can be extended to other cities or subjects, proves that Instagram data can be used to train multi-modal (image and text) machine learning models that are useful to analyze publications about a city at a neighborhood level. We publish the collected dataset, InstaBarcelona and the code used in the analysis.  
  Address (up) Munich; Alemanya; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCVW  
  Notes DAG; 600.129; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ GGG2018b Serial 3176  
Permanent link to this record
 

 
Author Yaxing Wang; Chenshen Wu; Luis Herranz; Joost Van de Weijer; Abel Gonzalez-Garcia; Bogdan Raducanu edit   pdf
url  openurl
  Title Transferring GANs: generating images from limited data Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11210 Issue Pages 220-236  
  Keywords Generative adversarial networks; Transfer learning; Domain adaptation; Image generation  
  Abstract ransferring knowledge of pre-trained networks to new domains by means of fine-tuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pre-trained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pre-trained model was trained without conditioning. Our results also suggest that density is more important than diversity and a dataset with one or few densely sampled classes is a better source model than more diverse datasets such as ImageNet or Places.  
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes LAMP; 600.109; 600.106; 600.120 Approved no  
  Call Number Admin @ si @ WWH2018a Serial 3130  
Permanent link to this record
 

 
Author Pau Rodriguez; Josep M. Gonfaus; Guillem Cucurull; Xavier Roca; Jordi Gonzalez edit   pdf
url  openurl
  Title Attend and Rectify: A Gated Attention Mechanism for Fine-Grained Recovery Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11212 Issue Pages 357-372  
  Keywords Deep Learning; Convolutional Neural Networks; Attention  
  Abstract We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpasses the state of the art classification accuracies in CIFAR-10, the Adience gender recognition task, Stanford dogs, and UEC Food-100.  
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes ISE; 600.098; 602.121; 600.119 Approved no  
  Call Number Admin @ si @ RGC2018 Serial 3139  
Permanent link to this record
 

 
Author Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Single Shot Scene Text Retrieval Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11218 Issue Pages 728-744  
  Keywords Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposals Networks; PHOC  
  Abstract Textual information found in scene images provides high level semantic information about the image and its context and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single shot CNN architecture that predicts at the same time bounding boxes and a compact text representation of the words in them. In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image
database. Our experiments demonstrate that the proposed architecture
outperforms previous state-of-the-art while it offers a significant increase
in processing speed.
 
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes DAG; 600.084; 601.338; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GMR2018 Serial 3143  
Permanent link to this record
 

 
Author Felipe Codevilla; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy edit   pdf
url  openurl
  Title On Offline Evaluation of Vision-based Driving Models Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11219 Issue Pages 246-262  
  Keywords Autonomous driving; deep learning  
  Abstract Autonomous driving models should ideally be evaluated by deploying
them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and
suitable offline metrics.
 
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes ADAS; 600.124; 600.118 Approved no  
  Call Number Admin @ si @ CLK2018 Serial 3162  
Permanent link to this record
 

 
Author Marc Oliu; Javier Selva; Sergio Escalera edit   pdf
url  openurl
  Title Folded Recurrent Neural Networks for Future Video Prediction Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11218 Issue Pages 745-761  
  Keywords  
  Abstract Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving an insight to the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state of the art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.  
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes HUPBA; no menciona Approved no  
  Call Number Admin @ si @ OSE2018 Serial 3204  
Permanent link to this record
 

 
Author Ciprian Corneanu; Meysam Madadi; Sergio Escalera edit   pdf
url  openurl
  Title Deep Structure Inference Network for Facial Action Unit Recognition Type Conference Article
  Year 2018 Publication 15th European Conference on Computer Vision Abbreviated Journal  
  Volume 11216 Issue Pages 309-324  
  Keywords Computer Vision; Machine Learning; Deep Learning; Facial Expression Analysis; Facial Action Units; Structure Inference  
  Abstract Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for general facial expression analysis. Recently, efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between AUs. We propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. We show that by training the model end-to-end with increased supervision we improve state-of-the-art by 5.3% and 8.2% performance on BP4D and DISFA datasets, respectively.  
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCV  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ CME2018 Serial 3205  
Permanent link to this record
 

 
Author Mohamed Ilyes Lakhal; Albert Clapes; Sergio Escalera; Oswald Lanz; Andrea Cavallaro edit   pdf
url  openurl
  Title Residual Stacked RNNs for Action Recognition Type Conference Article
  Year 2018 Publication 9th International Workshop on Human Behavior Understanding Abbreviated Journal  
  Volume Issue Pages 534-548  
  Keywords Action recognition; Deep residual learning; Two-stream RNN  
  Abstract Action recognition pipelines that use Recurrent Neural Networks (RNN) are currently 5–10% less accurate than Convolutional Neural Networks (CNN). While most works that use RNNs employ a 2D CNN on each frame to extract descriptors for action recognition, we extract spatiotemporal features from a 3D CNN and then learn the temporal relationship of these descriptors through a stacked residual recurrent neural network (Res-RNN). We introduce for the first time residual learning to counter the degradation problem in multi-layer RNNs, which have been successful for temporal aggregation in two-stream action recognition pipelines. Finally, we use a late fusion strategy to combine RGB and optical flow data of the two-stream Res-RNN. Experimental results show that the proposed pipeline achieves competitive results on UCF-101 and state of-the-art results for RNN-like architectures on the challenging HMDB-51 dataset.  
  Address (up) Munich; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECCVW  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ LCE2018b Serial 3206  
Permanent link to this record
 

 
Author Vacit Oguz Yazici; Joost Van de Weijer; Arnau Ramisa edit   pdf
url  openurl
  Title Color Naming for Multi-Color Fashion Items Type Conference Article
  Year 2018 Publication 6th World Conference on Information Systems and Technologies Abbreviated Journal  
  Volume 747 Issue Pages 64-73  
  Keywords Deep learning; Color; Multi-label  
  Abstract There exists a significant amount of research on color naming of single colored objects. However in reality many fashion objects consist of multiple colors. Currently, searching in fashion datasets for multi-colored objects can be a laborious task. Therefore, in this paper we focus on color naming for images with multi-color fashion items. We collect a dataset, which consists of images which may have from one up to four colors. We annotate the images with the 11 basic colors of the English language. We experiment with several designs for deep neural networks with different losses. We show that explicitly estimating the number of colors in the fashion item leads to improved results.  
  Address (up) Naples; March 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WORLDCIST  
  Notes LAMP; 600.109; 601.309; 600.120 Approved no  
  Call Number Admin @ si @ YWR2018 Serial 3161  
Permanent link to this record
 

 
Author Marc Masana; Idoia Ruiz; Joan Serrat; Joost Van de Weijer; Antonio Lopez edit   pdf
openurl 
  Title Metric Learning for Novelty and Anomaly Detection Type Conference Article
  Year 2018 Publication 29th British Machine Vision Conference Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract When neural networks process images which do not resemble the distribution seen during training, so called out-of-distribution images, they often make wrong predictions, and do so too confidently. The capability to detect out-of-distribution images is therefore crucial for many real-world applications. We divide out-of-distribution detection between novelty detection ---images of classes which are not in the training set but are related to those---, and anomaly detection ---images with classes which are unrelated to the training set. By related we mean they contain the same type of objects, like digits in MNIST and SVHN. Most existing work has focused on anomaly detection, and has addressed this problem considering networks trained with the cross-entropy loss. Differently from them, we propose to use metric learning which does not have the drawback of the softmax layer (inherent to cross-entropy methods), which forces the network to divide its prediction power over the learned classes. We perform extensive experiments and evaluate both novelty and anomaly detection, even in a relevant application such as traffic sign recognition, obtaining comparable or better results than previous works.  
  Address (up) Newcastle; uk; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference BMVC  
  Notes LAMP; ADAS; 601.305; 600.124; 600.106; 602.200; 600.120; 600.118 Approved no  
  Call Number Admin @ si @ MRS2018 Serial 3156  
Permanent link to this record
 

 
Author Cristina Palmero; Javier Selva; Mohammad Ali Bagheri; Sergio Escalera edit   pdf
openurl 
  Title Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues Type Conference Article
  Year 2018 Publication 29th British Machine Vision Conference Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Gaze behavior is an important non-verbal cue in social signal processing and humancomputer interaction. In this paper, we tackle the problem of person- and head poseindependent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on
EYEDIAP dataset, further improved by 4% when the temporal modality is included.
 
  Address (up) Newcastle; UK; September 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference BMVC  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ PSB2018 Serial 3208  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: