Records | |||||
---|---|---|---|---|---|
Author | Felipe Codevilla; Antonio Lopez; Vladlen Koltun; Alexey Dosovitskiy | ||||
Title | On Offline Evaluation of Vision-based Driving Models | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11219 | Issue | Pages | 246-262 | |
Keywords | Autonomous driving; deep learning | ||||
Abstract | Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics. (A code sketch of the metric comparison follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | ADAS; 600.124; 600.118 | Approved | no | ||
Call Number | Admin @ si @ CLK2018 | Serial | 3162 | ||
Permanent link to this record | |||||
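The comparison at the heart of this record, how well an offline metric tracks online driving quality across models, can be sketched in a few lines. Everything below (the metric definitions, the toy steering data, and the per-model success rates) is an illustrative assumption, not the paper's exact protocol or datasets.

```python
# Minimal sketch (NumPy): correlating offline prediction metrics with an
# online driving score across several models. Metric names and data are
# illustrative assumptions.
import numpy as np

def mean_absolute_error(pred, gt):
    """Plain offline prediction error on recorded steering commands."""
    return np.mean(np.abs(pred - gt))

def thresholded_error(pred, gt, tau=0.1):
    """Counts only 'large' steering deviations, a stand-in for the kind of
    alternative offline metric the paper argues correlates better with driving."""
    return np.mean(np.abs(pred - gt) > tau)

def correlation_with_driving(offline_scores, online_scores):
    """Pearson correlation between an offline metric and online driving quality."""
    return np.corrcoef(offline_scores, online_scores)[0, 1]

# Toy setup: 4 models, their predictions on a validation set, and a
# hypothetical online success rate measured per model in a simulator.
rng = np.random.default_rng(0)
gt = rng.normal(0.0, 0.3, size=1000)                   # ground-truth steering
preds = [gt + rng.normal(0.0, s, size=1000) for s in (0.05, 0.1, 0.2, 0.4)]
online_success = np.array([0.9, 0.8, 0.55, 0.3])       # per-model driving score

mae = np.array([mean_absolute_error(p, gt) for p in preds])
thr = np.array([thresholded_error(p, gt) for p in preds])
# A strongly negative correlation means the offline error tracks driving quality.
print("corr(MAE, success)         =", correlation_with_driving(mae, online_success))
print("corr(thresholded, success) =", correlation_with_driving(thr, online_success))
```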
Author | Ciprian Corneanu; Meysam Madadi; Sergio Escalera | ||||
Title | Deep Structure Inference Network for Facial Action Unit Recognition | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11216 | Issue | Pages | 309-324 | |
Keywords | Computer Vision; Machine Learning; Deep Learning; Facial Expression Analysis; Facial Action Units; Structure Inference | ||||
Abstract | Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for general facial expression analysis. Recently, efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between AUs. We propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and, in later stages, replicating a message passing algorithm between classes, similar to graphical model inference. We show that by training the model end-to-end with increased supervision we improve the state of the art by 5.3% and 8.2% on the BP4D and DISFA datasets, respectively. (A sketch of the message-passing step follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ CME2018 | Serial | 3205 | ||
Permanent link to this record | |||||
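The structure-inference idea, per-AU scores refined by messages from related AUs, can be illustrated with a simple iterative update. The update rule and the co-occurrence matrix below are illustrative approximations of graphical-model-style message passing, not the paper's exact formulation.

```python
# Minimal sketch (NumPy): per-AU logits from a local/global feature backbone
# are iteratively refined with messages weighted by an AU co-occurrence matrix.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def structure_inference(logits, cooccurrence, iters=3, alpha=0.5):
    """logits: (num_aus,) initial AU scores; cooccurrence: (num_aus, num_aus)
    symmetric matrix of pairwise AU relations (zero diagonal)."""
    refined = logits.copy()
    for _ in range(iters):
        messages = cooccurrence @ sigmoid(refined)    # evidence from related AUs
        refined = alpha * logits + (1 - alpha) * messages
    return sigmoid(refined)

# Toy example with 5 AUs: AU0 and AU1 frequently co-occur.
cooc = np.zeros((5, 5))
cooc[0, 1] = cooc[1, 0] = 2.0
initial = np.array([1.5, -0.2, -1.0, 0.3, -2.0])      # AU1 is ambiguous on its own
print(structure_inference(initial, cooc))              # AU1 is pulled up by AU0
```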
Author | Lluis Gomez; Andres Mafla; Marçal Rusiñol; Dimosthenis Karatzas | ||||
Title | Single Shot Scene Text Retrieval | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11218 | Issue | Pages | 728-744 | |
Keywords | Image retrieval; Scene text; Word spotting; Convolutional Neural Networks; Region Proposal Networks; PHOC | ||||
Abstract | Textual information found in scene images provides high-level semantic information about the image and its context, and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the use of a single shot CNN architecture that predicts, at the same time, bounding boxes and a compact text representation of the words within them. In this way, the text-based image retrieval task can be cast as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image database. Our experiments demonstrate that the proposed architecture outperforms the previous state of the art while offering a significant increase in processing speed. (A sketch of the retrieval step follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | DAG; 600.084; 601.338; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GMR2018 | Serial | 3143 | ||
Permanent link to this record | |||||
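The retrieval step described in the abstract reduces to a nearest-neighbour search once every image has a set of compact text descriptors for its detected words. The simplified PHOC below (levels 1 and 2 over the lowercase alphabet) and the toy database are illustrative stand-ins for the descriptors and detections produced by the paper's single-shot network.

```python
# Minimal sketch (NumPy): rank images by similarity between the query PHOC
# and the PHOC-like descriptors stored for each image's detected words.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def simple_phoc(word, levels=(1, 2)):
    """Binary pyramidal histogram of characters for a lowercase word."""
    word = word.lower()
    feats = []
    for L in levels:
        for part in range(L):
            start, end = part * len(word) / L, (part + 1) * len(word) / L
            seg = word[int(start):max(int(start) + 1, int(round(end)))]
            feats.append([1.0 if c in seg else 0.0 for c in ALPHABET])
    return np.concatenate(feats)

def retrieve(query, image_descriptors):
    """Rank images by the best cosine similarity between the query descriptor
    and any detected word descriptor in the image."""
    q = simple_phoc(query)
    q = q / (np.linalg.norm(q) + 1e-8)
    scores = {}
    for image_id, descs in image_descriptors.items():
        d = descs / (np.linalg.norm(descs, axis=1, keepdims=True) + 1e-8)
        scores[image_id] = float((d @ q).max())
    return sorted(scores, key=scores.get, reverse=True)

# Toy database: per-image descriptors computed from the words visible in them.
db = {
    "img1": np.stack([simple_phoc(w) for w in ("hotel", "exit")]),
    "img2": np.stack([simple_phoc(w) for w in ("pizza", "open")]),
}
print(retrieve("pizza", db))   # img2 should rank first
```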
Author | Marc Oliu; Javier Selva; Sergio Escalera | ||||
Title | Folded Recurrent Neural Networks for Future Video Prediction | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11218 | Issue | Pages | 745-761 | |
Keywords | |||||
Abstract | Future video prediction is an ill-posed Computer Vision problem that has recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how, with this topology, only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving insight into the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state-of-the-art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times lower memory usage and computational cost than the best-scoring approach. (A sketch of the shared-state recurrence follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | HUPBA; no menciona | Approved | no | ||
Call Number | Admin @ si @ OSE2018 | Serial | 3204 | ||
Permanent link to this record | |||||
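The "double mapping" idea can be sketched with one shared state per layer and two recurrent cells: one used only while encoding observed frames (input to state) and one used only while predicting (state to output), so the same stack acts as encoder and decoder. This is an illustrative approximation of the folded recurrent topology under simplifying assumptions (fully connected GRU cells instead of the convolutional units used in the paper).

```python
# Minimal PyTorch sketch of a bijective-GRU-style layer with a shared state.
import torch
import torch.nn as nn

class BijectiveGRULayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fwd = nn.GRUCell(dim, dim)   # input -> state (encoding direction)
        self.bwd = nn.GRUCell(dim, dim)   # state -> output (prediction direction)
        self.state = None

    def reset(self, batch, dim, device):
        self.state = torch.zeros(batch, dim, device=device)

    def encode(self, x):                  # bottom-up: update shared state from below
        self.state = self.fwd(x, self.state)
        return self.state

    def decode(self, top_down):           # top-down: produce output from shared state
        self.state = self.bwd(top_down, self.state)
        return self.state

dim = 64
layers = [BijectiveGRULayer(dim) for _ in range(3)]
for l in layers:
    l.reset(batch=2, dim=dim, device="cpu")

frames = torch.randn(5, 2, dim)           # 5 observed (flattened) frames
for x in frames:                          # encoding: only the forward cells run
    for l in layers:
        x = l.encode(x)

pred = layers[-1].state                   # top state seeds the top-down pass
for l in reversed(layers[:-1]):           # prediction: only the backward cells run
    pred = l.decode(pred)
print(pred.shape)                          # next-frame representation, (2, 64)
```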
Author | Pau Rodriguez; Josep M. Gonfaus; Guillem Cucurull; Xavier Roca; Jordi Gonzalez | ||||
Title | Attend and Rectify: A Gated Attention Mechanism for Fine-Grained Recovery | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11212 | Issue | Pages | 357-372 | |
Keywords | Deep Learning; Convolutional Neural Networks; Attention | ||||
Abstract | We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpass state-of-the-art classification accuracies on CIFAR-10, the Adience gender recognition task, Stanford Dogs, and UEC Food-100. (A sketch of the rectification step follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | ISE; 600.098; 602.121; 600.119 | Approved | no | ||
Call Number | Admin @ si @ RGC2018 | Serial | 3139 | ||
Permanent link to this record | |||||
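The rectification step can be illustrated as an attention head that pools an intermediate activation map into per-class scores, with a learned gate deciding how strongly that correction shifts the backbone's output. Module names, layer choices and sizes below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal PyTorch sketch of a gated attention module that rectifies backbone logits.
import torch
import torch.nn as nn

class AttentionRectifier(nn.Module):
    def __init__(self, channels, num_classes):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)           # spatial attention map
        self.classify = nn.Conv2d(channels, num_classes, kernel_size=1)
        self.gate = nn.Linear(channels, 1)                            # confidence gate

    def forward(self, feat, logits):
        a = torch.softmax(self.attn(feat).flatten(2), dim=-1)        # (B, 1, H*W)
        scores = self.classify(feat).flatten(2)                      # (B, C, H*W)
        attended = (scores * a).sum(-1)                              # (B, C) attention-pooled scores
        g = torch.sigmoid(self.gate(feat.mean(dim=(2, 3))))          # (B, 1) gate value
        return logits + g * attended                                 # rectified logits

backbone_feat = torch.randn(4, 128, 14, 14)     # intermediate activations
backbone_logits = torch.randn(4, 10)            # original network output
rectifier = AttentionRectifier(channels=128, num_classes=10)
print(rectifier(backbone_feat, backbone_logits).shape)   # torch.Size([4, 10])
```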
Author | Yaxing Wang; Chenshen Wu; Luis Herranz; Joost Van de Weijer; Abel Gonzalez-Garcia; Bogdan Raducanu | ||||
Title | Transferring GANs: generating images from limited data | Type | Conference Article | ||
Year | 2018 | Publication | 15th European Conference on Computer Vision | Abbreviated Journal | |
Volume | 11210 | Issue | Pages | 220-236 | |
Keywords | Generative adversarial networks; Transfer learning; Domain adaptation; Image generation | ||||
Abstract | Transferring knowledge of pre-trained networks to new domains by means of fine-tuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pre-trained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pre-trained model was trained without conditioning. Our results also suggest that density is more important than diversity, and a dataset with one or a few densely sampled classes is a better source than more diverse datasets such as ImageNet or Places. (A sketch of the fine-tuning step follows this record.) | ||||
Address | Munich; September 2018 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | ECCV | ||
Notes | LAMP; 600.109; 600.106; 600.120 | Approved | no | ||
Call Number | Admin @ si @ WWH2018a | Serial | 3130 | ||
Permanent link to this record |
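The transfer setting studied in this record amounts to initialising both generator and discriminator from a GAN pre-trained on a large source domain and then running standard adversarial updates on a small target dataset. The network definitions, the checkpoint filenames and the stand-in target batch below are illustrative assumptions, not the architectures or data used in the paper.

```python
# Minimal PyTorch sketch: fine-tune a pre-trained GAN on limited target data.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, in_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))
    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
# Initialise from source-domain weights instead of from scratch
# (hypothetical checkpoint files):
# G.load_state_dict(torch.load("source_generator.pt"))
# D.load_state_dict(torch.load("source_discriminator.pt"))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

target_batch = torch.randn(16, 784)     # stand-in for the limited target data
z = torch.randn(16, 100)

# One adversarial fine-tuning step (non-saturating GAN loss).
fake = G(z)
d_loss = bce(D(target_batch), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

g_loss = bce(D(fake), torch.ones(16, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```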