|
Xavier Otazu, Olivier Penacchio, & Xim Cerda-Company. (2015). Brightness and colour induction through contextual influences in V1. In Scottish Vision Group 2015 SVG2015 (Vol. 12, pp. 1208–2012).
|
|
|
Olivier Penacchio, Xavier Otazu, A. Wilkins, & J. Harris. (2015). Uncomfortable images prevent lateral interactions in the cortex from providing a sparse code. In European Conference on Visual Perception ECVP2015.
|
|
|
Xavier Otazu, Olivier Penacchio, & Xim Cerda-Company. (2015). An excitatory-inhibitory firing rate model accounts for brightness induction, colour induction and visual discomfort. In Barcelona Computational, Cognitive and Systems Neuroscience.
|
|
|
Adria Ruiz, Joost Van de Weijer, & Xavier Binefa. (2015). From emotions to action units with hidden and semi-hidden-task learning. In 16th IEEE International Conference on Computer Vision (pp. 3703–3711).
Abstract: Limited annotated training data is a challenging problem in Action Unit recognition. In this paper, we investigate how the use of large databases labelled according to the 6 universal facial expressions can increase the generalization ability of Action Unit classifiers. For this purpose, we propose a novel learning framework: Hidden-Task Learning. HTL aims to learn a set of Hidden-Tasks (Action Units) for which samples are not available but, in contrast, training data is easier to obtain from a set of related Visible-Tasks (Facial Expressions). To that end, HTL is able to exploit prior knowledge about the relation between Hidden- and Visible-Tasks. In our case, we base this prior knowledge on empirical psychological studies providing statistical correlations between Action Units and universal facial expressions. Additionally, we extend HTL to Semi-Hidden Task Learning (SHTL), assuming that Action Unit training samples are also provided. Performing exhaustive experiments over four different datasets, we show that HTL and SHTL improve the generalization ability of AU classifiers by training them with additional facial expression data. Additionally, we show that SHTL achieves competitive performance compared with state-of-the-art Transductive Learning approaches, which face the problem of limited training data by using unlabelled test samples during training.
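To make the hidden/visible-task relation concrete, here is a minimal numpy sketch of the general idea: AU (hidden-task) classifiers feed a fixed prior matrix that maps AU probabilities to expression scores, so training on expression labels alone still updates the AU classifiers. The prior matrix, dimensions and logistic/softmax form are illustrative assumptions, not the authors' exact formulation.

import numpy as np

rng = np.random.default_rng(0)

n_aus, n_expr, n_feat = 12, 6, 128                       # hidden tasks, visible tasks, feature dim
prior_C = rng.dirichlet(np.ones(n_aus), size=n_expr)     # illustrative AU-to-expression prior (rows sum to 1)

W = rng.normal(scale=0.01, size=(n_feat, n_aus))         # AU (hidden-task) classifiers

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    p_au = sigmoid(X @ W)                                 # per-sample AU probabilities (hidden tasks)
    scores = p_au @ prior_C.T                             # combine AU evidence into expression scores
    scores -= scores.max(axis=1, keepdims=True)
    p_expr = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return p_au, p_expr

# Toy training on expression labels only: the gradient flows through the fixed
# prior into the AU classifiers, so hidden-task models are learned without AU labels.
X = rng.normal(size=(256, n_feat))
y = rng.integers(0, n_expr, size=256)
for _ in range(200):
    p_au, p_expr = forward(X)
    grad_scores = p_expr.copy()
    grad_scores[np.arange(len(y)), y] -= 1.0
    grad_p_au = grad_scores @ prior_C                     # back-propagate through the fixed prior
    grad_W = X.T @ (grad_p_au * p_au * (1.0 - p_au)) / len(y)
    W -= 0.5 * grad_W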
|
|
|
Fahad Shahbaz Khan, Muhammad Anwer Rao, Joost Van de Weijer, Michael Felsberg, & J. Laaksonen. (2015). Deep semantic pyramids for human attributes and action recognition. In Image Analysis, Proceedings of the 19th Scandinavian Conference, SCIA 2015 (Vol. 9127, pp. 341–353). Springer International Publishing.
Abstract: Describing persons and their actions is a challenging problem due to variations in pose, scale and viewpoint in real-world images. Recently, the semantic pyramids approach [1] for pose normalization has been shown to provide excellent results for gender and action recognition. The performance of the semantic pyramids approach relies on robust image description and is therefore limited by the use of shallow local features. In the context of object recognition [2] and object detection [3], convolutional neural networks (CNNs) or deep features have been shown to improve the performance over conventional shallow features.
We propose deep semantic pyramids for human attributes and action recognition. The method works by constructing spatial pyramids based on CNNs of different part locations. These pyramids are then combined to obtain a single semantic representation. We validate our approach on the Berkeley and 27 Human Attributes datasets for attribute classification. For action recognition, we perform experiments on two challenging datasets: Willow and PASCAL VOC 2010. The proposed deep semantic pyramids provide a significant gain of 17.2%, 13.9%, 24.3% and 22.6% compared to the standard shallow semantic pyramids on the Berkeley, 27 Human Attributes, Willow and PASCAL VOC 2010 datasets, respectively. Our results also show that deep semantic pyramids outperform conventional CNNs based on the full bounding box of the person. Finally, we compare our approach with state-of-the-art methods and show a gain in performance compared to the best methods in the literature.
Keywords: Action recognition; Human attributes; Semantic pyramids
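As a rough illustration of the kind of representation described above, the following sketch pools CNN features from a set of pre-computed part boxes and concatenates them into a single descriptor. The backbone, input size and part boxes are placeholders, not the networks or parts used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in CNN feature extractor (the paper uses a pretrained deep network).
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

def deep_semantic_representation(image, part_boxes):
    """Pool CNN features from each semantic part crop and concatenate them."""
    feats = []
    for (x0, y0, x1, y1) in part_boxes:
        crop = image[:, :, y0:y1, x0:x1]
        crop = F.interpolate(crop, size=(64, 64))   # fixed input size per part
        feats.append(backbone(crop))
    return torch.cat(feats, dim=1)                  # single semantic representation

image = torch.randn(1, 3, 256, 256)
parts = [(0, 0, 256, 256), (64, 0, 192, 128), (96, 16, 160, 80)]   # e.g. body, upper body, face
rep = deep_semantic_representation(image, parts)
print(rep.shape)   # (1, 3 * 64)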
|
|
|
Aleksandr Setkov, Fabio Martinez Carillo, Michele Gouiffes, Christian Jacquemin, Maria Vanrell, & Ramon Baldrich. (2015). DAcImPro: A Novel Database of Acquired Image Projections and Its Application to Object Recognition. In Advances in Visual Computing. Proceedings of 11th International Symposium, ISVC 2015 Part II (Vol. 9475, pp. 463–473). LNCS. Springer International Publishing.
Abstract: Projector-camera systems are designed to improve the projection quality by comparing original images with their captured projections, which is usually complicated due to high photometric and geometric variations. Many research works address this problem using their own test data, which makes it extremely difficult to compare different proposals. This paper has two main contributions. Firstly, we introduce a new database of acquired image projections (DAcImPro) that covers different photometric and geometric conditions, provides data for ground-truth computation, and can serve to evaluate different algorithms in projector-camera systems. Secondly, a new object recognition scenario from acquired projections is presented, which could be of great interest in domains such as home video projections and public presentations. We show that the task is more challenging than the classical recognition problem and thus requires additional pre-processing, such as color compensation or projection area selection.
Keywords: Projector-camera systems; Feature descriptors; Object recognition
|
|
|
Marc Masana, Joost Van de Weijer, & Andrew Bagdanov. (2016). On-the-fly Network pruning for object detection. In International Conference on Learning Representations.
Abstract: Object detection with deep neural networks is often performed by passing a few thousand candidate bounding boxes through a deep neural network for each image. These bounding boxes are highly correlated since they originate from the same image. In this paper we investigate how to exploit feature occurrence at the image scale to prune the neural network which is subsequently applied to all bounding boxes. We show that removing units which have near-zero activation in the image allows us to significantly reduce the number of parameters in the network. Results on the PASCAL 2007 Object Detection Challenge demonstrate that up to 40% of units in some fully-connected layers can be entirely eliminated with little change in the detection result.
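A minimal numpy sketch of the image-level pruning idea described in the abstract: units of a fully-connected layer whose activation on the whole-image feature is near zero are dropped before the layer is applied to all bounding boxes of that image. The layer size, threshold and random features are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative fully-connected layer: 4096 input features -> 4096 units.
W = rng.normal(scale=0.01, size=(4096, 4096))
b = np.zeros(4096)

def relu(x):
    return np.maximum(x, 0.0)

# One image-level feature vector and ~2000 per-box feature vectors from the same image.
image_feat = rng.normal(size=4096)
box_feats = rng.normal(size=(2000, 4096))

# Units with (near-)zero ReLU activation on the image are assumed uninformative
# for boxes from that image and are pruned for this image only ("on the fly").
image_act = relu(image_feat @ W + b)
keep = image_act > 1e-3                      # illustrative threshold
W_pruned, b_pruned = W[:, keep], b[keep]

box_acts = relu(box_feats @ W_pruned + b_pruned)
print(f"kept {keep.sum()} of {keep.size} units")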
|
|
|
Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes Garcia-Martinez, Fethi Bougares, et al. (2016). Does Multimodality Help Human and Machine for Translation and Image Captioning? In 1st Conference on Machine Translation.
Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
|
|
|
Esteve Cervantes, Long Long Yu, Andrew Bagdanov, Marc Masana, & Joost Van de Weijer. (2016). Hierarchical Part Detection with Deep Neural Networks. In 23rd IEEE International Conference on Image Processing.
Abstract: Part detection is an important aspect of object recognition. Most approaches apply object proposals to generate hundreds of possible part bounding box candidates which are then evaluated by part classifiers. Recently, several methods have investigated directly regressing to a limited set of bounding boxes from a deep neural network representation. However, for object parts such methods may be infeasible due to their relatively small size with respect to the image. We propose a hierarchical method for object and part detection. In a single network we first detect the object and then regress to part location proposals based only on the feature representation inside the object. Experiments show that our hierarchical approach outperforms a network which directly regresses the part locations. We also show that our approach obtains part detection accuracy comparable to or better than the state-of-the-art on the CUB-200 bird and Fashionista clothing item datasets with only a fraction of the number of part proposals.
Keywords: Object Recognition; Part Detection; Convolutional Neural Networks
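The sketch below illustrates the hierarchical idea in toy form: part boxes are regressed only from features computed inside the (already detected) object box, in object-relative coordinates. The tiny backbone, pooling and head are stand-ins, not the architecture used in the paper.

import torch
import torch.nn as nn

class HierarchicalPartRegressor(nn.Module):
    """Toy stand-in: features from the detected object -> K part boxes inside it."""
    def __init__(self, feat_dim=256, n_parts=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_dim, 3, stride=2, padding=1),
                                      nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        # The part head sees only features pooled inside the object box and predicts
        # (x, y, w, h) for each part relative to that box.
        self.part_head = nn.Linear(feat_dim, n_parts * 4)

    def forward(self, image, object_box):
        x0, y0, x1, y1 = object_box
        crop = image[:, :, y0:y1, x0:x1]               # restrict to the detected object
        feat = self.backbone(crop).flatten(1)
        parts = torch.sigmoid(self.part_head(feat))    # normalized, object-relative coords
        return parts.view(-1, 4)

model = HierarchicalPartRegressor()
image = torch.randn(1, 3, 224, 224)
parts = model(image, object_box=(40, 30, 200, 190))
print(parts.shape)   # (4, 4): one relative box per part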
|
|
|
Muhammad Anwer Rao, Fahad Shahbaz Khan, Joost Van de Weijer, & Jorma Laaksonen. (2016). Combining Holistic and Part-based Deep Representations for Computational Painting Categorization. In 6th International Conference on Multimedia Retrieval.
Abstract: Automatic analysis of visual art, such as paintings, is a challenging inter-disciplinary research problem. Conventional approaches only rely on global scene characteristics by encoding holistic information for computational painting categorization. We argue that such approaches are sub-optimal and that discriminative common visual structures provide complementary information for painting classification. We present an approach that encodes both the global scene layout and discriminative latent common structures for computational painting categorization. The regions of interest are automatically extracted, without any manual part labeling, by training class-specific deformable part-based models. Both the holistic image and the regions of interest are then described using multi-scale dense convolutional features. These features are pooled separately using Fisher vector encoding and concatenated afterwards into a single image representation. Experiments are performed on a challenging dataset with 91 different painters and 13 diverse painting styles. Our approach outperforms the standard method, which only employs the global scene characteristics. Furthermore, our method achieves state-of-the-art results, outperforming a recent multi-scale deep features based approach [11] by 6.4% and 3.8% on artist and style classification, respectively.
|
|
|
Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes Garcia-Martinez, Fethi Bougares, Loic Barrault, et al. (2017). LIUM-CVC Submissions for WMT17 Multimodal Translation Task. In 2nd Conference on Machine Translation.
Abstract: This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
|
|
|
Ivet Rafegas, & Maria Vanrell. (2016). Color spaces emerging from deep convolutional networks. In 24th Color and Imaging Conference (pp. 225–230).
Award for the best interactive session.
Abstract: Defining color spaces that provide a good encoding of spatio-chromatic properties of color surfaces is an open problem in color science [8, 22]. Related to this, in computer vision the fusion of color with local image features has been studied and evaluated [16]. In human vision research, the cells which are selective to specific color hues along the visual pathway are also a focus of attention [7, 14]. In line with these research aims, in this paper we study how color is encoded in a deep Convolutional Neural Network (CNN) that has been trained on more than one million natural images for object recognition. These convolutional nets achieve impressive performance in computer vision and rival the representations in the human brain. In this paper we explore how color is represented in a CNN architecture that can give some intuition about efficient spatio-chromatic representations. In convolutional layers the activation of a neuron is related to a spatial filter that combines spatio-chromatic representations; we use an inverted version of it to explore its properties. Using a series of unsupervised methods we classify different types of neurons depending on the color axes they define, and we propose an index of the color selectivity of a neuron. We estimate the main color axes that emerge from this trained net and we prove that the color selectivity of neurons decreases from early to deeper layers.
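A toy sketch of one way such an analysis can be set up: project the RGB weights of a (possibly inverted) convolutional filter onto fixed opponent color axes and measure how much of its energy is chromatic. The axes and the index definition here are illustrative; the paper derives its color axes and selectivity index from the trained network itself.

import numpy as np

# Illustrative opponent axes (not the paper's learned axes): achromatic,
# red-green and blue-yellow directions in RGB space.
axes = np.array([[1, 1, 1],
                 [1, -1, 0],
                 [1, 1, -2]], dtype=float)
axes /= np.linalg.norm(axes, axis=1, keepdims=True)

def color_selectivity(filter_rgb):
    """Toy index: share of a filter's energy lying off the achromatic axis."""
    w = filter_rgb.reshape(-1, 3)                 # (H*W, 3) spatio-chromatic weights
    proj = w @ axes.T                             # project onto the three axes
    energy = (proj ** 2).sum(axis=0)
    return 1.0 - energy[0] / energy.sum()         # 0 = grayscale filter, -> 1 = chromatic

rng = np.random.default_rng(2)
gray_filter = np.repeat(rng.normal(size=(7, 7, 1)), 3, axis=2)   # identical channels
color_filter = rng.normal(size=(7, 7, 3))
print(color_selectivity(gray_filter))    # ~0: no chromatic component
print(color_selectivity(color_filter))   # clearly > 0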
|
|
|
Yaxing Wang, L. Zhang, & Joost Van de Weijer. (2016). Ensembles of generative adversarial networks. In 30th Annual Conference on Neural Information Processing Systems Workshops.
Abstract: Ensembles are a popular way to improve results of discriminative CNNs. The combination of several networks trained starting from different initializations improves results significantly. In this paper we investigate the usage of ensembles of GANs. The specific nature of GANs opens up several new ways to construct ensembles. The first one is based on the fact that, in the minimax game which is played to optimize the GAN objective, the generator network keeps on changing even after the network can be considered optimal. As such, ensembles of GANs can be constructed from the same network initialization by simply taking models after different numbers of training iterations. These so-called self-ensembles are much faster to train than traditional ensembles. The second method, called cascade GANs, redirects part of the training data which is badly modeled by the first GAN to another GAN. In experiments on the CIFAR10 dataset we show that ensembles of GANs obtain model probability distributions which better model the data distribution. In addition, we show that these improved results can be obtained at little additional computational cost.
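A minimal sketch of the self-ensemble idea: samples are pooled from several generator checkpoints of the same training run. The tiny generator and the freshly initialized "checkpoints" below are stand-ins purely for illustration.

import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in generator: latent vector -> flat 'image'."""
    def __init__(self, z_dim=64, out_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z):
        return self.net(z)

# Self-ensemble: the same architecture taken at different training iterations.
# Here the checkpoints are freshly initialized models purely for illustration.
checkpoints = [TinyGenerator() for _ in range(5)]

def sample_self_ensemble(n_per_model=8, z_dim=64):
    samples = []
    for gen in checkpoints:
        with torch.no_grad():
            samples.append(gen(torch.randn(n_per_model, z_dim)))
    return torch.cat(samples, dim=0)      # pooled samples from all checkpoints

print(sample_self_ensemble().shape)       # (40, 3072)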
|
|
|
Guim Perarnau, Joost Van de Weijer, Bogdan Raducanu, & Jose Manuel Alvarez. (2016). Invertible Conditional GANs for image editing. In 30th Annual Conference on Neural Information Processing Systems Workshops.
Abstract: Generative Adversarial Networks (GANs) have recently been demonstrated to successfully approximate complex data distributions. A relevant extension of this model is conditional GANs (cGANs), where the introduction of external information allows specific representations of the generated images to be determined. In this work, we evaluate encoders to invert the mapping of a cGAN, i.e., to map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces by conditioning on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call Invertible cGAN (IcGAN), enables re-generating real images with deterministic complex modifications.
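A toy sketch of the encode-edit-decode pipeline suggested by the abstract: encoders map a real image to a latent code z and an attribute vector y, the attributes are swapped, and a conditional generator produces the edited image. All modules below are tiny stand-ins, not the IcGAN architecture.

import torch
import torch.nn as nn

z_dim, n_attrs, img_dim = 100, 5, 3 * 64 * 64

# Tiny stand-ins for the components (encoders for z and y, conditional generator).
encoder_z = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
encoder_y = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, n_attrs), nn.Sigmoid())
generator = nn.Sequential(nn.Linear(z_dim + n_attrs, 256), nn.ReLU(),
                          nn.Linear(256, img_dim), nn.Tanh())

def edit_image(image, new_attrs):
    """Encode a real image into (z, y), swap the attributes, and regenerate."""
    flat = image.flatten(1)
    z = encoder_z(flat)                               # latent representation
    y = encoder_y(flat)                               # inferred conditional attributes
    reconstruction = generator(torch.cat([z, y], dim=1))
    edited = generator(torch.cat([z, new_attrs], dim=1))
    return reconstruction.view_as(image), edited.view_as(image)

image = torch.rand(1, 3, 64, 64) * 2 - 1
new_attrs = torch.tensor([[1., 0., 0., 1., 0.]])      # e.g. toggle two face attributes
recon, edited = edit_image(image, new_attrs)
print(recon.shape, edited.shape)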
|
|
|
Aitor Alvarez-Gila, Joost Van de Weijer, & Estibaliz Garrote. (2017). Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB. In 1st International Workshop on Physics Based Vision meets Deep Learning.
Abstract: Hyperspectral signal reconstruction aims at recovering the original spectral input that produced a certain trichromatic (RGB) response from a capturing device or observer. Given the heavily underconstrained, non-linear nature of the problem, traditional techniques leverage different statistical properties of the spectral signal in order to build informative priors from real-world object reflectances for constructing such an RGB-to-spectral-signal mapping. However, most of them treat each sample independently, and thus do not benefit from the contextual information that the spatial dimensions can provide. We pose hyperspectral natural image reconstruction as an image-to-image mapping learning problem, and apply a conditional generative adversarial framework to help capture spatial semantics. This is the first time Convolutional Neural Networks (and, in particular, Generative Adversarial Networks) are used to solve this task. Quantitative evaluation shows a Root Mean Squared Error (RMSE) drop of 44.7% and a Relative RMSE drop of 47.0% on the ICVL natural hyperspectral image dataset.
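Since the abstract reports RMSE and Relative RMSE drops, here is a small sketch of how these two error measures are commonly computed over a hyperspectral cube; the exact definitions and normalization used in the paper may differ.

import numpy as np

def rmse(pred, gt):
    """Root mean squared error over all pixels and spectral bands."""
    return np.sqrt(np.mean((pred - gt) ** 2))

def relative_rmse(pred, gt, eps=1e-8):
    """RMSE of the per-element error normalized by the ground-truth magnitude."""
    return np.sqrt(np.mean(((pred - gt) / (gt + eps)) ** 2))

rng = np.random.default_rng(3)
gt = rng.uniform(0.1, 1.0, size=(64, 64, 31))      # toy 31-band hyperspectral cube
pred = gt + rng.normal(scale=0.05, size=gt.shape)  # toy reconstruction from RGB
print(rmse(pred, gt), relative_rmse(pred, gt))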
|
|
|
Laura Lopez-Fuentes, Andrew Bagdanov, Joost Van de Weijer, & Harald Skinnemoen. (2017). Bandwidth Limited Object Recognition in High Resolution Imagery. In IEEE Winter conference on Applications of Computer Vision.
Abstract: This paper proposes a novel method to optimize bandwidth usage for object detection in critical communication scenarios. We develop two operating models of active information seeking. The first model identifies promising regions in low resolution imagery and progressively requests higher resolution regions on which to perform recognition of higher semantic quality. The second model identifies promising regions in low resolution imagery while simultaneously predicting the approximate location of the object of higher semantic quality. From this general framework, we develop a car recognition system via identification of its license plate and evaluate the performance of both models on a car dataset that we introduce. Results are compared with traditional JPEG compression and demonstrate that our system saves up to one order of magnitude of bandwidth while sacrificing little in terms of recognition performance.
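A toy sketch of the first operating model described above: score regions of a low-resolution thumbnail and request only the most promising region at full resolution. The grid layout and the mean-intensity "promisingness" score are placeholders for the learned region scoring in the paper.

import numpy as np

rng = np.random.default_rng(4)
low_res = rng.random((90, 160))            # thumbnail sent first (cheap in bandwidth)
full_res = rng.random((1080, 1920))        # kept on the remote sensor

def region_scores(img, grid=(3, 4)):
    """Placeholder 'promisingness' score: mean intensity of each grid cell."""
    h, w = img.shape[0] // grid[0], img.shape[1] // grid[1]
    return np.array([[img[i*h:(i+1)*h, j*w:(j+1)*w].mean()
                      for j in range(grid[1])] for i in range(grid[0])])

scores = region_scores(low_res)
i, j = np.unravel_index(scores.argmax(), scores.shape)

# Request only the winning cell at full resolution instead of the whole frame.
H, W = full_res.shape[0] // 3, full_res.shape[1] // 4
crop = full_res[i*H:(i+1)*H, j*W:(j+1)*W]
saved = 1 - crop.size / full_res.size
print(f"requested {crop.shape}, saving {saved:.0%} of full-frame pixels")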
|
|
|
Laura Lopez-Fuentes, Joost Van de Weijer, Marc Bolaños, & Harald Skinnemoen. (2017). Multi-modal Deep Learning Approach for Flood Detection. In MediaEval Benchmarking Initiative for Multimedia Evaluation.
Abstract: In this paper we propose a multi-modal deep learning approach to detect floods in social media posts. Social media posts normally contain some metadata and/or visual information, and we use this information to detect the floods. The model is based on a Convolutional Neural Network which extracts the visual features and a bidirectional Long Short-Term Memory network which extracts the semantic features from the textual metadata. We validate the method on images extracted from Flickr which contain both visual information and metadata, and compare the results when using both, only visual information, or only metadata. This work has been done in the context of the MediaEval Multimedia Satellite Task.
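A minimal sketch of the late-fusion architecture outlined in the abstract: a small CNN branch for the image and a bidirectional LSTM branch over embedded metadata tokens, concatenated for a binary flood / no-flood prediction. Layer sizes, vocabulary and fusion details are illustrative assumptions.

import torch
import torch.nn as nn

class FloodClassifier(nn.Module):
    """Toy multi-modal model: CNN on the image + BiLSTM on metadata tokens."""
    def __init__(self, vocab_size=1000, emb_dim=32, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(16 + 2 * hidden, 2)   # flood / no flood

    def forward(self, image, tokens):
        visual = self.cnn(image)
        _, (h, _) = self.bilstm(self.embed(tokens))
        textual = torch.cat([h[-2], h[-1]], dim=1)        # last forward + backward states
        return self.classifier(torch.cat([visual, textual], dim=1))

model = FloodClassifier()
logits = model(torch.randn(4, 3, 128, 128), torch.randint(0, 1000, (4, 20)))
print(logits.shape)   # (4, 2)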
|
|
|
Ivet Rafegas, & Maria Vanrell. (2017). Color representation in CNNs: parallelisms with biological vision. In ICCV Workshop on Mutual Benefits of Cognitive and Computer Vision.
Abstract: Convolutional Neural Networks (CNNs) trained for object recognition tasks present representational capabilities approaching those of primate visual systems [1]. This provides a computational framework to explore how image features are efficiently represented. Here, we dissect a trained CNN [2] to study how color is represented. We use a classical methodology from physiology: measuring the selectivity index of individual neurons to specific features. We use ImageNet Dataset [20] images and synthetic versions of them to quantify the color tuning properties of artificial neurons and to provide a classification of the network population. We identify three main levels of color representation showing some parallelisms with biological visual systems: (a) a decomposition into a circular hue space to represent single color regions with a wider hue sampling beyond the first layer (V2), (b) the emergence of opponent low-dimensional spaces in early stages to represent color edges (V1); and (c) a strong entanglement between color and shape patterns representing object parts (e.g. the wheel of a car), object shapes (e.g. faces) or object-surround configurations (e.g. a blue sky surrounding an object) in deeper layers (V4 or IT).
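A toy sketch of the kind of selectivity measurement described above: a neuron's mean activation on color images is compared with its activation on synthetic achromatic versions of the same images, yielding an index between 0 (color-blind) and 1 (purely chromatic). The toy neuron, patches and exact index formula are illustrative, not the paper's definitions.

import numpy as np

rng = np.random.default_rng(5)

def to_grayscale(batch):
    """Synthetic achromatic versions: replicate the luminance over the three channels."""
    lum = batch.mean(axis=-1, keepdims=True)
    return np.repeat(lum, 3, axis=-1)

def neuron_activation(batch, filt):
    """Toy neuron: rectified correlation of each patch with one RGB filter, averaged."""
    resp = np.einsum('nijc,ijc->n', batch, filt)
    return np.maximum(resp, 0.0).mean()

def color_selectivity_index(batch, filt):
    """Illustrative index in [0, 1]: relative drop in activation when color is removed."""
    a_color = neuron_activation(batch, filt)
    a_gray = neuron_activation(to_grayscale(batch), filt)
    return (a_color - a_gray) / (a_color + a_gray + 1e-8)

patches = rng.random((100, 11, 11, 3))            # toy image patches
opponent = np.zeros((11, 11, 3))
opponent[..., 0], opponent[..., 1] = 1.0, -1.0    # red-green opponent "neuron"
achromatic = np.ones((11, 11, 3)) / 3.0           # luminance "neuron"
print(color_selectivity_index(patches, opponent))    # ~1: response vanishes without color
print(color_selectivity_index(patches, achromatic))  # ~0: unaffected by removing color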
|
|