Marc Masana, Joost Van de Weijer, & Andrew Bagdanov. (2016). On-the-fly Network pruning for object detection. In International Conference on Learning Representations.
Abstract: Object detection with deep neural networks is often performed by passing a few thousand candidate bounding boxes through a deep neural network for each image. These bounding boxes are highly correlated since they originate from the same image. In this paper we investigate how to exploit feature occurrence at the image scale to prune the neural network which is subsequently applied to all bounding boxes. We show that removing units which have near-zero activation in the image allows us to significantly reduce the number of parameters in the network. Results on the PASCAL 2007 Object Detection Challenge demonstrate that up to 40% of units in some fully-connected layers can be entirely eliminated with little change in the detection result.
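To make the pruning criterion above concrete, here is a minimal NumPy sketch (not the authors' implementation): units of a fully-connected layer whose activation on the whole image is near zero are dropped, and the sliced layer is then applied to every bounding-box feature. The threshold `tau` and all helper names are illustrative assumptions.

```python
import numpy as np

def prune_fc_layer(W, b, image_activation, tau=1e-3):
    """Drop units whose image-level activation is near zero.

    W: (n_units, n_inputs) weight matrix of a fully-connected layer.
    b: (n_units,) bias vector.
    image_activation: (n_units,) activations computed once on the whole image.
    Returns the sliced layer and the indices of the kept units.
    """
    keep = np.flatnonzero(np.abs(image_activation) > tau)
    # (Input weights of the next layer would be sliced with `keep` as well.)
    return W[keep], b[keep], keep

# Toy example: a 4096-unit layer where ~40% of units are inactive on this image.
rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 9216))
b = rng.standard_normal(4096)
act = rng.standard_normal(4096) * (rng.random(4096) > 0.4)  # zeroed-out units

W_p, b_p, keep = prune_fc_layer(W, b, act)
print(f"kept {len(keep)}/{W.shape[0]} units")

# The pruned layer is then evaluated on every bounding-box feature vector:
box_features = rng.standard_normal((2000, 9216))  # ~2000 proposals per image
outputs = box_features @ W_p.T + b_p              # much cheaper than the full layer
```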
|
|
Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes Garcia-Martinez, Fethi Bougares, et al. (2016). Does Multimodality Help Human and Machine for Translation and Image Captioning? In 1st Conference on Machine Translation.
Abstract: This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
|
|
Esteve Cervantes, Long Long Yu, Andrew Bagdanov, Marc Masana, & Joost Van de Weijer. (2016). Hierarchical Part Detection with Deep Neural Networks. In 23rd IEEE International Conference on Image Processing.
Abstract: Part detection is an important aspect of object recognition. Most approaches apply object proposals to generate hundreds of possible part bounding box candidates which are then evaluated by part classifiers. Recently, several methods have investigated directly regressing to a limited set of bounding boxes from a deep neural network representation. However, for object parts such methods may be infeasible due to their relatively small size with respect to the image. We propose a hierarchical method for object and part detection. In a single network we first detect the object and then regress to part location proposals based only on the feature representation inside the object. Experiments show that our hierarchical approach outperforms a network which directly regresses the part locations. We also show that our approach obtains part detection accuracy comparable to or better than the state-of-the-art on the CUB-200 bird and Fashionista clothing item datasets with only a fraction of the number of part proposals.
Keywords: Object Recognition; Part Detection; Convolutional Neural Networks
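The hierarchical object-then-parts idea can be sketched in a few lines of PyTorch. This is an illustrative toy, not the paper's network: a single backbone regresses an object box from image-level features, features are then pooled inside that box (here with torchvision's roi_align) and a second head regresses part boxes from the object-conditioned representation. All layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class HierarchicalPartDetector(nn.Module):
    """Toy two-stage head: object box first, part boxes from features inside it."""
    def __init__(self, feat_dim=256, n_parts=4):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, kernel_size=7, stride=8, padding=3)
        self.obj_head = nn.Linear(feat_dim, 4)             # (cx, cy, w, h)
        self.part_head = nn.Linear(feat_dim * 7 * 7, n_parts * 4)

    def forward(self, images):
        fmap = self.backbone(images)                        # (B, C, H, W)
        pooled = fmap.mean(dim=(2, 3))                      # global image descriptor
        pred = self.obj_head(pooled).sigmoid() * images.shape[-1]
        cx, cy, w, h = pred.unbind(dim=1)
        obj_box = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
        # Pool features only inside the predicted object box ...
        batch_idx = torch.arange(images.size(0), dtype=torch.float32).unsqueeze(1)
        rois = torch.cat([batch_idx, obj_box], dim=1)       # (B, 5) as (idx, box)
        obj_feats = roi_align(fmap, rois, output_size=7, spatial_scale=1 / 8)
        # ... and regress part locations from the object-conditioned features.
        part_boxes = self.part_head(obj_feats.flatten(1))
        return obj_box, part_boxes.view(-1, 4)

model = HierarchicalPartDetector()
obj, parts = model(torch.randn(2, 3, 224, 224))
print(obj.shape, parts.shape)  # torch.Size([2, 4]) torch.Size([8, 4])
```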
|
|
Jordi Roca. (2012). Constancy and inconstancy in categorical colour perception (Maria Vanrell, & C. Alejandro Parraga, Eds.). Ph.D. thesis.
Abstract: To recognise objects is perhaps the most important task an autonomous system, either biological or artificial, needs to perform. In the context of human vision, this is partly achieved by recognizing the colour of surfaces despite changes in the wavelength distribution of the illumination, a property called colour constancy. Correct surface colour recognition may be adequately accomplished by colour category matching without the need to match colours precisely, therefore categorical colour constancy is likely to play an important role for object identification to be successful. The main aim of this work is to study the relationship between colour constancy and categorical colour perception. Previous studies of colour constancy have shown the influence of factors such as the spatio-chromatic properties of the background, individual observer's performance, semantics, etc. However, there is very little systematic study of these influences. To this end, we developed a new approach to colour constancy which includes individual observers' categorical perception, the categorical structure of the background, and their interrelations, resulting in a more comprehensive characterization of the phenomenon. In our study, we first developed a new method to analyse the categorical structure of 3D colour space, which allowed us to characterize individual categorical colour perception as well as quantify inter-individual variations in terms of shape and centroid location of 3D categorical regions. Second, we developed a new colour constancy paradigm, termed chromatic setting, which allows measuring the precise location of nine categorically-relevant points in colour space under immersive illumination. Additionally, we derived from these measurements a new colour constancy index which takes into account the magnitude and orientation of the chromatic shift, memory effects and the interrelations among colours, and a model of colour naming tuned to each observer/adaptation state. Our results lead to the following conclusions: (1) There exist large inter-individual variations in the categorical structure of colour space, and thus colour naming ability varies significantly, but this is not well predicted by low-level chromatic discrimination ability; (2) Analysis of the average colour naming space suggested the need for an additional three basic colour terms (turquoise, lilac and lime) for optimal colour communication; (3) Chromatic setting improved the precision of more complex linear colour constancy models and suggested that mechanisms other than cone gain might be best suited to explain colour constancy; (4) The categorical structure of colour space is broadly stable under illuminant changes for categorically balanced backgrounds; (5) Categorical inconstancy exists for categorically unbalanced backgrounds, thus indicating that categorical information perceived in the initial stages of adaptation may constrain further categorical perception.
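For readers unfamiliar with constancy indices, the following is a conventional Brunswik-ratio style definition, given here for context only; the index derived in the thesis is more elaborate, additionally weighing the orientation of the chromatic shift, memory effects and inter-colour relations:

```latex
% Conventional constancy index (context only, not the thesis's index):
\[
  CI \;=\; 1 - \frac{\lVert \mathbf{m} - \mathbf{a} \rVert}{\lVert \mathbf{p} - \mathbf{a} \rVert},
\]
% where $\mathbf{p}$ is the colour of the reference surface under the test
% illuminant with no adaptation, $\mathbf{a}$ is the setting a perfectly
% constant observer would make, and $\mathbf{m}$ is the observer's actual
% setting. $CI = 1$ means perfect constancy; $CI = 0$ means no adaptation.
```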
|
|
Ivet Rafegas, & Maria Vanrell. (2016). Color spaces emerging from deep convolutional networks. In 24th Color and Imaging Conference (pp. 225–230).
Note: Award for the best interactive session.
Abstract: Defining color spaces that provide a good encoding of spatio-chromatic properties of color surfaces is an open problem in color science [8, 22]. Related to this, in computer vision the fusion of color with local image features has been studied and evaluated [16]. In human vision research, the cells which are selective to specific color hues along the visual pathway are also a focus of attention [7, 14]. In line with these research aims, in this paper we study how color is encoded in a deep Convolutional Neural Network (CNN) that has been trained on more than one million natural images for object recognition. These convolutional nets achieve impressive performance in computer vision and rival the representations in the human brain. In this paper we explore how color is represented in a CNN architecture, which can give some intuition about efficient spatio-chromatic representations. In convolutional layers the activation of a neuron is related to a spatial filter that combines spatio-chromatic representations, and we use an inverted version of it to explore its properties. Using a series of unsupervised methods we classify different types of neurons depending on the color axes they define, and we propose an index of the color-selectivity of a neuron. We estimate the main color axes that emerge from this trained net and we show that the color-selectivity of neurons decreases from early to deeper layers.
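One concrete reading of such a color-selectivity index: compare a neuron's mean response to its top-activating patches with its response to grayscale versions of the same patches. The sketch below implements this intuition with a stand-in `neuron_response` callable; the paper's exact definition may differ.

```python
import numpy as np

def to_grayscale(patch):
    """Replicate the luminance across RGB channels (patch: H x W x 3 in [0,1])."""
    gray = patch @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=2)

def color_selectivity(neuron_response, patches):
    """Index in [0, 1]: 0 = color-blind neuron, 1 = fully color-selective.

    neuron_response: callable mapping a patch to a scalar activation.
    patches: list of the neuron's top-activating image patches.
    """
    act_color = np.mean([neuron_response(p) for p in patches])
    act_gray = np.mean([neuron_response(to_grayscale(p)) for p in patches])
    return float(np.clip(1.0 - act_gray / act_color, 0.0, 1.0))

# Toy neuron responding mainly to the red channel, fed reddish patches:
red_neuron = lambda p: p[..., 0].mean() - 0.5 * p[..., 1].mean()
rng = np.random.default_rng(0)
patches = [rng.random((7, 7, 3)) * np.array([1.0, 0.3, 0.3]) for _ in range(10)]
print(color_selectivity(red_neuron, patches))  # clearly above 0: color-selective
```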
|
|
Ivet Rafegas, & Maria Vanrell. (2016). Colour Visual Coding in trained Deep Neural Networks. In European Conference on Visual Perception.
|
|
Yaxing Wang, L. Zhang, & Joost Van de Weijer. (2016). Ensembles of generative adversarial networks. In 30th Annual Conference on Neural Information Processing Systems Workshops.
Abstract: Ensembles are a popular way to improve results of discriminative CNNs. The combination of several networks trained starting from different initializations improves results significantly. In this paper we investigate the usage of ensembles of GANs. The specific nature of GANs opens up several new ways to construct ensembles. The first is based on the fact that, in the minimax game which is played to optimize the GAN objective, the generator network keeps on changing even after the network can be considered optimal. As such, ensembles of GANs can be constructed from the same network initialization by simply taking models after different numbers of training iterations. These so-called self-ensembles are much faster to train than traditional ensembles. The second method, called cascade GANs, redirects part of the training data which is badly modeled by the first GAN to another GAN. In experiments on the CIFAR10 dataset we show that ensembles of GANs yield probability distributions which better model the data distribution. In addition, we show that these improved results can be obtained at little additional computational cost.
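At sampling time, a self-ensemble reduces to drawing each sample from a generator checkpoint chosen at random among snapshots of a single training run. A minimal PyTorch sketch, with a stand-in Generator class (all sizes and names are illustrative assumptions):

```python
import random
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Stand-in generator: latent vector -> flattened image."""
    def __init__(self, z_dim=100, out_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z):
        return self.net(z)

# Self-ensemble: checkpoints of ONE training run, saved every k iterations,
# instead of several runs started from different initializations.
checkpoints = [Generator() for _ in range(5)]  # pretend these are saved models

def sample_self_ensemble(n, z_dim=100):
    """Each sample comes from a uniformly chosen checkpoint."""
    samples = []
    with torch.no_grad():
        for _ in range(n):
            g = random.choice(checkpoints)
            samples.append(g(torch.randn(1, z_dim)))
    return torch.cat(samples)

print(sample_self_ensemble(8).shape)  # torch.Size([8, 3072])
```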
|
|
Guim Perarnau, Joost Van de Weijer, Bogdan Raducanu, & Jose Manuel Alvarez. (2016). Invertible Conditional GANs for image editing. In 30th Annual Conference on Neural Information Processing Systems Workshops.
Abstract: Generative Adversarial Networks (GANs) have recently demonstrated the ability to approximate complex data distributions. A relevant extension of this model is the conditional GAN (cGAN), where the introduction of external information makes it possible to determine specific representations of the generated images. In this work, we evaluate encoders that invert the mapping of a cGAN, i.e., map a real image into a latent space and a conditional representation. This allows, for example, reconstructing and modifying real images of faces conditioned on arbitrary attributes. Additionally, we evaluate the design of cGANs. The combination of an encoder with a cGAN, which we call an Invertible cGAN (IcGAN), enables re-generating real images with deterministic complex modifications.
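The IcGAN pipeline described above (encode a real image into (z, y), edit y, re-generate) can be sketched compactly. The PyTorch toy below uses fully-connected stand-ins for the encoder and the conditional generator; the attribute index being flipped is a hypothetical example.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image to a latent code z and an attribute vector y."""
    def __init__(self, img_dim=64 * 64 * 3, z_dim=100, n_attrs=18):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU())
        self.to_z = nn.Linear(512, z_dim)
        self.to_y = nn.Linear(512, n_attrs)
    def forward(self, x):
        h = self.body(x)
        return self.to_z(h), torch.sigmoid(self.to_y(h))

class ConditionalGenerator(nn.Module):
    """Maps (z, y) back to an image."""
    def __init__(self, img_dim=64 * 64 * 3, z_dim=100, n_attrs=18):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_attrs, 512), nn.ReLU(),
                                 nn.Linear(512, img_dim), nn.Tanh())
    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=1))

enc, gen = Encoder(), ConditionalGenerator()
x = torch.rand(1, 64 * 64 * 3)           # a (fake) real face image
z, y = enc(x)                            # invert the cGAN mapping
y_edit = y.clone()
y_edit[0, 3] = 1.0                       # flip one attribute (index is hypothetical)
x_edit = gen(z, y_edit)                  # re-generate with the modification
```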
|
|
Arash Akbarinia, C. Alejandro Parraga, Marta Exposito, Bogdan Raducanu, & Xavier Otazu. (2017). Can biological solutions help computers detect symmetry? In 40th European Conference on Visual Perception.
|
|
Rada Deeb, Damien Muselet, Mathieu Hebert, Alain Tremeau, & Joost Van de Weijer. (2017). 3D color charts for camera spectral sensitivity estimation. In 28th British Machine Vision Conference.
Abstract: Estimating spectral data such as camera sensor responses or illuminant spectral power distribution from raw RGB camera outputs is crucial in many computer vision applications. Usually, 2D color charts with various patches of known spectral reflectance are used as reference for this purpose. Deducing n-D spectral data (n ≫ 3) from 3D RGB inputs is an ill-posed problem that requires a high number of inputs. Unfortunately, most natural color surfaces have spectral reflectances that are well described by low-dimensional linear models, i.e. each spectral reflectance can be approximated by a weighted sum of the others. It has been shown that adding patches to color charts does not help in practice, because the information they add is redundant with the information provided by the first set of patches. In this paper, we propose to use spectral data of higher dimensionality by using 3D color charts that create inter-reflections between the surfaces. These inter-reflections produce multiplications between natural spectral curves and so provide non-linear spectral curves. We show that such data provide enough information for accurate spectral data estimation.
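The key observation, that inter-reflections multiply spectral curves, can be written down directly. Below is a hedged first-bounce approximation, with a scalar form factor $K$ standing in for the geometry; the paper's model is more complete (higher-order bounces, geometry-dependent factors):

```latex
% First-order inter-reflection sketch:
\[
  S(\lambda) \;=\; E(\lambda)\, r_1(\lambda) \;+\; K\, E(\lambda)\, r_2(\lambda)\, r_1(\lambda),
\]
% where $E(\lambda)$ is the illuminant and $r_1, r_2$ are the reflectances of
% the patch and its neighbour. The product term $r_2(\lambda)\, r_1(\lambda)$ is
% a new, non-linear spectral curve outside the low-dimensional linear span of
% the original reflectances -- the extra information a 3D chart provides.
```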
|
|
Ivet Rafegas. (2017). Color in Visual Recognition: from flat to deep representations and some biological parallelisms (Maria Vanrell, Ed.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Visual recognition is one of the main problems in computer vision that attempts to solve image understanding by deciding what objects are in images. This problem can be computationally solved by using relevant sets of visual features, such as edges, corners, color or more complex object parts. This thesis contributes to how color features have to be represented for recognition tasks.
Image features can be extracted following two different approaches. A first approach is defining handcrafted descriptors of images, which is then followed by a learning scheme to classify the content (named flat schemes in Kruger et al. (2013)). In this approach, perceptual considerations are habitually used to define efficient color features. Here we propose a new flat color descriptor based on the extension of color channels to boost the representation of spatio-chromatic contrast that surpasses state-of-the-art approaches. However, flat schemes present a lack of generality far away from the capabilities of biological systems. A second approach proposes evolving these flat schemes into a hierarchical process, like in the visual cortex. This includes an automatic process to learn optimal features. These deep schemes, and more specifically Convolutional Neural Networks (CNNs), have shown impressive performance in solving various vision problems. However, there is a lack of understanding about the internal representation obtained as a result of automatic learning. In this thesis we propose a new methodology to explore the internal representation of trained CNNs by defining the Neuron Feature as a visualization of the intrinsic features encoded in each individual neuron. Additionally, and inspired by physiological techniques, we propose to compute different neuron selectivity indexes (e.g., color, class, orientation or symmetry, amongst others) to label and classify the full CNN neuron population to understand learned representations.
Finally, using the proposed methodology, we show an in-depth study on how color is represented in a specific CNN, trained for object recognition, that competes with primate representational abilities (Cadieu et al. (2014)). We found several parallelisms with biological visual systems: (a) a significant number of color-selective neurons throughout all the layers; (b) an opponent and low-frequency representation of color oriented edges and a higher sampling of frequency selectivity in brightness than in color in the 1st layer, as in V1; (c) a higher sampling of color hue in the second layer, aligned with observed hue maps in V2; (d) a strong color and shape entanglement in all layers, from basic features in shallower layers (V1 and V2) to object and background shapes in deeper layers (V4 and IT); and (e) a strong correlation between neuron color selectivities and color dataset bias.
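The Neuron Feature mentioned above can be sketched under one natural reading: a visualization built by averaging a neuron's top-N activating patches, weighted by their activations. The thesis gives the precise construction; this NumPy toy only illustrates the shape of the idea.

```python
import numpy as np

def neuron_feature(patches, activations):
    """Activation-weighted average of a neuron's top-N cropped image patches.

    patches: (N, H, W, 3) array of the patches that most activate the neuron.
    activations: (N,) corresponding activation values.
    Returns an (H, W, 3) image summarizing what the neuron responds to.
    """
    w = np.asarray(activations, dtype=np.float64)
    w = w / w.sum()
    return np.tensordot(w, patches, axes=1)

# Toy usage with random "top patches":
rng = np.random.default_rng(0)
nf = neuron_feature(rng.random((100, 11, 11, 3)), rng.random(100))
print(nf.shape)  # (11, 11, 3)
```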
|
|
Abel Gonzalez-Garcia, Joost Van de Weijer, & Yoshua Bengio. (2018). Image-to-image translation for cross-domain disentanglement. In 32nd Annual Conference on Neural Information Processing Systems.
|
|
Marc Masana, Idoia Ruiz, Joan Serrat, Joost Van de Weijer, & Antonio Lopez. (2018). Metric Learning for Novelty and Anomaly Detection. In 29th British Machine Vision Conference.
Abstract: When neural networks process images which do not resemble the distribution seen during training, so-called out-of-distribution images, they often make wrong predictions, and do so too confidently. The capability to detect out-of-distribution images is therefore crucial for many real-world applications. We divide out-of-distribution detection between novelty detection (images of classes which are not in the training set but are related to those) and anomaly detection (images with classes which are unrelated to the training set). By related we mean they contain the same type of objects, like digits in MNIST and SVHN. Most existing work has focused on anomaly detection, and has addressed this problem considering networks trained with the cross-entropy loss. Unlike these approaches, we propose to use metric learning, which does not have the drawback of the softmax layer (inherent to cross-entropy methods), which forces the network to divide its prediction power over the learned classes. We perform extensive experiments and evaluate both novelty and anomaly detection, even in a relevant application such as traffic sign recognition, obtaining comparable or better results than previous works.
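The detection side of the metric-learning approach can be illustrated independently of how the embedding is trained: score a test image by its distance to the nearest class centroid in embedding space and flag distant samples as out-of-distribution. A NumPy sketch (the paper's actual scoring may differ):

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Mean embedding per training class."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def ood_score(x_emb, centroids):
    """Distance to the nearest class centroid; a large distance suggests OOD."""
    return np.linalg.norm(centroids - x_emb, axis=1).min()

# Toy setup: two well-separated in-distribution classes in a 2-D embedding space.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal([0, 0], 0.1, (50, 2)), rng.normal([3, 3], 0.1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
classes, cents = class_centroids(emb, labels)

print(ood_score(np.array([0.1, 0.0]), cents))   # small: in-distribution
print(ood_score(np.array([8.0, -5.0]), cents))  # large: flagged as OOD
```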
|
|
Ozan Caglayan, Adrien Bardet, Fethi Bougares, Loic Barrault, Kai Wang, Marc Masana, et al. (2018). LIUM-CVC Submissions for WMT18 Multimodal Translation Task. In 3rd Conference on Machine Translation.
Abstract: This paper describes the multimodal Neural Machine Translation systems developed by LIUM and CVC for the WMT18 Shared Task on Multimodal Translation. This year we propose several modifications to our previous multimodal attention architecture in order to better integrate convolutional features and refine them using encoder-side information. Our final submissions ranked first for English→French and second for English→German among the constrained submissions, according to the automatic evaluation metric METEOR.
|
|
Chenshen Wu, Luis Herranz, Xialei Liu, Joost Van de Weijer, & Bogdan Raducanu. (2018). Memory Replay GANs: Learning to Generate New Categories without Forgetting. In 32nd Annual Conference on Neural Information Processing Systems (pp. 5966–5976).
Abstract: Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine-tuning renders the network unable to properly generate images from previous categories (i.e., forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results on the MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories.
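The memory-replay mechanism can be sketched as follows: before training on a new category, freeze a snapshot of the generator; during training, sample "replayed" images of old categories from the snapshot and mix them with the new data (joint training with replay) or use them as alignment targets (replay alignment). A PyTorch toy with a stand-in conditional generator; all sizes and names are assumptions.

```python
import copy
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Stand-in conditional generator: (z, class label) -> flattened image."""
    def __init__(self, z_dim=100, n_classes=10, img_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)
        self.net = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Tanh())
    def forward(self, z, y):
        return self.net(z * self.embed(y))

def build_replay_batch(generator, old_categories, n_per_class, z_dim=100):
    """Sample a batch of previously learned categories from a frozen snapshot."""
    xs, ys = [], []
    with torch.no_grad():
        for c in old_categories:
            z = torch.randn(n_per_class, z_dim)
            y = torch.full((n_per_class,), c, dtype=torch.long)
            xs.append(generator(z, y))
            ys.append(y)
    return torch.cat(xs), torch.cat(ys)

generator = CondGenerator()
replay_gen = copy.deepcopy(generator).eval()   # snapshot before the new task
x_old, y_old = build_replay_batch(replay_gen, old_categories=[0, 1], n_per_class=32)
print(x_old.shape, y_old.shape)  # torch.Size([64, 784]) torch.Size([64])
# During training on a new category, these replays are mixed with the real data
# (joint training with replay) or used to align the updated generator's outputs
# with the snapshot's (replay alignment).
```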
|
|
Bojana Gajic, Ariel Amato, Ramon Baldrich, & Carlo Gatta. (2019). Bag of Negatives for Siamese Architectures. In 30th British Machine Vision Conference.
Abstract: Training a Siamese architecture for re-identification with a large number of identities is a challenging task due to the difficulty of finding relevant negative samples efficiently. In this work we present Bag of Negatives (BoN), a method for accelerated and improved training of Siamese networks that scales well on datasets with a very large number of identities. BoN is an efficient and loss-independent method, able to select a bag of high-quality negatives, based on a novel online hashing strategy.
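The negative-selection idea can be illustrated with a generic hashing scheme: embeddings are bucketed online, and negatives for an anchor are drawn from its own bucket, so they are close in embedding space and therefore informative. The sign-random-projection hashing below is a stand-in for the paper's novel online hashing strategy, not a reproduction of it.

```python
import numpy as np

class BagOfNegatives:
    """Hash embeddings into buckets; draw negatives from the anchor's bucket."""
    def __init__(self, emb_dim, n_bits=8, seed=0):
        self.proj = np.random.default_rng(seed).standard_normal((emb_dim, n_bits))
        self.buckets = {}  # hash code -> list of (identity, sample index)

    def _hash(self, e):
        return tuple((e @ self.proj > 0).astype(int))

    def add(self, index, identity, embedding):
        self.buckets.setdefault(self._hash(embedding), []).append((identity, index))

    def sample_negative(self, anchor_identity, anchor_embedding):
        """A same-bucket, different-identity sample is a good (hard) negative."""
        candidates = [i for pid, i in self.buckets.get(self._hash(anchor_embedding), [])
                      if pid != anchor_identity]
        return np.random.choice(candidates) if candidates else None

# Toy usage: index a few embeddings, then query a hard negative for an anchor.
rng = np.random.default_rng(1)
bon = BagOfNegatives(emb_dim=128)
for i in range(1000):
    bon.add(i, identity=i % 100, embedding=rng.standard_normal(128))
print(bon.sample_negative(anchor_identity=5, anchor_embedding=rng.standard_normal(128)))
```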
|
|
C. Alejandro Parraga, Xavier Otazu, & Arash Akbarinia. (2019). Modelling symmetry perception with banks of quadrature convolutional Gabor kernels. In 42nd edition of the European Conference on Visual Perception (p. 224).
Abstract: Mirror symmetry is a property more likely to be encountered in animals than in medium-scale vegetation or inanimate objects in the natural world. This might be the reason why the human visual system has evolved to detect it quickly and robustly. Indeed, the perception of symmetry assists higher-level visual processes that are crucial for survival, such as target recognition and identification irrespective of position and location. Although the task of detecting symmetrical objects seems effortless to us, it is very challenging for computers (to the extent that it has been proposed as a robust “captcha” by Funk & Liu in 2016). Indeed, the exact mechanism of symmetry detection in primates is not well understood: fMRI studies have shown that symmetrical shapes activate specific higher-level areas of the visual cortex (Sasaki et al.; 2005) and, similarly, a large body of psychophysical experiments suggests that symmetry perception is critically influenced by low-level mechanisms (Treder; 2010). In this work we attempt to find plausible low-level mechanisms that might form the basis for symmetry perception. Our simple model is made from (i) banks of odd-symmetric Gabors (resembling edge-detecting V1 neurons) and (ii) banks of larger odd- and even-symmetric Gabors (resembling higher visual cortex neurons) that pool signals from the 'edge image'. As reported previously (Akbarinia et al., ECVP2017), the convolution of symmetrical lines with two Gabor kernels of alternating phase produces a minimum in one and a maximum in the other (Osorio; 1996), and the rectification and combination of these signals create lines which hint at mirror symmetry in natural images. We improved the algorithm by combining these signals across several spatial scales. Our preliminary results suggest that such a multiscale combination of convolutional operations might form the basis for much of the operation of the HVS in terms of symmetry detection and representation.
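A minimal NumPy/SciPy rendering of the quadrature-pair mechanism: convolve the image with even- and odd-phase Gabors at several scales, rectify, and combine, so that locations where the even response dominates while the odd response cancels (as happens midway between mirrored edges) score highly. All parameter choices are illustrative assumptions, not the model's.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor(size, wavelength, theta, phase):
    """2-D Gabor kernel; phase=0 gives even symmetry, phase=pi/2 odd symmetry."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * (0.5 * wavelength) ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength + phase)

def symmetry_response(image, theta=0.0, scales=(4, 8, 16)):
    """Rectify and combine quadrature-pair responses across spatial scales."""
    total = np.zeros_like(image, dtype=np.float64)
    for wl in scales:
        even = convolve2d(image, gabor(2 * wl + 1, wl, theta, 0.0), mode="same")
        odd = convolve2d(image, gabor(2 * wl + 1, wl, theta, np.pi / 2), mode="same")
        # Keep locations where the even (symmetric) response dominates:
        total += np.maximum(np.abs(even) - np.abs(odd), 0.0)
    return total

# Toy test: two vertical lines, mirror-symmetric about column 32.
img = np.zeros((64, 64))
img[:, 20] = img[:, 44] = 1.0
resp = symmetry_response(img, theta=0.0)
print(resp.shape, np.argsort(resp.sum(axis=0))[-3:])  # strongest-responding columns
```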
|
|
David Berga, & Xavier Otazu. (2019). Computations of inhibition of return mechanisms by modulating V1 dynamics. In 28th Annual Computational Neuroscience Meeting.
Abstract: In this study we present a unified model of the visual cortex for predicting visual attention using real image scenes. Feedforward mechanisms from RGC and LGN have been functionally modeled using wavelet filters at distinct orientations and scales for each chromatic pathway (Magno-, Parvo-, Konio-cellular) and polarity (ON-/OFF-center), by processing image components in the CIE Lab space. In V1, we process cortical interactions with an excitatory-inhibitory network of firing-rate neurons, initially proposed by (Li, 1999) and later extended by (Penacchio et al. 2013). Firing rates from the model's output have been used as predictors of neuronal activity to be projected in a map in the superior colliculus (with WTA-like computations), determining the locations of visual fixations. These locations are considered as already visited areas for future saccades; therefore we integrated a spatiotemporal function of inhibition of return mechanisms (for which LIP/FEF is responsible) to provide the model with spatial memory for subsequent saccades. Foveation mechanisms have been simulated with a cortical magnification function, which distorts spatial viewing properties for each fixation. Results show lower prediction errors than in the no-IoR case (Fig. 1), and the model is functionally consistent with human psychophysical measurements. Our model follows a biologically-constrained architecture, previously shown to reproduce visual saliency (Berga & Otazu, 2018), visual discomfort (Penacchio et al. 2016), brightness (Penacchio et al. 2013) and chromatic induction (Cerda & Otazu, 2016).
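The inhibition-of-return stage lends itself to a compact sketch: pick fixations by winner-take-all on a saliency map, then suppress a Gaussian neighbourhood around each fixated location so later saccades move to unvisited regions. A NumPy toy (parameters are illustrative, not the model's):

```python
import numpy as np

def scanpath(saliency, n_fixations=5, ior_sigma=3.0, ior_strength=1.0):
    """Winner-take-all fixation selection with inhibition of return (IoR).

    After each fixation the saliency map is suppressed by a Gaussian centred
    on the fixated location, so the next winner is a new, unvisited region.
    """
    sal = saliency.astype(np.float64).copy()
    h, w = sal.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)  # WTA: current maximum
        fixations.append((int(y), int(x)))
        inhibition = ior_strength * np.exp(-((yy - y) ** 2 + (xx - x) ** 2)
                                           / (2 * ior_sigma ** 2))
        sal *= 1.0 - np.clip(inhibition, 0.0, 1.0)          # suppress visited area
    return fixations

# Toy saliency map with two hot spots; IoR forces the second fixation to move on.
sal = np.zeros((32, 32))
sal[8, 8] = 1.0
sal[24, 20] = 0.8
print(scanpath(sal, n_fixations=2))  # [(8, 8), (24, 20)]
```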
|
|