Olivier Penacchio, Xavier Otazu, & Laura Dempere-Marco. (2013). A Neurodynamical Model of Brightness Induction in V1. Plos - PloS ONE, 8(5), e64086.
Abstract: Brightness induction is the modulation of the perceived intensity of an area by the luminance of surrounding areas. Recent neurophysiological evidence suggests that brightness information might be explicitly represented in V1, in contrast to the more common assumption that the striate cortex is an area mostly responsive to sensory information. Here we investigate possible neural mechanisms that offer a plausible explanation for such phenomenon. To this end, a neurodynamical model which is based on neurophysiological evidence and focuses on the part of V1 responsible for contextual influences is presented. The proposed computational model successfully accounts for well known psychophysical effects for static contexts and also for brightness induction in dynamic contexts defined by modulating the luminance of surrounding areas. This work suggests that intra-cortical interactions in V1 could, at least partially, explain brightness induction effects and reveals how a common general architecture may account for several different fundamental processes, such as visual saliency and brightness induction, which emerge early in the visual processing pathway.
|
Laura Lopez-Fuentes, Joost Van de Weijer, Marc Bolaños, & Harald Skinnemoen. (2017). Multi-modal Deep Learning Approach for Flood Detection. In MediaEval Benchmarking Initiative for Multimedia Evaluation.
Abstract: In this paper we propose a multi-modal deep learning approach to detect floods in social media posts. Social media posts normally contain some metadata and/or visual information, therefore in order to detect the floods we use this information. The model is based on a Convolutional Neural Network which extracts the visual features and a bidirectional Long Short-Term Memory network to extract the semantic features from the textual metadata. We validate the
method on images extracted from Flickr which contain both visual information and metadata and compare the results when using both, visual information only or metadata only. This work has been done in the context of the MediaEval Multimedia Satellite Task.
|
Jon Almazan, Bojana Gajic, Naila Murray, & Diane Larlus. (2018). Re-ID done right: towards good practices for person re-identification.
Abstract: Training a deep architecture using a ranking loss has become standard for the person re-identification task. Increasingly, these deep architectures include additional components that leverage part detections, attribute predictions, pose estimators and other auxiliary information, in order to more effectively localize and align discriminative image regions. In this paper we adopt a different approach and carefully design each component of a simple deep architecture and, critically, the strategy for training it effectively for person re-identification. We extensively evaluate each design choice, leading to a list of good practices for person re-identification. By following these practices, our approach outperforms the state of the art, including more complex methods with auxiliary components, by large margins on four benchmark datasets. We also provide a qualitative analysis of our trained representation which indicates that, while compact, it is able to capture information from localized and discriminative regions, in a manner akin to an implicit attention mechanism.
|
O. Fors, J. Nuñez, Xavier Otazu, A. Prades, & Robert D. Cardinal. (2010). Improving the Ability of Image Sensors to Detect Faint Stars and Moving Objects Using Image Deconvolution Techniques. SENS - Sensors, 10(3), 1743–1752.
Abstract: Abstract: In this paper we show how the techniques of image deconvolution can increase the ability of image sensors as, for example, CCD imagers, to detect faint stars or faint orbital objects (small satellites and space debris). In the case of faint stars, we show that this benefit is equivalent to double the quantum efficiency of the used image sensor or to increase the effective telescope aperture by more than 30% without decreasing the astrometric precision or introducing artificial bias. In the case of orbital objects, the deconvolution technique can double the signal-to-noise ratio of the image, which helps to discover and control dangerous objects as space debris or lost satellites. The benefits obtained using CCD detectors can be extrapolated to any kind of image sensors.
Keywords: image processing; image deconvolution; faint stars; space debris; wavelet transform
|
Javier Vazquez, C. Alejandro Parraga, Maria Vanrell, & Ramon Baldrich. (2009). Color Constancy Algorithms: Psychophysical Evaluation on a New Dataset. Journal of Imaging Science and Technology, 53(3), 031105–9.
Abstract: The estimation of the illuminant of a scene from a digital image has been the goal of a large amount of research in computer vision. Color constancy algorithms have dealt with this problem by defining different heuristics to select a unique solution from within the feasible set. The performance of these algorithms has shown that there is still a long way to go to globally solve this problem as a preliminary step in computer vision. In general, performance evaluation has been done by comparing the angular error between the estimated chromaticity and the chromaticity of a canonical illuminant, which is highly dependent on the image dataset. Recently, some workers have used high-level constraints to estimate illuminants; in this case selection is based on increasing the performance on the subsequent steps of the systems. In this paper we propose a new performance measure, the perceptual angular error. It evaluates the performance of a color constancy algorithm according to the perceptual preferences of humans, or naturalness (instead of the actual optimal solution) and is independent of the visual task. We show the results of a new psychophysical experiment comparing solutions from three different color constancy algorithms. Our results show that in more than a half of the judgments the preferred solution is not the one closest to the optimal solution. Our experiments were performed on a new dataset of images acquired with a calibrated camera with an attached neutral grey sphere, which better copes with the illuminant variations of the scene.
|
Graham D. Finlayson, Javier Vazquez, & Fufu Fang. (2021). The Discrete Cosine Maximum Ignorance Assumption. In 29th Color and Imaging Conference (pp. 13–18).
Abstract: the performance of colour correction algorithms are dependent on the reflectance sets used. Sometimes, when the testing reflectance set is changed the ranking of colour correction algorithms also changes. To remove dependence on dataset we can
make assumptions about the set of all possible reflectances. In the Maximum Ignorance with Positivity (MIP) assumption we assume that all reflectances with per wavelength values between 0 and 1 are equally likely. A weakness in the MIP is that it fails to take into account the correlation of reflectance functions between
wavelengths (many of the assumed reflectances are, in reality, not possible).
In this paper, we take the view that the maximum ignorance assumption has merit but, hitherto it has been calculated with respect to the wrong coordinate basis. Here, we propose the Discrete Cosine Maximum Ignorance assumption (DCMI), where
all reflectances that have coordinates between max and min bounds in the Discrete Cosine Basis coordinate system are equally likely.
Here, the correlation between wavelengths is encoded and this results in the set of all plausible reflectances ’looking like’ typical reflectances that occur in nature. This said the DCMI model is also a superset of all measured reflectance sets.
Experiments show that, in colour correction, adopting the DCMI results in similar colour correction performance as using a particular reflectance set.
|
Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes Garcia-Martinez, Fethi Bougares, Loic Barrault, et al. (2017). LIUM-CVC Submissions for WMT17 Multimodal Translation Task. In 2nd Conference on Machine Translation.
Abstract: This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
|
Xim Cerda-Company, & Xavier Otazu. (2019). Color induction in equiluminant flashed stimuli. JOSA A - Journal of the Optical Society of America A, 36(1), 22–31.
Abstract: Color induction is the influence of the surrounding color (inducer) on the perceived color of a central region. There are two different types of color induction: color contrast (the color of the central region shifts away from that of the inducer) and color assimilation (the color shifts towards the color of the inducer). Several studies on these effects have used uniform and striped surrounds, reporting color contrast and color assimilation, respectively. Other authors [J. Vis. 12(1), 22 (2012) [CrossRef] ] have studied color induction using flashed uniform surrounds, reporting that the contrast is higher for shorter flash duration. Extending their study, we present new psychophysical results using both flashed and static (i.e., non-flashed) equiluminant stimuli for both striped and uniform surrounds. Similarly to them, for uniform surround stimuli we observed color contrast, but we did not obtain the maximum contrast for the shortest (10 ms) flashed stimuli, but for 40 ms. We only observed this maximum contrast for red, green, and lime inducers, while for a purple inducer we obtained an asymptotic profile along the flash duration. For striped stimuli, we observed color assimilation only for the static (infinite flash duration) red–green surround inducers (red first inducer, green second inducer). For the other inducers’ configurations, we observed color contrast or no induction. Since other studies showed that non-equiluminant striped static stimuli induce color assimilation, our results also suggest that luminance differences could be a key factor to induce it.
|
Eduard Vazquez, Theo Gevers, M. Lucassen, Joost Van de Weijer, & Ramon Baldrich. (2010). Saliency of Color Image Derivatives: A Comparison between Computational Models and Human Perception. JOSA A - Journal of the Optical Society of America A, 27(3), 613–621.
Abstract: In this paper, computational methods are proposed to compute color edge saliency based on the information content of color edges. The computational methods are evaluated on bottom-up saliency in a psychophysical experiment, and on a more complex task of salient object detection in real-world images. The psychophysical experiment demonstrates the relevance of using information theory as a saliency processing model and that the proposed methods are significantly better in predicting color saliency (with a human-method correspondence up to 74.75% and an observer agreement of 86.8%) than state-of-the-art models. Furthermore, results from salient object detection confirm that an early fusion of color and contrast provide accurate performance to compute visual saliency with a hit rate up to 95.2%.
|
David Berga, Xavier Otazu, Xose R. Fernandez-Vidal, Victor Leboran, & Xose M. Pardo. (2019). Generating Synthetic Images for Visual Attention Modeling. PER - Perception, 48, 99.
|
C. Alejandro Parraga, Robert Benavente, & Maria Vanrell. (2010). Towards a general model of colour categorization which considers context. PER - Perception. ECVP Abstract Supplement, 39, 86.
Abstract: In two previous experiments [Parraga et al, 2009 J. of Im. Sci. and Tech 53(3) 031106; Benavente et al,2009 Perception 38 ECVP Supplement, 36] the boundaries of basic colour categories were measured.
In the first experiment, samples were presented in isolation (ie on a dark background) and boundaries were measured using a yes/no paradigm. In the second, subjects adjusted the chromaticity of a sample presented on a random Mondrian background to find the boundary between pairs of adjacent colours.
Results from these experiments showed significant dierences but it was not possible to conclude whether this discrepancy was due to the absence/presence of a colourful background or to the dierences in the paradigms used. In this work, we settle this question by repeating the first experiment (ie samples presented on a dark background) using the second paradigm. A comparison of results shows that
although boundary locations are very similar, boundaries measured in context are significantly dierent(more diuse) than those measured in isolation (confirmed by a Student’s t-test analysis on the subject’s answers statistical distributions). In addition, we completed the mapping of colour name space by measuring the boundaries between chromatic colours and the achromatic centre. With these results we
completed our parametric fuzzy-sets model of colour naming space.
|
Xavier Otazu, & Xim Cerda-Company. (2022). The contribution of luminance and chromatic channels to color assimilation. JOV - Journal of Vision, 22(6)(10), 1–15.
Abstract: Color induction is the phenomenon where the physical and the perceived colors of an object differ owing to the color distribution and the spatial configuration of the surrounding objects. Previous works studying this phenomenon on the lsY MacLeod–Boynton color space, show that color assimilation is present only when the magnocellular pathway (i.e., the Y axis) is activated (i.e., when there are luminance differences). Concretely, the authors showed that the effect is mainly induced by the koniocellular pathway (s axis), but not by the parvocellular pathway (l axis), suggesting that when magnocellular pathway is activated it inhibits the koniocellular pathway. In the present work, we study whether parvo-, konio-, and magnocellular pathways may influence on each other through the color induction effect. Our results show that color assimilation does not depend on a chromatic–chromatic interaction, and that chromatic assimilation is driven by the interaction between luminance and chromatic channels (mainly the magno- and the koniocellular pathways). Our results also show that chromatic induction is greatly decreased when all three visual pathways are simultaneously activated, and that chromatic pathways could influence each other through the magnocellular (luminance) pathway. In addition, we observe that chromatic channels can influence the luminance channel, hence inducing a small brightness induction. All these results show that color induction is a highly complex process where interactions between the several visual pathways are yet unknown and should be studied in greater detail.
|
Xim Cerda-Company, Xavier Otazu, Nilai Sallent, & C. Alejandro Parraga. (2018). The effect of luminance differences on color assimilation. JV - Journal of Vision, 18(11), 10.
Abstract: The color appearance of a surface depends on the color of its surroundings (inducers). When the perceived color shifts towards that of the surroundings, the effect is called “color assimilation” and when it shifts away from the surroundings it is called “color contrast.” There is also evidence that the phenomenon depends on the spatial configuration of the inducer, e.g., uniform surrounds tend to induce color contrast and striped surrounds tend to induce color assimilation. However, previous work found that striped surrounds under certain conditions do not induce color assimilation but induce color contrast (or do not induce anything at all), suggesting that luminance differences and high spatial frequencies could be key factors in color assimilation. Here we present a new psychophysical study of color assimilation where we assessed the contribution of luminance differences (between the target and its surround) present in striped stimuli. Our results show that luminance differences are key factors in color assimilation for stimuli varying along the s axis of MacLeod-Boynton color space, but not for stimuli varying along the l axis. This asymmetry suggests that koniocellular neural mechanisms responsible for color assimilation only contribute when there is a luminance difference, supporting the idea that mutual-inhibition has a major role in color induction.
|
Jordi Roca, C. Alejandro Parraga, & Maria Vanrell. (2013). Chromatic settings and the structural color constancy index. JV - Journal of Vision, 13(4-3), 1–26.
Abstract: Color constancy is usually measured by achromatic setting, asymmetric matching, or color naming paradigms, whose results are interpreted in terms of indexes and models that arguably do not capture the full complexity of the phenomenon. Here we propose a new paradigm, chromatic setting, which allows a more comprehensive characterization of color constancy through the measurement of multiple points in color space under immersive adaptation. We demonstrated its feasibility by assessing the consistency of subjects' responses over time. The paradigm was applied to two-dimensional (2-D) Mondrian stimuli under three different illuminants, and the results were used to fit a set of linear color constancy models. The use of multiple colors improved the precision of more complex linear models compared to the popular diagonal model computed from gray. Our results show that a diagonal plus translation matrix that models mechanisms other than cone gain might be best suited to explain the phenomenon. Additionally, we calculated a number of color constancy indices for several points in color space, and our results suggest that interrelations among colors are not as uniform as previously believed. To account for this variability, we developed a new structural color constancy index that takes into account the magnitude and orientation of the chromatic shift in addition to the interrelations among colors and memory effects.
|
Xavier Otazu, Olivier Penacchio, & Laura Dempere-Marco. (2012). Brightness induction by contextual influences in V1: a neurodynamical account. In Journal of Vision (Vol. 12).
Abstract: Brightness induction is the modulation of the perceived intensity of an area by the luminance of surrounding areas and reveals fundamental properties of neural organization in the visual system. Several phenomenological models have been proposed that successfully account for psychophysical data (Pessoa et al. 1995, Blakeslee and McCourt 2004, Barkan et al. 2008, Otazu et al. 2008).
Neurophysiological evidence suggests that brightness information is explicitly represented in V1 and neuronal response modulations have been observed followingluminance changes outside their receptive fields (Rossi and Paradiso, 1999).
In this work we investigate possible neural mechanisms that offer a plausible explanation for such effects. To this end, we consider the model by Z.Li (1999) which is based on biological data and focuses on the part of V1 responsible for contextual influences, namely, layer 2–3 pyramidal cells, interneurons, and horizontal intracortical connections. This model has proven to account for phenomena such as contour detection and preattentive segmentation, which share with brightness induction the relevant effect of contextual influences. In our model, the input to the network is derived from a complete multiscale and multiorientation wavelet decomposition which makes it possible to recover an image reflecting the perceived intensity. The proposed model successfully accounts for well known pyschophysical effects (among them: the White's and modified White's effects, the Todorović, Chevreul, achromatic ring patterns, and grating induction effects). Our work suggests that intra-cortical interactions in the primary visual cortex could partially explain perceptual brightness induction effects and reveals how a common general architecture may account for several different fundamental processes emerging early in the visual pathway.
|
David Berga, & Xavier Otazu. (2022). A neurodynamic model of saliency prediction in v1. NEURALCOMPUT - Neural Computation, 34(2), 378–414.
Abstract: Lateral connections in the primary visual cortex (V1) have long been hypothesized to be responsible for several visual processing mechanisms such as brightness induction, chromatic induction, visual discomfort, and bottom-up visual attention (also named saliency). Many computational models have been developed to independently predict these and other visual processes, but no computational model has been able to reproduce all of them simultaneously. In this work, we show that a biologically plausible computational model of lateral interactions of V1 is able to simultaneously predict saliency and all the aforementioned visual processes. Our model's architecture (NSWAM) is based on Penacchio's neurodynamic model of lateral connections of V1. It is defined as a network of firing rate neurons, sensitive to visual features such as brightness, color, orientation, and scale. We tested NSWAM saliency predictions using images from several eye tracking data sets. We show that the accuracy of predictions obtained by our architecture, using shuffled metrics, is similar to other state-of-the-art computational methods, particularly with synthetic images (CAT2000-Pattern and SID4VAM) that mainly contain low-level features. Moreover, we outperform other biologically inspired saliency models that are specifically designed to exclusively reproduce saliency. We show that our biologically plausible model of lateral connections can simultaneously explain different visual processes present in V1 (without applying any type of training or optimization and keeping the same parameterization for all the visual processes). This can be useful for the definition of a unified architecture of the primary visual cortex.
|
Muhammad Anwer Rao, Fahad Shahbaz Khan, Joost Van de Weijer, & Jorma Laaksonen. (2017). Tex-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition. In 19th International Conference on Multimodal Interaction.
Abstract: Recognizing materials and textures in realistic imaging conditions is a challenging computer vision problem. For many years, local features based orderless representations were a dominant approach for texture recognition. Recently deep local features, extracted from the intermediate layers of a Convolutional Neural Network (CNN), are used as filter banks. These dense local descriptors from a deep model, when encoded with Fisher Vectors, have shown to provide excellent results for texture recognition. The CNN models, employed in such approaches, take RGB patches as input and train on a large amount of labeled images. We show that CNN models, which we call TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard deep models trained on RGB patches. We further investigate two deep architectures, namely early and late fusion, to combine the texture and color information. Experiments on benchmark texture datasets clearly demonstrate that TEX-Nets provide complementary information to standard RGB deep network. Our approach provides a large gain of 4.8%, 3.5%, 2.6% and 4.1% respectively in accuracy on the DTD, KTH-TIPS-2a, KTH-TIPS-2b and Texture-10 datasets, compared to the standard RGB network of the same architecture. Further, our final combination leads to consistent improvements over the state-of-the-art on all four datasets.
Keywords: Convolutional Neural Networks; Texture Recognition; Local Binary Paterns
|
Muhammad Anwer Rao, Fahad Shahbaz Khan, Joost Van de Weijer, & Jorma Laaksonen. (2016). Combining Holistic and Part-based Deep Representations for Computational Painting Categorization. In 6th International Conference on Multimedia Retrieval.
Abstract: Automatic analysis of visual art, such as paintings, is a challenging inter-disciplinary research problem. Conventional approaches only rely on global scene characteristics by encoding holistic information for computational painting categorization.We argue that such approaches are sub-optimal and that discriminative common visual structures provide complementary information for painting classification. We present an approach that encodes both the global scene layout and discriminative latent common structures for computational painting categorization. The region of interests are automatically extracted, without any manual part labeling, by training class-specific deformable part-based models. Both holistic and region-of-interests are then described using multi-scale dense convolutional features. These features are pooled separately using Fisher vector encoding and concatenated afterwards in a single image representation. Experiments are performed on a challenging dataset with 91 different painters and 13 diverse painting styles. Our approach outperforms the standard method, which only employs the global scene characteristics. Furthermore, our method achieves state-of-the-art results outperforming a recent multi-scale deep features based approach [11] by 6.4% and 3.8% respectively on artist and style classification.
|