|
Marc Bolaños, Alvaro Peris, Francisco Casacuberta, & Petia Radeva. (2017). VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed.
Keywords: Visual Question Answering; Convolutional Neural Networks; Long short-term memory networks
|
|
|
Hana Jarraya, Oriol Ramos Terrades, & Josep Llados. (2017). Graph Embedding through Probabilistic Graphical Model applied to Symbolic Graphs. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: We propose a new Graph Embedding (GEM) method that takes advantage of structural pattern representation. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM, represented by a vector. This vector is a signature of the AG in a lower-dimensional vector space. We apply Structured Support Vector Machines (SSVM) to perform the classification task. As a first attempt, results on the GREC dataset are encouraging enough to go further in this direction.
Keywords: Attributed Graph; Probabilistic Graphical Model; Graph Embedding; Structured Support Vector Machines
|
|
|
Umut Guclu, Yagmur Gucluturk, Meysam Madadi, Sergio Escalera, Xavier Baro, Jordi Gonzalez, et al. (2017). End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks. arXiv:1703.03305.
Abstract: Recent years have seen a sharp increase in the number of related yet distinct advances in semantic segmentation. Here, we tackle this problem by leveraging the respective strengths of these advances. That is, we formulate a conditional random field over a four-connected graph as end-to-end trainable convolutional and recurrent networks, and estimate them via an adversarial process. Importantly, our model learns not only unary potentials but also pairwise potentials, while aggregating multi-scale contexts and controlling higher-order inconsistencies. We evaluate our model on two standard benchmark datasets for semantic face segmentation, achieving state-of-the-art results on both of them.
|
|
|
Mireia Sole, Joan Blanco, Debora Gil, G. Fonseka, Richard Frodsham, Francesca Vidal, et al. (2017). Noves perspectives en l'estudi de la territorialitat cromosomica de cel·lules germinals masculines: estudis tridimensionals. JBR - Biologia de la Reproduccio, 73–78.
Abstract: In somatic cells, chromosomes occupy specific nuclear regions called chromosome territories, which are involved in the maintenance and regulation of the genome. Preliminary data in male germ cells also suggest the importance of chromosome territoriality in cell functionality. Nevertheless, the specific characteristics of testicular tissue (presence of different cell types with different morphological characteristics, in different stages of development and with different ploidy) make it difficult to achieve conclusive results. In this study we have developed a methodology to approach the three-dimensional study of all chromosome territories in male germ cells from C57BL/6J mice (Mus musculus). The method includes the following steps: i) Optimized cell fixation to obtain an optimal preservation of the three-dimensional cell morphology, ii) Chromosome identification by FISH (Chromoprobe Multiprobe® OctoChrome™ Murine System; Cytocell) and confocal microscopy (TCS-SP5, Leica Microsystems), iii) Cell type identification by immunofluorescence, iv) Image analysis using Matlab scripts, v) Numerical data extraction related to chromosome features, chromosome radial position and chromosome relative position. This methodology allows the unequivocal identification and analysis of the chromosome territories of all spermatogenic stages. Results will provide information about the features that determine chromosomal position, preferred associations between chromosomes, and the relationship between chromosome positioning and genome regulation.
|
|
|
Jun Wan, Sergio Escalera, Gholamreza Anbarjafari, Hugo Jair Escalante, Xavier Baro, Isabelle Guyon, et al. (2017). Results and Analysis of ChaLearn LAP Multi-modal Isolated and Continuous Gesture Recognition, and Real versus Fake Expressed Emotions Challenges. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
Abstract: We analyze the results of the 2017 ChaLearn Looking at People Challenge at ICCV. The challenge comprised three tracks: (1) large-scale isolated gesture recognition, (2) continuous gesture recognition, and (3) real versus fake expressed emotions. It is the second round for both gesture recognition challenges, which were first held in the context of the ICPR 2016 workshop on “multimedia challenges beyond visual analysis”. In this second round, more participants joined the competitions, and performance improved considerably compared to the first round. In particular, the best recognition accuracy for isolated gesture recognition improved from 56.90% to 67.71% on the IsoGD test set, and the Mean Jaccard Index (MJI) for continuous gesture recognition improved from 0.2869 to 0.6103 on the ConGD test set. The third track is the first challenge on real versus fake expressed emotion classification, including six emotion categories, for which a novel database was introduced. First place was shared between two teams that achieved a 67.70% average recognition rate on the test set. The data of the three tracks, the participants' code and method descriptions are publicly available to allow researchers to keep making progress in the field.
|
|
|
Yagmur Gucluturk, Umut Guclu, Marc Perez, Hugo Jair Escalante, Xavier Baro, Isabelle Guyon, et al. (2017). Visualizing Apparent Personality Analysis with Deep Residual Networks. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV (pp. 3101–3109).
Abstract: Automatic prediction of personality traits is a subjective task that has recently received much attention. Specifically, automatic apparent personality trait prediction from multimodal data has emerged as a hot topic within the field of computer vision and, more particularly, the so-called “looking at people” sub-field. Considering “apparent” personality traits as opposed to real ones considerably reduces the subjectivity of the task. The real world applications are encountered in a wide range of domains, including entertainment, health, human computer interaction, recruitment and security. Predictive models of personality traits are useful for individuals in many scenarios (e.g., preparing for job interviews, preparing for public speaking). However, these predictions in and of themselves might be deemed to be untrustworthy without human understandable supportive evidence. Through a series of experiments on a recently released benchmark dataset for automatic apparent personality trait prediction, this paper characterizes the audio and visual information that is used by a state-of-the-art model while making its predictions, so as to provide such supportive evidence by explaining predictions made. Additionally, the paper describes a new web application, which gives feedback on apparent personality traits of its users by combining model predictions with their explanations.
|
|
|
Maryam Asadi-Aghbolaghi, Hugo Bertiche, Vicent Roig, Shohreh Kasaei, & Sergio Escalera. (2017). Action Recognition from RGB-D Data: Comparison and Fusion of Spatio-temporal Handcrafted Features and Deep Strategies. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
|
|
Albert Clapes, Tinne Tuytelaars, & Sergio Escalera. (2017). Darwintrees for action recognition. In Chalearn Workshop on Action, Gesture, and Emotion Recognition: Large Scale Multimodal Gesture Recognition and Real versus Fake expressed emotions at ICCV.
|
|
|
Katerine Diaz, Konstantia Georgouli, Anastasios Koidis, & Jesus Martinez del Rincon. (2017). Incremental model learning for spectroscopy-based food analysis. CILS - Chemometrics and Intelligent Laboratory Systems, 167, 123–131.
Abstract: In this paper we propose the use of incremental learning for creating and improving multivariate analysis models in the field of chemometrics of spectral data. As its main advantages, our proposed incremental subspace-based learning allows creating models faster, progressively improving previously created models, and sharing them between laboratories and institutions without requiring the transfer or disclosure of individual spectral samples. In particular, our approach allows improving the generalization and adaptability of previously generated models with a few new spectral samples, making them applicable to real-world situations. The potential of our approach is demonstrated using vegetable oil type identification based on spectroscopic data as a case study. Results show how incremental models maintain the accuracy of batch learning methodologies while reducing their computational cost and handicaps.
Keywords: Incremental model learning; IGDCV technique; Subspace based learning; Identification; Vegetable oils; FT-IR spectroscopy
|
|
|
Jean-Pascal Jacob, Mariella Dimiccoli, & L. Moisan. (2017). Active skeleton for bacteria modelling. CMBBE - Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 5(4), 274–286.
Abstract: The investigation of spatio-temporal dynamics of bacterial cells and their molecular components requires automated image analysis tools to track cell shape properties and molecular component locations inside the cells. In the study of bacteria aging, the molecular components of interest are protein aggregates accumulated near bacteria boundaries. This particular location makes the correspondence between aggregates and cells very ambiguous, since accurately computing bacteria boundaries in phase-contrast time-lapse imaging is a challenging task. This paper proposes an active skeleton formulation for bacteria modelling which provides several advantages: an easy computation of shape properties (perimeter, length, thickness and orientation), an improved boundary accuracy in noisy images and a natural bacteria-centred coordinate system that permits the intrinsic location of molecular components inside the cell. Starting from an initial skeleton estimate, the medial axis of the bacterium is obtained by minimising an energy function which incorporates bacteria shape constraints. Experimental results on biological images and a comparative evaluation of performance validate the proposed approach for modelling cigar-shaped bacteria like Escherichia coli. The Image-J plugin of the proposed method can be found online at http://fluobactracker.inrialpes.fr.
|
|
|
Mariella Dimiccoli, Marc Bolaños, Estefania Talavera, Maedeh Aghaei, Stavri G. Nikolov, & Petia Radeva. (2017). SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation. CVIU - Computer Vision and Image Understanding, 155, 55–69.
Abstract: While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments. First, contextual and semantic information is extracted for each image by employing a Convolutional Neural Network approach. Later, by integrating language processing, a vocabulary of concepts is defined in a semantic space. Finally, by exploiting the temporal coherence in photo streams, images which share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from activity and event recognition to semantic indexing and summarization. Experiments over egocentric sets of nearly 17,000 images show that the proposed approach outperforms state-of-the-art methods.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, Sergio Escalera, Huamin Ren, Thomas B. Moeslund, & Elham Etemad. (2017). Locality Regularized Group Sparse Coding for Action Recognition. CVIU - Computer Vision and Image Understanding, 158, 106–114.
Abstract: Bag of visual words (BoVW) models are widely utilized in image/video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantage of locality coding, such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate that our methodology outperforms the state of the art on the considered datasets.
Keywords: Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
|
|
|
David Geronimo, David Vazquez, & Arturo de la Escalera. (2017). Vision-Based Advanced Driver Assistance Systems. In Computer Vision in Vehicle Technology: Land, Sea, and Air.
Keywords: ADAS; Autonomous Driving
|
|
|
Marçal Rusiñol, & Josep Llados. (2017). Flowchart Recognition in Patent Information Retrieval. In M. Lupu, K. Mayer, N. Kando, & A.J. Trippe (Eds.), Current Challenges in Patent Information Retrieval (Vol. 37, pp. 351–368). Springer Berlin Heidelberg.
|
|
|
Laura Lopez-Fuentes, Claudio Rossi, & Harald Skinnemoen. (2017). River segmentation for flood monitoring. In Data Science for Emergency Management at Big Data 2017.
Abstract: Floods are major natural disasters which cause deaths and material damage every year. Monitoring these events is crucial in order to reduce both the number of affected people and the economic losses. In this work we train and test three different Deep Learning segmentation algorithms to estimate the water area from river images, and compare their performance. We discuss the implementation of a novel data chain aimed at monitoring river water levels by automatically processing data collected from surveillance cameras, and at giving alerts in case of large increases in the water level or flooding. We also create and openly publish the first image dataset for river water segmentation.
|
|