|
Maryam Asadi-Aghbolaghi, Albert Clapes, Marco Bellantonio, Hugo Jair Escalante, Victor Ponce, Xavier Baro, et al. (2017). A survey on deep learning based approaches for action and gesture recognition in image sequences. In 12th IEEE International Conference on Automatic Face and Gesture Recognition.
Abstract: Interest in action and gesture recognition has grown considerably in recent years. In this paper, we present a survey of current deep learning methodologies for action and gesture recognition in image sequences. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. We review the details of the proposed architectures, fusion strategies, main datasets, and competitions. We summarize and discuss the main works proposed so far, with particular interest in how they treat the temporal dimension of data, discussing their main features and identifying opportunities and challenges for future research.
|
|
|
Eirikur Agustsson, Radu Timofte, Sergio Escalera, Xavier Baro, Isabelle Guyon, & Rasmus Rothe. (2017). Apparent and real age estimation in still images with deep residual regressors on APPA-REAL database. In 12th IEEE International Conference on Automatic Face and Gesture Recognition.
Abstract: After decades of research, real (biological) age estimation from a single face image has reached maturity thanks to the availability of large public face databases and the impressive accuracies achieved by recently proposed methods. The estimation of “apparent age” is a related task concerning the age perceived by human observers. Significant advances have also been made in this new research direction with the recent Looking At People challenges. In this paper we make several contributions to age estimation research. (i) We introduce APPA-REAL, a large face image database with both real and apparent age annotations. (ii) We study the relationship between real and apparent age. (iii) We develop a residual age regression method to further improve the performance. (iv) We show that real age estimation can be successfully tackled as an apparent age estimation followed by an apparent-to-real age residual regression. (v) We graphically reveal the facial regions on which the CNN focuses in order to perform apparent and real age estimation tasks.
|
|
|
Laura Lopez-Fuentes, Sebastia Massanet, & Manuel Gonzalez-Hidalgo. (2017). Image vignetting reduction via a maximization of fuzzy entropy. In IEEE International Conference on Fuzzy Systems.
Abstract: In many computer vision applications, vignetting is an undesirable effect which must be removed in a pre-processing step. Recently, an algorithm for image vignetting correction has been presented by means of a minimization of log-intensity entropy. This method relies on the increase in the entropy of the image when it is affected by vignetting. In this paper, we propose a novel algorithm to reduce image vignetting via a maximization of the fuzzy entropy of the image. Fuzzy entropy quantifies the degree of fuzziness of a fuzzy set, and its value is also modified by the presence of vignetting. The experimental results show that, in most cases, this novel algorithm outperforms the algorithm based on the minimization of log-intensity entropy from both the qualitative and the quantitative point of view.
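As an illustration of the quantity mentioned in the abstract above, the following minimal sketch computes a fuzzy entropy for an image. It assumes the classical De Luca-Termini definition and assumes that pixel intensities scaled to [0, 1] serve as membership degrees; the abstract does not state which definition or membership function the paper uses, and the function name `fuzzy_entropy` is hypothetical.

```python
import numpy as np

def fuzzy_entropy(membership):
    """De Luca-Termini fuzzy entropy, averaged over all pixels.

    `membership` holds values in [0, 1] (assumption: intensity / max intensity).
    The result is 0 for a crisp set (all values 0 or 1) and 1 when every
    membership degree equals 0.5.
    """
    mu = np.asarray(membership, dtype=float)
    # Clip to avoid log(0); the limit of x*log(x) at 0 is 0.
    mu = np.clip(mu, 1e-12, 1.0 - 1e-12)
    h = -(mu * np.log2(mu) + (1.0 - mu) * np.log2(1.0 - mu))
    return float(h.mean())

crisp = np.array([[0.0, 1.0], [1.0, 0.0]])   # no fuzziness
ambiguous = np.full((4, 4), 0.5)             # maximal fuzziness
print(fuzzy_entropy(crisp), fuzzy_entropy(ambiguous))
```

A vignetting-correction procedure in this spirit would search over the parameters of a vignetting model for the corrected image that maximizes this value; that search is not shown here.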
|
|
|
Pau Riba, Josep Llados, & Alicia Fornes. (2017). Error-tolerant coarse-to-fine matching model for hierarchical graphs. In Pasquale Foggia, Cheng-Lin Liu, & Mario Vento (Eds.), 11th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition (Vol. 10310, pp. 107–117). Springer International Publishing.
Abstract: Graph-based representations are effective tools to capture structural information from visual elements. However, retrieving a query graph from a large database of graphs implies a high computational complexity. Moreover, these representations are very sensitive to noise or small changes. In this work, a novel hierarchical graph representation is designed. Using graph clustering techniques adapted from graph-based social media analysis, we propose to generate a hierarchy able to deal with different levels of abstraction while keeping information about the topology. For the proposed representations, a coarse-to-fine matching method is defined. These approaches are validated using real scenarios such as classification of colour images and handwritten word spotting.
Keywords: Graph matching; Hierarchical graph; Graph-based representation; Coarse-to-fine matching
|
|
|
Hana Jarraya, Muhammad Muzzamil Luqman, & Jean-Yves Ramel. (2017). Improving Fuzzy Multilevel Graph Embedding Technique by Employing Topological Node Features: An Application to Graphics Recognition. In B. Lamiroy, & R. Dueire Lins (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 9657). LNCS. Springer.
|
|
|
Hana Jarraya, Oriol Ramos Terrades, & Josep Llados. (2017). Learning structural loss parameters on graph embedding applied on symbolic graphs. In 12th IAPR International Workshop on Graphics Recognition.
Abstract: We propose an improvement of the Graph Embedding (GEM) method proposed in previous work, which takes advantage of structural pattern representation and structured distortion. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM, represented by a vector, as a new signature of the AG in a lower-dimensional vector space. We focus on adapting the structured learning algorithm via the 1-slack formulation with a suitable risk function, the Graph Edit Distance (GED), which defines the dissimilarity between the ground-truth and predicted graph labels. The GED is determined by error-tolerant graph matching using a bipartite graph matching algorithm. We apply Structured Support Vector Machines (SSVM) to the classification task. We report experimental results on the GREC dataset.
|
|
|
Adria Rico, & Alicia Fornes. (2017). Camera-based Optical Music Recognition using a Convolutional Neural Network. In 12th IAPR International Workshop on Graphics Recognition (pp. 27–28).
Abstract: Optical Music Recognition (OMR) consists of recognizing images of music scores. Contrary to expectation, current OMR systems usually fail when recognizing images of scores captured by digital cameras and smartphones. In this work, we propose a camera-based OMR system based on Convolutional Neural Networks, showing promising preliminary results.
Keywords: optical music recognition; document analysis; convolutional neural network; deep learning
|
|
|
Daniel Hernandez, Antonio Espinosa, David Vazquez, Antonio Lopez, & Juan Carlos Moure. (2017). Embedded Real-time Stixel Computation. In GPU Technology Conference.
Keywords: GPU; CUDA; Stixels; Autonomous Driving
|
|
|
Oriol Vicente, Alicia Fornes, & Ramon Valdes. (2017). La Xarxa d'Humanitats Digitals de la UABCie: una estructura inteligente para la investigación y la transferencia en Humanidades. In 3rd Congreso Internacional de Humanidades Digitales Hispánicas. Sociedad Internacional (pp. 281–383).
|
|
|
Rosa Maria Ortiz, Debora Gil, Elisa Minchole, Marta Diez-Ferrer, & Noelia Cubero de Frutos. (2017). Classification of Confocal Endomicroscopy Patterns for Diagnosis of Lung Cancer. In 18th World Conference on Lung Cancer.
Abstract: Confocal Laser Endomicroscopy (CLE) is an emerging imaging technique that allows the in-vivo acquisition of cell patterns of potentially malignant lesions. Such patterns could discriminate between inflammatory and neoplastic lesions and, thus, serve as a first in-vivo biopsy to discard cases that do not actually require a cell biopsy.
The goal of this work is to explore whether CLE images obtained during videobronchoscopy contain enough visual information to discriminate between benign and malignant peripheral lesions for lung cancer diagnosis. To do so, we have performed a pilot comparative study with 12 patients (6 adenocarcinoma and 6 benign-inflammatory) using 2 different methods for CLE pattern analysis: visual analysis by 3 experts and a novel methodology that uses graph methods to find patterns in pre-trained feature spaces. Our preliminary results indicate that although visual analysis achieves only 60.2% accuracy, the accuracy of the proposed unsupervised image pattern classification rises to 84.6%.
We conclude that the visual information in CLE images allows in-vivo detection of neoplastic lesions and that graph structural analysis applied to deep-learning feature spaces can achieve competitive results.
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2017). Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology. In L.A. Alexandre, J. Salvador Sanchez, & Joao M. F. Rodriguez (Eds.), 8th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 10255, pp. 287–294). LNCS.
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. The records in these books follow a simple text structure with an evolving vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
Keywords: Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
|
|
|
Marc Bolaños, Alvaro Peris, Francisco Casacuberta, & Petia Radeva. (2017). VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed.
Keywords: Visual Question Answering; Convolutional Neural Networks; Long short-term memory networks
|
|
|
Hana Jarraya, Oriol Ramos Terrades, & Josep Llados. (2017). Graph Embedding through Probabilistic Graphical Model applied to Symbolic Graphs. In 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: We propose a new Graph Embedding (GEM) method that takes advantage of structural pattern representation. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM, represented by a vector. This vector is a signature of the AG in a lower-dimensional vector space. We apply Structured Support Vector Machines (SSVM) to the classification task. As a first attempt, the results on the GREC dataset are encouraging enough to go further in this direction.
Keywords: Attributed Graph; Probabilistic Graphical Model; Graph Embedding; Structured Support Vector Machines
|
|
|
Marc Masana, Joost Van de Weijer, Luis Herranz, Andrew Bagdanov, & Jose Manuel Alvarez. (2017). Domain-adaptive deep network compression. In 17th IEEE International Conference on Computer Vision.
Abstract: Deep Neural Networks trained on large datasets can be easily transferred to new domains with far fewer labeled examples by a process called fine-tuning. This has the advantage that representations learned in the large source domain can be exploited on smaller target domains. However, networks designed to be optimal for the source task are often prohibitively large for the target task. In this work we address the compression of networks after domain transfer.
We focus on compression algorithms based on low-rank matrix decomposition. Existing methods base compression solely on learned network weights and ignore the statistics of network activations. We show that domain transfer leads to large shifts in network activations and that it is desirable to take this into account when compressing.
We demonstrate that considering activation statistics when compressing weights leads to a rank-constrained regression problem with a closed-form solution. Because our method takes into account the target domain, it can remove the redundancy in the weights more effectively. Experiments show that our Domain Adaptive Low Rank (DALR) method significantly outperforms existing low-rank compression techniques. With our approach, the fc6 layer of VGG19 can be compressed more than 4x more than with truncated SVD alone, with only a minor or no loss in accuracy. When applied to domain-transferred networks, it allows for compression down to only 5-20% of the original number of parameters with only a minor drop in performance.
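The contrast drawn in the abstract above, between compressing weights alone and compressing them with respect to activation statistics, can be illustrated with a small sketch. It is not the authors' DALR implementation: it assumes a single linear layer without bias, uses the standard reduced-rank regression solution (project the weights onto the top-k left singular vectors of the layer outputs), and runs on synthetic data. The function names and the toy dimensions are hypothetical.

```python
import numpy as np

def truncated_svd_compress(W, k):
    """Weight-only baseline: best rank-k approximation of W itself."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def activation_aware_compress(W, X, k):
    """Rank-k factors (A, B) minimizing ||W X - A B X||_F in closed form.

    W: (out, in) weights; X: (in, n_samples) input activations.
    """
    Y = W @ X                                   # layer outputs on target data
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    Uk = U[:, :k]                               # top-k output directions
    return Uk, Uk.T @ W                         # A: (out, k), B: (k, in)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
# Synthetic activations with strongly unequal scale per input dimension,
# standing in for the shifted statistics of a target domain (assumption).
X = rng.standard_normal((128, 500)) * np.linspace(3.0, 0.05, 128)[:, None]

k = 8
A, B = activation_aware_compress(W, X, k)
err_aware = np.linalg.norm(W @ X - A @ (B @ X))
err_svd = np.linalg.norm(W @ X - truncated_svd_compress(W, k) @ X)
print(f"activation-aware error: {err_aware:.2f}, truncated SVD error: {err_svd:.2f}")
```

Because `A @ B @ X` equals the best rank-k approximation of the outputs `W @ X`, the activation-aware error can never exceed that of any other rank-k weight matrix on the same data, including the truncated SVD of `W`.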
|
|
|
Xialei Liu, Joost Van de Weijer, & Andrew Bagdanov. (2017). RankIQA: Learning from Rankings for No-reference Image Quality Assessment. In 17th IEEE International Conference on Computer Vision.
Abstract: We propose a no-reference image quality assessment (NR-IQA) approach that learns from rankings (RankIQA). To address the problem of limited IQA dataset size, we train a Siamese Network to rank images in terms of image quality by using synthetically generated distortions for which relative image quality is known. These ranked image sets can be automatically generated without laborious human labeling. We then use fine-tuning to transfer the knowledge represented in the trained Siamese Network to a traditional CNN that estimates absolute image quality from single images. We demonstrate how our approach can be made significantly more efficient than traditional Siamese Networks by forward propagating a batch of images through a single network and backpropagating gradients derived from all pairs of images in the batch. Experiments on the TID2013 benchmark show that we improve the state-of-the-art by over 5%. Furthermore, on the LIVE benchmark we show that our approach is superior to existing NR-IQA techniques and that we even outperform the state-of-the-art in full-reference IQA (FR-IQA) methods without having to resort to high-quality reference images to infer IQA.
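The batch-wise training idea described in the abstract above can be sketched as follows: one forward pass yields a score per image, and a ranking loss with its gradient is accumulated over all ordered pairs in the batch. This is a minimal NumPy illustration under stated assumptions, not the authors' code: it assumes a hinge ranking loss with a margin and the convention that a higher score means better quality, and the function name `batch_pairwise_hinge` is hypothetical.

```python
import numpy as np

def batch_pairwise_hinge(scores, quality, margin=1.0):
    """Hinge ranking loss over all pairs in a batch, with its gradient.

    scores:  (B,) network outputs from a single forward pass.
    quality: (B,) known relative quality levels (higher = better, assumption).
    Returns the mean loss over ranked pairs and d(loss)/d(scores).
    """
    diff = scores[:, None] - scores[None, :]          # diff[i, j] = s_i - s_j
    should_win = quality[:, None] > quality[None, :]  # i must outrank j
    violated = should_win & (margin - diff > 0)
    n_pairs = max(int(should_win.sum()), 1)
    loss = np.sum((margin - diff)[violated]) / n_pairs
    # Each violated pair (i, j) contributes -1 to grad[i] and +1 to grad[j].
    grad = (violated.sum(axis=0) - violated.sum(axis=1)) / n_pairs
    return loss, grad

scores = np.array([0.2, 0.9, 0.5])   # predicted quality
quality = np.array([2, 1, 0])        # image 0 is the least distorted
loss, grad = batch_pairwise_hinge(scores, quality)
print(loss, grad)
```

The gradient with respect to the scores would then be backpropagated once through the single network, rather than running two weight-sharing branches for every pair.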
|
|