|
Arash Akbarinia, C. Alejandro Parraga, Marta Exposito, Bogdan Raducanu, & Xavier Otazu. (2017). Can biological solutions help computers detect symmetry? In 40th European Conference on Visual Perception.
|
|
|
J. Chazalon, P. Gomez-Kramer, Jean-Christophe Burie, M.Coustaty, S.Eskenazi, Muhammad Muzzamil Luqman, et al. (2017). SmartDoc 2017 Video Capture: Mobile Document Acquisition in Video Mode. In 1st International Workshop on Open Services and Tools for Document Analysis.
Abstract: As mobile document acquisition using smartphones is getting more and more common, along with the continuous improvement of mobile devices (both in terms of computing power and image quality), we can wonder to which extent mobile phones can replace desktop scanners. Modern applications can cope with perspective distortion and normalize the contrast of a document page captured with a smartphone, and in some cases like bottle labels or posters, smartphones even have the advantage of allowing the acquisition of non-flat or large documents. However, several cases remain hard to handle, such as reflective documents (identity cards, badges, glossy magazine cover, etc.) or large documents for which some regions require an important amount of detail. This paper introduces the SmartDoc 2017 benchmark (named “SmartDoc Video Capture”), which aims at
assessing whether capturing documents using the video mode of a smartphone could solve those issues. The task under evaluation is both a stitching and a reconstruction problem, as the user can move the device over different parts of the document to capture details or try to erase highlights. The material released consists of a dataset, an evaluation method and the associated tool, a sample method, and the tools required to extend the dataset. All the components are released publicly under very permissive licenses, and we particularly cared about maximizing the ease of
understanding, usage and improvement.
|
|
|
Lluis Gomez, Marçal Rusiñol, & Dimosthenis Karatzas. (2017). LSDE: Levenshtein Space Deep Embedding for Query-by-string Word Spotting. In 14th International Conference on Document Analysis and Recognition.
Abstract: n this paper we present the LSDE string representation and its application to handwritten word spotting. LSDE is a novel embedding approach for representing strings that learns a space in which distances between projected points are correlated with the Levenshtein edit distance between the original strings.
We show how such a representation produces a more semantically interpretable retrieval from the user’s perspective than other state of the art ones such as PHOC and DCToW. We also conduct a preliminary handwritten word spotting experiment on the George Washington dataset.
|
|
|
E. Royer, J. Chazalon, Marçal Rusiñol, & F. Bouchara. (2017). Benchmarking Keypoint Filtering Approaches for Document Image Matching. In 14th International Conference on Document Analysis and Recognition.
Abstract: Best Poster Award.
Reducing the amount of keypoints used to index an image is particularly interesting to control processing time and memory usage in real-time document image matching applications, like augmented documents or smartphone applications. This paper benchmarks two keypoint selection methods on a task consisting of reducing keypoint sets extracted from document images, while preserving detection and segmentation accuracy. We first study the different forms of keypoint filtering, and we introduce the use of the CORE selection method on
keypoints extracted from document images. Then, we extend a previously published benchmark by including evaluations of the new method, by adding the SURF-BRISK detection/description scheme, and by reporting processing speeds. Evaluations are conducted on the publicly available dataset of ICDAR2015 SmartDOC challenge 1. Finally, we prove that reducing the original keypoint set is always feasible and can be beneficial
not only to processing speed but also to accuracy.
|
|
|
David Aldavert, Marçal Rusiñol, & Ricardo Toledo. (2017). Automatic Static/Variable Content Separation in Administrative Document Images. In 14th International Conference on Document Analysis and Recognition.
Abstract: In this paper we present an automatic method for separating static and variable content from administrative document images. An alignment approach is able to unsupervisedly build probabilistic templates from a set of examples of the same document kind. Such templates define which is the likelihood of every pixel of being either static or variable content. In the extraction step, the same alignment technique is used to match
an incoming image with the template and to locate the positions where variable fields appear. We validate our approach on the public NIST Structured Tax Forms Dataset.
|
|
|
Katerine Diaz, Konstantia Georgouli, Anastasios Koidis, & Jesus Martinez del Rincon. (2017). Incremental model learning for spectroscopy-based food analysis. CILS - Chemometrics and Intelligent Laboratory Systems, 167, 123–131.
Abstract: In this paper we propose the use of incremental learning for creating and improving multivariate analysis models in the field of chemometrics of spectral data. As main advantages, our proposed incremental subspace-based learning allows creating models faster, progressively improving previously created models and sharing them between laboratories and institutions without requiring transferring or disclosing individual spectra samples. In particular, our approach allows to improve the generalization and adaptability of previously generated models with a few new spectral samples to be applicable to real-world situations. The potential of our approach is demonstrated using vegetable oil type identification based on spectroscopic data as case study. Results show how incremental models maintain the accuracy of batch learning methodologies while reducing their computational cost and handicaps.
Keywords: Incremental model learning; IGDCV technique; Subspace based learning; IdentificationVegetable oils; FT-IR spectroscopy
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, & Aura Hernandez-Sabate. (2017). Decremental generalized discriminative common vectors applied to images classification. KBS - Knowledge-Based Systems, 131, 46–57.
Abstract: In this paper, a novel decremental subspace-based learning method called Decremental Generalized Discriminative Common Vectors method (DGDCV) is presented. The method makes use of the concept of decremental learning, which we introduce in the field of supervised feature extraction and classification. By efficiently removing unnecessary data and/or classes for a knowledge base, our methodology is able to update the model without recalculating the full projection or accessing to the previously processed training data, while retaining the previously acquired knowledge. The proposed method has been validated in 6 standard face recognition datasets, showing a considerable computational gain without compromising the accuracy of the model.
Keywords: Decremental learning; Generalized Discriminative Common Vectors; Feature extraction; Linear subspace methods; Classification
|
|
|
Leonardo Galteri, Dena Bazazian, Lorenzo Seidenari, Marco Bertini, Andrew Bagdanov, Anguelos Nicolaou, et al. (2017). Reading Text in the Wild from Compressed Images. In 1st International workshop on Egocentric Perception, Interaction and Computing.
Abstract: Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on application context, and this compression introduces artifacts
that distort image content into the captured images. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that can eliminate text-specific compression artifacts and which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant
impact on text localization and recognition and that our approach yields an improvement in both – especially at high compression rates.
|
|
|
Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, & Jianfei Cai. (2017). WordFences: Text Localization and Recognition. In 24th International Conference on Image Processing.
|
|
|
Sergio Escalera, Vassilis Athitsos, & Isabelle Guyon. (2017). Challenges in Multi-modal Gesture Recognition. (pp. 1–60).
Abstract: This paper surveys the state of the art on multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011–2015. We began right at the start of the Kinect TMTM revolution when inexpensive infrared cameras providing image depth recordings became available. We published papers using this technology and other more conventional methods, including regular video cameras, to record data, thus providing a good overview of uses of machine learning and computer vision using multimodal data in this area of application. Notably, we organized a series of challenges and made available several datasets we recorded for that purpose, including tens of thousands of videos, which are available to conduct further research. We also overview recent state of the art works on gesture recognition based on a proposed taxonomy for gesture recognition, discussing challenges and future lines of research.
Keywords: Gesture recognition; Time series analysis; Multimodal data analysis; Computer vision; Pattern recognition; Wearable sensors; Infrared cameras; Kinect TMTM
|
|
|
Jordi Esquirol, Cristina Palmero, Vanessa Bayo, Miquel Angel Cos, Sergio Escalera, David Sanchez, et al. (2017). Automatic RBG-depth-pressure anthropometric analysis and individualised sleep solution prescription. JMET - Journal of Medical Engineering & Technology, 486–497.
Abstract: INTRODUCTION:
Sleep surfaces must adapt to individual somatotypic features to maintain a comfortable, convenient and healthy sleep, preventing diseases and injuries. Individually determining the most adequate rest surface can often be a complex and subjective question.
OBJECTIVES:
To design and validate an automatic multimodal somatotype determination model to automatically recommend an individually designed mattress-topper-pillow combination.
METHODS:
Design and validation of an automated prescription model for an individualised sleep system is performed through a single-image 2 D-3 D analysis and body pressure distribution, to objectively determine optimal individual sleep surfaces combining five different mattress densities, three different toppers and three cervical pillows.
RESULTS:
A final study (n = 151) and re-analysis (n = 117) defined and validated the model, showing high correlations between calculated and real data (>85% in height and body circumferences, 89.9% in weight, 80.4% in body mass index and more than 70% in morphotype categorisation).
CONCLUSIONS:
Somatotype determination model can accurately prescribe an individualised sleep solution. This can be useful for healthy people and for health centres that need to adapt sleep surfaces to people with special needs. Next steps will increase model's accuracy and analise, if this prescribed individualised sleep solution can improve sleep quantity and quality; additionally, future studies will adapt the model to mattresses with technological improvements, tailor-made production and will define interfaces for people with special needs.
|
|
|
Sergio Escalera, Xavier Baro, Hugo Jair Escalante, & Isabelle Guyon. (2017). ChaLearn Looking at People: A Review of Events and Resources. In 30th International Joint Conference on Neural Networks.
Abstract: This paper reviews the historic of ChaLearn Looking at People (LAP) events. We started in 2011 (with the release of the first Kinect device) to run challenges related to human action/activity and gesture recognition. Since then we have regularly organized events in a series of competitions covering all aspects of visual analysis of humans. So far we have organized more than 10 international challenges and events in this field. This paper reviews associated events, and introduces the ChaLearn LAP platform where public resources (including code, data and preprints of papers) related to the organized events are available. We also provide a discussion on perspectives of ChaLearn LAP activities.
|
|
|
Eirikur Agustsson, Radu Timofte, Sergio Escalera, Xavier Baro, Isabelle Guyon, & Rasmus Rothe. (2017). Apparent and real age estimation in still images with deep residual regressors on APPA-REAL database. In 12th IEEE International Conference on Automatic Face and Gesture Recognition.
Abstract: After decades of research, the real (biological) age estimation from a single face image reached maturity thanks to the availability of large public face databases and impressive accuracies achieved by recently proposed methods.
The estimation of “apparent age” is a related task concerning the age perceived by human observers. Significant advances have been also made in this new research direction with the recent Looking At People challenges. In this paper we make several contributions to age estimation research. (i) We introduce APPA-REAL, a large face image database with both real and apparent age annotations. (ii) We study the relationship between real and apparent age. (iii) We develop a residual age regression method to further improve the performance. (iv) We show that real age estimation can be successfully tackled as an apparent age estimation followed by an apparent to real age residual regression. (v) We graphically reveal the facial regions on which the CNN focuses in order to perform apparent and real age estimation tasks.
|
|
|
Mohammad Ali Bagheri, Qigang Gao, Sergio Escalera, Huamin Ren, Thomas B. Moeslund, & Elham Etemad. (2017). Locality Regularized Group Sparse Coding for Action Recognition. CVIU - Computer Vision and Image Understanding, 158, 106–114.
Abstract: Bag of visual words (BoVW) models are widely utilized in image/ video representation and recognition. The cornerstone of these models is the encoding stage, in which local features are decomposed over a codebook in order to obtain a representation of features. In this paper, we propose a new encoding algorithm by jointly encoding the set of local descriptors of each sample and considering the locality structure of descriptors. The proposed method takes advantages of locality coding such as its stability and robustness to noise in descriptors, as well as the strengths of the group coding strategy by taking into account the potential relation among descriptors of a sample. To efficiently implement our proposed method, we consider the Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. The method is employed for a challenging classification problem: action recognition by depth cameras. Experimental results demonstrate the outperformance of our methodology compared to the state-of-the-art on the considered datasets.
Keywords: Bag of words; Feature encoding; Locality constrained coding; Group sparse coding; Alternating direction method of multipliers; Action recognition
|
|
|
Patricia Suarez, Angel Sappa, & Boris X. Vintimilla. (2017). Colorizing Infrared Images through a Triplet Conditional DCGAN Architecture. In 19th international conference on image analysis and processing.
Abstract: This paper focuses on near infrared (NIR) image colorization by using a Conditional Deep Convolutional Generative Adversarial Network (CDCGAN) architecture model. The proposed architecture is based on the usage of a conditional probabilistic generative model. Firstly, it learns to colorize the given input image, by using a triplet model architecture that tackle every channel in an independent way. In the proposed model, the nal layer of red channel consider the infrared image to enhance the details, resulting in a sharp RGB image. Then, in the second stage, a discriminative model is used to estimate the probability that the generated image came from the training dataset, rather than the image automatically generated. Experimental results with a large set of real images are provided showing the validity of the proposed approach. Additionally, the proposed approach is compared with a state of the art approach showing better results.
Keywords: CNN in Multispectral Imaging; Image Colorization
|
|