Daniel Hernandez, Antonio Espinosa, David Vazquez, Antonio Lopez, & Juan Carlos Moure. (2017). GPU-accelerated real-time stixel computation. In IEEE Winter Conference on Applications of Computer Vision (pp. 1054–1062).
Abstract: The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energyefficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that produces reliable results at 26 frames per second (real-time) on the Tegra X1 for disparity images of 1024×440 pixels and stixel widths of 5 pixels, and achieves more than 400 frames per second on a high-end Titan X GPU card.
Keywords: Autonomous Driving; GPU; Stixel
|
Juan Ignacio Toledo, Sounak Dey, Alicia Fornes, & Josep Llados. (2017). Handwriting Recognition by Attribute embedding and Recurrent Neural Networks. In 14th International Conference on Document Analysis and Recognition (pp. 1038–1043).
Abstract: Handwriting recognition consists in obtaining the transcription of a text image. Recent word spotting methods based on attribute embedding have shown good performance when recognizing words. However, they are holistic methods in the sense that they recognize the word as a whole (i.e. they find the closest word in the lexicon to the word image). Consequently,
these kinds of approaches are not able to deal with out of vocabulary words, which are common in historical manuscripts. Also, they cannot be extended to recognize text lines. In order to address these issues, in this paper we propose a handwriting recognition method that adapts the attribute embedding to sequence learning. Concretely, the method learns the attribute embedding of patches of word images with a convolutional neural network. Then, these embeddings are presented as a sequence to a recurrent neural network that produces the transcription. We obtain promising results even without the use of any kind of dictionary or language model
|
Hugo Jair Escalante, Victor Ponce, Sergio Escalera, Xavier Baro, Alicia Morales-Reyes, & Jose Martinez-Carranza. (2017). Evolving weighting schemes for the Bag of Visual Words. Neural Computing and Applications - Neural Computing and Applications, 28(5), 925–939.
Abstract: The Bag of Visual Words (BoVW) is an established representation in computer vision. Taking inspiration from text mining, this representation has proved
to be very effective in many domains. However, in most cases, standard term-weighting schemes are adopted (e.g.,term-frequency or TF-IDF). It remains open the question of whether alternative weighting schemes could boost the
performance of methods based on BoVW. More importantly, it is unknown whether it is possible to automatically learn and determine effective weighting schemes from
scratch. This paper brings some light into both of these unknowns. On the one hand, we report an evaluation of the most common weighting schemes used in text mining, but rarely used in computer vision tasks. Besides, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study in several computer vision problems. Results show the usefulness of the proposed method.
Keywords: Bag of Visual Words; Bag of features; Genetic programming; Term-weighting schemes; Computer vision
|
Cristhian A. Aguilera-Carrasco, Angel Sappa, Cristhian Aguilera, & Ricardo Toledo. (2017). Cross-Spectral Local Descriptors via Quadruplet Network. SENS - Sensors, 17(4), 873.
Abstract: This paper presents a novel CNN-based architecture, referred to as Q-Net, to learn local feature descriptors that are useful for matching image patches from two different spectral bands. Given correctly matched and non-matching cross-spectral image pairs, a quadruplet network is trained to map input image patches to a common Euclidean space, regardless of the input spectral band. Our approach is inspired by the recent success of triplet networks in the visible spectrum, but adapted for cross-spectral scenarios, where, for each matching pair, there are always two possible non-matching patches: one for each spectrum. Experimental evaluations on a public cross-spectral VIS-NIR dataset shows that the proposed approach improves the state-of-the-art. Moreover, the proposed technique can also be used in mono-spectral settings, obtaining a similar performance to triplet network descriptors, but requiring less training data.
|
Jose Garcia-Rodriguez, Isabelle Guyon, Sergio Escalera, Alexandra Psarrou, Andrew Lewis, & Miguel Cazorla. (2017). Editorial: Special Issue on Computational Intelligence for Vision and Robotics. Neural Computing and Applications - Neural Computing and Applications, 28(5), 853–854.
|
Ivet Rafegas, Javier Vazquez, Robert Benavente, Maria Vanrell, & Susana Alvarez. (2017). Enhancing spatio-chromatic representation with more-than-three color coding for image description. JOSA A - Journal of the Optical Society of America A, 34(5), 827–837.
Abstract: Extraction of spatio-chromatic features from color images is usually performed independently on each color channel. Usual 3D color spaces, such as RGB, present a high inter-channel correlation for natural images. This correlation can be reduced using color-opponent representations, but the spatial structure of regions with small color differences is not fully captured in two generic Red-Green and Blue-Yellow channels. To overcome these problems, we propose a new color coding that is adapted to the specific content of each image. Our proposal is based on two steps: (a) setting the number of channels to the number of distinctive colors we find in each image (avoiding the problem of channel correlation), and (b) building a channel representation that maximizes contrast differences within each color channel (avoiding the problem of low local contrast). We call this approach more-than-three color coding (MTT) to enhance the fact that the number of channels is adapted to the image content. The higher color complexity an image has, the more channels can be used to represent it. Here we select distinctive colors as the most predominant in the image, which we call color pivots, and we build the new color coding using these color pivots as a basis. To evaluate the proposed approach we measure its efficiency in an image categorization task. We show how a generic descriptor improves its performance at the description level when applied on the MTT coding.
|
C. Alejandro Parraga. (2017). Colours and Colour Vision: An Introductory Survey. PER - Perception, 46(5), 640–641.
|
Marta Diez-Ferrer, Debora Gil, Elena Carreño, Susana Padrones, & Samantha Aso. (2017). Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation. JTO - Journal of Thoracic Oncology, 12(1S), S596–S597.
Abstract: A main weakness of virtual bronchoscopic navigation (VBN) is unsuccessful segmentation of distal branches approaching peripheral pulmonary nodules (PPN). CT scan acquisition protocol is pivotal for segmentation covering the utmost periphery. We hypothesize that application of continuous positive airway pressure (CPAP) during CT acquisition could improve visualization and segmentation of peripheral bronchi. The purpose of the present pilot study is to compare quality of segmentations under 4 CT acquisition modes: inspiration (INSP), expiration (EXP) and both with CPAP (INSP-CPAP and EXP-CPAP).
Keywords: Thorax CT; diagnosis; Peripheral Pulmonary Nodule
|
Pau Rodriguez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca, & Jordi Gonzalez. (2017). Age and gender recognition in the wild with deep attention. PR - Pattern Recognition, 72, 563–571.
Abstract: Face analysis in images in the wild still pose a challenge for automatic age and gender recognition tasks, mainly due to their high variability in resolution, deformation, and occlusion. Although the performance has highly increased thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitiveness of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained based on a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks show that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.
Keywords: Age recognition; Gender recognition; Deep neural networks; Attention mechanisms
|
Maryam Asadi-Aghbolaghi, Albert Clapes, Marco Bellantonio, Hugo Jair Escalante, Victor Ponce, Xavier Baro, et al. (2017). Deep Learning for Action and Gesture Recognition in Image Sequences: A Survey. In Gesture Recognition (pp. 539–578).
Abstract: Interest in automatic action and gesture recognition has grown considerably in the last few years. This is due in part to the large number of application domains for this type of technology. As in many other computer vision areas, deep learning based methods have quickly become a reference methodology for obtaining state-of-the-art performance in both tasks. This chapter is a survey of current deep learning based methodologies for action and gesture recognition in sequences of images. The survey reviews both fundamental and cutting edge methodologies reported in the last few years. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. Details of the proposed architectures, fusion strategies, main datasets, and competitions are reviewed. Also, we summarize and discuss the main works proposed so far with particular interest on how they treat the temporal dimension of data, their highlighting features, and opportunities and challenges for future research. To the best of our knowledge this is the first survey in the topic. We foresee this survey will become a reference in this ever dynamic field of research.
Keywords: Action recognition; Gesture recognition; Deep learning architectures; Fusion strategies
|
Jordi Esquirol, Cristina Palmero, Vanessa Bayo, Miquel Angel Cos, Sergio Escalera, David Sanchez, et al. (2017). Automatic RBG-depth-pressure anthropometric analysis and individualised sleep solution prescription. JMET - Journal of Medical Engineering & Technology, 486–497.
Abstract: INTRODUCTION:
Sleep surfaces must adapt to individual somatotypic features to maintain a comfortable, convenient and healthy sleep, preventing diseases and injuries. Individually determining the most adequate rest surface can often be a complex and subjective question.
OBJECTIVES:
To design and validate an automatic multimodal somatotype determination model to automatically recommend an individually designed mattress-topper-pillow combination.
METHODS:
Design and validation of an automated prescription model for an individualised sleep system is performed through a single-image 2 D-3 D analysis and body pressure distribution, to objectively determine optimal individual sleep surfaces combining five different mattress densities, three different toppers and three cervical pillows.
RESULTS:
A final study (n = 151) and re-analysis (n = 117) defined and validated the model, showing high correlations between calculated and real data (>85% in height and body circumferences, 89.9% in weight, 80.4% in body mass index and more than 70% in morphotype categorisation).
CONCLUSIONS:
Somatotype determination model can accurately prescribe an individualised sleep solution. This can be useful for healthy people and for health centres that need to adapt sleep surfaces to people with special needs. Next steps will increase model's accuracy and analise, if this prescribed individualised sleep solution can improve sleep quantity and quality; additionally, future studies will adapt the model to mattresses with technological improvements, tailor-made production and will define interfaces for people with special needs.
|
Pau Riba, Anjan Dutta, Josep Llados, Alicia Fornes, & Sounak Dey. (2017). Improving Information Retrieval in Multiwriter Scenario by Exploiting the Similarity Graph of Document Terms. In 14th International Conference on Document Analysis and Recognition (pp. 475–480).
Abstract: Information Retrieval (IR) is the activity of obtaining information resources relevant to a questioned information. It usually retrieves a set of objects ranked according to the relevancy to the needed fact. In document analysis, information retrieval receives a lot of attention in terms of symbol and word spotting. However, through decades the community mostly focused either on printed or on single writer scenario, where the
state-of-the-art results have achieved reasonable performance on the available datasets. Nevertheless, the existing algorithms do not perform accordingly on multiwriter scenario. A graph representing relations between a set of objects is a structure where each node delineates an individual element and the similarity between them is represented as a weight on the connecting edge. In this paper, we explore different analytics of graphs constructed from words or graphical symbols, such as diffusion, shortest path, etc. to improve the performance of information retrieval methods in multiwriter scenario
Keywords: document terms; information retrieval; affinity graph; graph of document terms; multiwriter; graph diffusion
|
Alicia Fornes, Beata Megyesi, & Joan Mas. (2017). Transcription of Encoded Manuscripts with Image Processing Techniques. In Digital Humanities Conference (pp. 441–443).
|
Luis Herranz, Shuqiang Jiang, & Ruihan Xu. (2017). Modeling Restaurant Context for Food Recognition. TMM - IEEE Transactions on Multimedia, 19(2), 430–440.
Abstract: Food photos are widely used in food logs for diet monitoring and in social networks to share social and gastronomic experiences. A large number of these images are taken in restaurants. Dish recognition in general is very challenging, due to different cuisines, cooking styles, and the intrinsic difficulty of modeling food from its visual appearance. However, contextual knowledge can be crucial to improve recognition in such scenario. In particular, geocontext has been widely exploited for outdoor landmark recognition. Similarly, we exploit knowledge about menus and location of restaurants and test images. We first adapt a framework based on discarding unlikely categories located far from the test image. Then, we reformulate the problem using a probabilistic model connecting dishes, restaurants, and locations. We apply that model in three different tasks: dish recognition, restaurant recognition, and location refinement. Experiments on six datasets show that by integrating multiple evidences (visual, location, and external knowledge) our system can boost the performance in all tasks.
|
Debora Gil, Sergio Vera, Agnes Borras, Albert Andaluz, & Miguel Angel Gonzalez Ballester. (2017). Anatomical Medial Surfaces with Efficient Resolution of Branches Singularities. MIA - Medical Image Analysis, 35, 390–402.
Abstract: Medial surfaces are powerful tools for shape description, but their use has been limited due to the sensibility existing methods to branching artifacts. Medial branching artifacts are associated to perturbations of the object boundary rather than to geometric features. Such instability is a main obstacle for a condent application in shape recognition and description. Medial branches correspond to singularities of the medial surface and, thus, they are problematic for existing morphological and energy-based algorithms. In this paper, we use algebraic geometry concepts in an energy-based approach to compute a medial surface presenting a stable branching topology. We also present an ecient GPU-CPU implementation using standard image processing tools. We show the method computational eciency and quality on a custom made synthetic database. Finally, we present some results on a medical imaging application for localization of abdominal pathologies.
Keywords: Medial Representations; Shape Recognition; Medial Branching Stability ; Singular Points
|