Xavier Perez Sala, Fernando De la Torre, Laura Igual, Sergio Escalera, & Cecilio Angulo. (2017). Subspace Procrustes Analysis. IJCV - International Journal of Computer Vision, 121(3), 327–343.
Abstract: Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address previous issues, this paper proposes Subspace PA (SPA). Given several instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends the traditional PA, and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, being more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.
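The rigid-alignment step that classical PA performs on 2-D shapes can be illustrated with a minimal sketch. This is not the SPA method of the paper: it only shows the traditional idea of removing translation and rotation between two landmark sets, using the closed-form optimal rotation angle for 2-D points. The function names `center` and `align_rigid_2d` are hypothetical, and scale is left untouched.

```python
import math


def center(shape):
    """Translate a list of (x, y) landmarks so that its centroid is the origin."""
    n = len(shape)
    cx = sum(x for x, _ in shape) / n
    cy = sum(y for _, y in shape) / n
    return [(x - cx, y - cy) for x, y in shape]


def align_rigid_2d(shape, reference):
    """Rotate the centered `shape` onto the centered `reference` (least squares).

    Assumes both lists hold the same landmarks in the same order.
    """
    s, r = center(shape), center(reference)
    # Closed-form angle maximizing the sum of dot products r_i . (R s_i).
    num = sum(sx * ry - sy * rx for (sx, sy), (rx, ry) in zip(s, r))
    den = sum(sx * rx + sy * ry for (sx, sy), (rx, ry) in zip(s, r))
    theta = math.atan2(num, den)
    c, sn = math.cos(theta), math.sin(theta)
    return [(c * x - sn * y, sn * x + c * y) for x, y in s]


# Usage: a triangle rotated by 0.5 rad and shifted is mapped back onto the reference.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
moved = [
    (math.cos(0.5) * x - math.sin(0.5) * y + 3.0,
     math.sin(0.5) * x + math.cos(0.5) * y - 1.0)
    for x, y in reference
]
aligned = align_rigid_2d(moved, reference)
```

After alignment, whatever differs between `aligned` and the centered reference is the residual that a non-rigid model such as PCA would then describe.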
|
Frederic Sampedro, Anna Domenech, Sergio Escalera, & Ignasi Carrio. (2017). Computing quantitative indicators of structural renal damage in pediatric DMSA scans. REMNIM - Revista Española de Medicina Nuclear e Imagen Molecular, 36(2), 72–77.
Abstract: OBJECTIVES:
The aim of this work is to propose, implement, and validate a computational framework for the quantification of structural renal damage from 99mTc-dimercaptosuccinic acid (DMSA) scans in an observer-independent manner.
MATERIALS AND METHODS:
From a set of 16 pediatric DMSA-positive scans and 16 matched controls and using both expert-guided and automatic approaches, a set of image-derived quantitative indicators was computed based on the relative size, intensity and histogram distribution of the lesion. A correlation analysis was conducted in order to investigate the association of these indicators with other clinical data of interest in this scenario, including C-reactive protein (CRP), white cell count, vesicoureteral reflux, fever, relative perfusion, and the presence of renal sequelae in a 6-month follow-up DMSA scan.
RESULTS:
A fully automatic lesion detection and segmentation system was able to successfully classify DMSA-positive from negative scans (AUC=0.92, sensitivity=81% and specificity=94%). The image-computed relative size of the lesion correlated with the presence of fever and CRP levels (p<0.05), and a measurement derived from the distribution histogram of the lesion obtained significant performance results in the detection of permanent renal damage (AUC=0.86, sensitivity=100% and specificity=75%).
CONCLUSIONS:
The proposal and implementation of a computational framework for the quantification of structural renal damage from DMSA scans showed a promising potential to complement visual diagnosis and non-imaging indicators.
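As a reminder of how the sensitivity and specificity figures quoted above are defined, the sketch below computes both from confusion-matrix counts. The counts are hypothetical: the abstract does not give per-scan results, and the values used here were chosen only because they round to the reported 81% and 94% for 16 positive and 16 control scans.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of positive cases that are detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of negative cases that are correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (an assumption, not taken from the paper).
sens = sensitivity(true_pos=13, false_neg=3)
spec = specificity(true_neg=15, false_pos=1)
```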
|
Jose Garcia-Rodriguez, Isabelle Guyon, Sergio Escalera, Alexandra Psarrou, Andrew Lewis, & Miguel Cazorla. (2017). Editorial: Special Issue on Computational Intelligence for Vision and Robotics. Neural Computing and Applications, 28(5), 853–854.
|
Arash Akbarinia, & Karl R. Gegenfurtner. (2017). Metameric Mismatching in Natural and Artificial Reflectances. JV - Journal of Vision, 17(10), 390.
Abstract: The human visual system and most digital cameras sample the continuous spectral power distribution through three classes of receptors. This implies that two distinct spectral reflectances can result in identical tristimulus values under one illuminant and differ under another – the problem of metamer mismatching. It is still debated how frequently this issue arises in the real world, using naturally occurring reflectance functions and common illuminants.
We gathered more than ten thousand spectral reflectance samples from various sources, covering a wide range of environments (e.g., flowers, plants, Munsell chips) and evaluated their responses under a number of natural and artificial light sources. For each pair of reflectance functions, we estimated the perceived difference using the CIE-defined ΔE2000 distance metric in Lab color space.
The degree of metamer mismatching depended on the lower threshold value l below which two samples would be considered to lead to equal sensor excitations (ΔE < l), and on the higher threshold value h above which they would be considered different. For example, for l=h=1, we found that 43,129 comparisons out of a total of 6×10⁷ pairs would be considered metameric (1 in 10⁴). For l=1 and h=5, this number reduced to 705 metameric pairs (2 in 10⁶). Extreme metamers, for instance l=1 and h=10, were rare (22 pairs or 6 in 10⁸), as were instances where the two members of a metameric pair would be assigned to different color categories. Not unexpectedly, we observed variations among different reflectance databases and illuminant spectra, with metamers occurring more frequently under artificial illuminants than natural ones.
Overall, our numbers are not very different from those obtained earlier (Foster et al, JOSA A, 2006). However, our results also show that the degree of metamerism is typically not very strong and that category switches hardly ever occur.
Keywords: Metamer; colour perception; spectral discrimination; photoreceptors
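The two-threshold criterion described in the abstract can be sketched as a simple counting rule. The sketch assumes the ΔE values for each reflectance pair have already been computed under two illuminants (the ΔE2000 computation itself is omitted), and it counts only one direction: pairs that match under the first illuminant and differ under the second. The function name `count_metamers` and the sample values are hypothetical.

```python
def count_metamers(delta_e_first, delta_e_second, low, high):
    """Count pairs with dE < low under the first illuminant and dE > high under the second."""
    count = 0
    for d_first, d_second in zip(delta_e_first, delta_e_second):
        if d_first < low and d_second > high:
            count += 1
    return count


# Made-up dE values for four reflectance pairs under two illuminants.
under_first = [0.5, 0.8, 2.0, 0.9]
under_second = [6.0, 3.0, 7.0, 0.4]

strict = count_metamers(under_first, under_second, low=1, high=5)
loose = count_metamers(under_first, under_second, low=1, high=1)
```

Raising `high` while keeping `low` fixed can only shrink the count, which matches the trend reported in the abstract as h goes from 1 to 5 to 10.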
|
Marta Diez-Ferrer, Debora Gil, Elena Carreño, Susana Padrones, & Samantha Aso. (2017). Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation. JTO - Journal of Thoracic Oncology, 12(1S), S596–S597.
Abstract: A main weakness of virtual bronchoscopic navigation (VBN) is unsuccessful segmentation of distal branches approaching peripheral pulmonary nodules (PPN). CT scan acquisition protocol is pivotal for segmentation covering the utmost periphery. We hypothesize that application of continuous positive airway pressure (CPAP) during CT acquisition could improve visualization and segmentation of peripheral bronchi. The purpose of the present pilot study is to compare quality of segmentations under 4 CT acquisition modes: inspiration (INSP), expiration (EXP) and both with CPAP (INSP-CPAP and EXP-CPAP).
Keywords: Thorax CT; diagnosis; Peripheral Pulmonary Nodule
|
Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, & Yoshua Bengio. (2017). The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Abstract: State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions.
Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train.
In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining. Moreover, due to smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
Keywords: Semantic Segmentation
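The dense connectivity pattern mentioned in the abstract, where each layer receives the outputs of all preceding layers, can be illustrated with a toy sketch. This is not the architecture of the paper: `toy_layer` is a hypothetical stand-in for a convolutional layer, and feature maps are reduced to plain lists of numbers so that only the concatenation logic is visible.

```python
def toy_layer(inputs, growth_rate):
    """Hypothetical stand-in for a conv layer: emits `growth_rate` new features."""
    mean = sum(inputs) / len(inputs)
    return [mean + i for i in range(growth_rate)]


def dense_block(x, num_layers, growth_rate):
    """Feed every layer the concatenation of the block input and all earlier outputs."""
    features = list(x)
    for _ in range(num_layers):
        new_features = toy_layer(features, growth_rate)
        # Concatenate instead of replacing, so later layers still see earlier features.
        features = features + new_features
    return features


out = dense_block([1.0, 2.0, 3.0], num_layers=4, growth_rate=2)
```

With 3 input features, 4 layers, and a growth rate of 2, the block output holds 3 + 4 × 2 = 11 features, and the original input is preserved at the front.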
|
Pau Riba, Josep Llados, & Alicia Fornes. (2017). Error-tolerant coarse-to-fine matching model for hierarchical graphs. In Pasquale Foggia, Cheng-Lin Liu, & Mario Vento (Eds.), 11th IAPR-TC-15 International Workshop on Graph-Based Representations in Pattern Recognition (Vol. 10310, pp. 107–117). Springer International Publishing.
Abstract: Graph-based representations are effective tools to capture structural information from visual elements. However, retrieving a query graph from a large database of graphs implies a high computational complexity. Moreover, these representations are very sensitive to noise or small changes. In this work, a novel hierarchical graph representation is designed. Using graph clustering techniques adapted from graph-based social media analysis, we propose to generate a hierarchy able to deal with different levels of abstraction while keeping information about the topology. For the proposed representations, a coarse-to-fine matching method is defined. These approaches are validated using real scenarios such as classification of colour images and handwritten word spotting.
Keywords: Graph matching; Hierarchical graph; Graph-based representation; Coarse-to-fine matching
|
Veronica Romero, Alicia Fornes, Enrique Vidal, & Joan Andreu Sanchez. (2017). Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology. In L.A. Alexandre, J.Salvador Sanchez, & Joao M. F. Rodriguez (Eds.), 8th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 10255, pp. 287–294). LNCS.
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. The records in these books follow a simple textual structure with an evolving vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
Keywords: Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
|
Antonio Lopez, Jiaolong Xu, Jose Luis Gomez, David Vazquez, & German Ros. (2017). From Virtual to Real World Visual Perception using Domain Adaptation -- The DPM as Example. In Gabriela Csurka (Ed.), Domain Adaptation in Computer Vision Applications (pp. 243–258). Springer.
Abstract: Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that training data is preferred with annotations. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which ends up in inaccuracies and errors in the annotations (aka ground truth) since the task is inherently very cumbersome and sometimes ambiguous. As an alternative we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual-to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as: how does the domain gap behave due to virtual-vs-real data with respect to dominant object appearance per domain, as well as the role of photo-realism in the virtual world.
Keywords: Domain Adaptation
|
Pau Riba, Josep Llados, Alicia Fornes, & Anjan Dutta. (2017). Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases. PRL - Pattern Recognition Letters, 87, 203–211.
Abstract: Graph-based representations are experiencing a growing usage in visual recognition and retrieval due to their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is that the search time complexity be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
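The remark that Hamming distances between binary node codes are easily computed with bitwise operators can be made concrete with a short sketch. It assumes each node code is stored as a Python integer, and it scans the database linearly for clarity. It does not reproduce the sub-linear indexing scheme proposed in the paper, and the names `hamming` and `nodes_within` are hypothetical.

```python
def hamming(code_a, code_b):
    """Number of differing bits between two binary codes stored as integers."""
    # XOR leaves a 1 exactly where the codes differ; then count the 1 bits.
    return bin(code_a ^ code_b).count("1")


def nodes_within(query_code, database_codes, max_distance):
    """Indices of database codes whose Hamming distance to the query is small enough."""
    return [
        index
        for index, code in enumerate(database_codes)
        if hamming(query_code, code) <= max_distance
    ]


# Made-up 4-bit node codes.
database = [0b1011, 0b0000, 0b1111, 0b0100]
hits = nodes_within(0b1011, database, max_distance=1)
```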
|
Daniel Hernandez, Antonio Espinosa, David Vazquez, Antonio Lopez, & Juan Carlos Moure. (2017). Embedded Real-time Stixel Computation. In GPU Technology Conference.
Keywords: GPU; CUDA; Stixels; Autonomous Driving
|
David Vazquez, Jorge Bernal, F. Javier Sanchez, Gloria Fernandez Esparrach, Antonio Lopez, Adriana Romero, et al. (2017). A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images. In 31st International Congress and Exhibition on Computer Assisted Radiology and Surgery.
Abstract: Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to screen regularly for polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss-rate and the inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing Decision Support Systems (DSS) aiming to help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy images, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. We provide new baselines on this dataset by training standard fully convolutional networks (FCN) for semantic segmentation and significantly outperforming, without any further post-processing, prior results in endoluminal scene segmentation.
Keywords: Deep Learning; Medical Imaging
|
David Geronimo, David Vazquez, & Arturo de la Escalera. (2017). Vision-Based Advanced Driver Assistance Systems. In Computer Vision in Vehicle Technology: Land, Sea, and Air.
Keywords: ADAS; Autonomous Driving
|
German Ros, Laura Sellart, Gabriel Villalonga, Elias Maidanik, Francisco Molero, Marc Garcia, et al. (2017). Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA. In Gabriela Csurka (Ed.), Domain Adaptation in Computer Vision Applications (Vol. 12, pp. 227–241). Springer.
Abstract: Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning of many parameters from raw images; thus, having a sufficient amount of diverse images with class annotations is needed. These annotations are obtained via cumbersome, human labour which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the models learnt to correctly operate in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation – in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.
Keywords: SYNTHIA; Virtual worlds; Autonomous Driving
|
Lluis Gomez, & Dimosthenis Karatzas. (2017). TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild. PR - Pattern Recognition, 70, 60–74.
Abstract: Motivated by the success of powerful but expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposal techniques emerge as an alternative to traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity-based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.
Our experiments demonstrate that the presented method is superior in its ability to produce good-quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we outperform the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015) by more than 10% in F-score. Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
|