|
Maedeh Aghaei, Mariella Dimiccoli, & Petia Radeva. (2016). Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams. CVIU - Computer Vision and Image Understanding, 149, 146–156.
Abstract: Wearable cameras offer a hands-free way to record egocentric images of daily experiences, where social events are of special interest. The first step towards detection of social events is to track the appearance of multiple persons involved in them. In this paper, we propose a novel method to find correspondences of multiple faces in low temporal resolution egocentric videos acquired through a wearable camera. This kind of photo-stream imposes additional challenges to the multi-tracking problem with respect to conventional videos. Due to the free motion of the camera and to its low temporal resolution, abrupt changes in the field of view, in illumination condition and in the target location are highly frequent. To overcome such difficulties, we propose a multi-face tracking method that generates a set of tracklets through finding correspondences along the whole sequence for each detected face and takes advantage of the tracklets redundancy to deal with unreliable ones. Similar tracklets are grouped into the so called extended bag-of-tracklets (eBoT), which is aimed to correspond to a specific person. Finally, a prototype tracklet is extracted for each eBoT, where the occurred occlusions are estimated by relying on a new measure of confidence. We validated our approach over an extensive dataset of egocentric photo-streams and compared it to state of the art methods, demonstrating its effectiveness and robustness.
|
|
|
Onur Ferhat, & Fernando Vilariño. (2016). Low Cost Eye Tracking: The Current Panorama. CIN - Computational Intelligence and Neuroscience, , Article ID 8680541.
Abstract: Despite the availability of accurate, commercial gaze tracker devices working with infrared (IR) technology, visible light gaze tracking constitutes an interesting alternative by allowing scalability and removing hardware requirements. Over the last years, this field has seen examples of research showing performance comparable to the IR alternatives. In this work, we survey the previous work on remote, visible light gaze trackers and analyze the explored techniques from various perspectives such as calibration strategies, head pose invariance, and gaze estimation techniques. We also provide information on related aspects of research such as public datasets to test against, open source projects to build upon, and gaze tracking services to directly use in applications. With all this information, we aim to provide the contemporary and future researchers with a map detailing previously explored ideas and the required tools.
|
|
|
C. Alejandro Parraga, & Arash Akbarinia. (2016). NICE: A Computational Solution to Close the Gap from Colour Perception to Colour Categorization. Plos - PLoS One, 11(3), e0149538.
Abstract: The segmentation of visible electromagnetic radiation into chromatic categories by the human visual system has been extensively studied from a perceptual point of view, resulting in several colour appearance models. However, there is currently a void when it comes to relate these results to the physiological mechanisms that are known to shape the pre-cortical and cortical visual pathway. This work intends to begin to fill this void by proposing a new physiologically plausible model of colour categorization based on Neural Isoresponsive Colour Ellipsoids (NICE) in the cone-contrast space defined by the main directions of the visual signals entering the visual cortex. The model was adjusted to fit psychophysical measures that concentrate on the categorical boundaries and are consistent with the ellipsoidal isoresponse surfaces of visual cortical neurons. By revealing the shape of such categorical colour regions, our measures allow for a more precise and parsimonious description, connecting well-known early visual processing mechanisms to the less understood phenomenon of colour categorization. To test the feasibility of our method we applied it to exemplary images and a popular ground-truth chart obtaining labelling results that are better than those of current state-of-the-art algorithms.
|
|
|
Gloria Fernandez Esparrach, Jorge Bernal, Maria Lopez Ceron, Henry Cordova, Cristina Sanchez Montes, Cristina Rodriguez de Miguel, et al. (2016). Exploring the clinical potential of an automatic colonic polyp detection method based on the creation of energy maps. END - Endoscopy, 48(9), 837–842.
Abstract: Background and aims: Polyp miss-rate is a drawback of colonoscopy that increases significantly in small polyps. We explored the efficacy of an automatic computer vision method for polyp detection.
Methods: Our method relies on a model that defines polyp boundaries as valleys of image intensity. Valley information is integrated into energy maps which represent the likelihood of polyp presence.
Results: In 24 videos containing polyps from routine colonoscopies, all polyps were detected in at least one frame. Mean values of the maximum of energy map were higher in frames with polyps than without (p<0.001). Performance improved in high quality frames (AUC= 0.79, 95%CI: 0.70-0.87 vs 0.75, 95%CI: 0.66-0.83). Using 3.75 as maximum threshold value, sensitivity and specificity for detection of polyps were 70.4% (95%CI: 60.3-80.8) and 72.4% (95%CI: 61.6-84.6), respectively.
Conclusion: Energy maps showed a good performance for colonic polyp detection. This indicates a potential applicability in clinical practice.
|
|
|
Francesco Ciompi, Simone Balocco, Juan Rigla, Xavier Carrillo, J. Mauri, & Petia Radeva. (2016). Computer-Aided Detection of Intra-Coronary Stent in Intravascular Ultrasound Sequences. MP - Medical Physics, 43(10).
Abstract: Purpose: An intraluminal coronary stent is a metal mesh tube deployed in a stenotic artery during Percutaneous Coronary Intervention (PCI), in order to prevent acute vessel occlusion. The identication of struts location and the denition of the stent shape are relevant for PCI planning 15 and for patient follow-up. We present a fully-automatic framework for Computer-Aided Detection
(CAD) of intra-coronary stents in Intravascular Ultrasound (IVUS) image sequences. The CAD system is able to detect stent struts and estimate the stent shape.
Methods: The proposed CAD uses machine learning to provide a comprehensive interpretation of the local structure of the vessel by means of semantic classication. The output of the classication 20 stage is then used to detect struts and to estimate the stent shape. The proposed approach is validated using a multi-centric data-set of 1,015 images from 107 IVUS sequences containing both metallic and bio-absorbable stents.
Results: The method was able to detect structs in both metallic stents with an overall F-measure of 77.7% and a mean distance of 0.15 mm from manually annotated struts, and in bio-absorbable 25 stents with an overall F-measure of 77.4% and a mean distance of 0.09 mm from manually annotated struts.
Conclusions: The results are close to the inter-observer variability and suggest that the system has the potential of being used as method for aiding percutaneous interventions.
|
|
|
Jean-Pascal Jacob, Mariella Dimiccoli, & L. Moisan. (2017). Active skeleton for bacteria modelling. CMBBE - Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 5(4), 274–286.
Abstract: The investigation of spatio-temporal dynamics of bacterial cells and their molecular components requires automated image analysis tools to track cell shape properties and molecular component locations inside the cells. In the study of bacteria aging, the molecular components of interest are protein aggregates accumulated near bacteria boundaries. This particular location makes very ambiguous the correspondence between aggregates and cells, since computing accurately bacteria boundaries in phase-contrast time-lapse imaging is a challenging task. This paper proposes an active skeleton formulation for bacteria modelling which provides several advantages: an easy computation of shape properties (perimeter, length, thickness and orientation), an improved boundary accuracy in noisy images and a natural bacteria-centred coordinate system that permits the intrinsic location of molecular components inside the cell. Starting from an initial skeleton estimate, the medial axis of the bacterium is obtained by minimising an energy function which incorporates bacteria shape constraints. Experimental results on biological images and comparative evaluation of the performances validate the proposed approach for modelling cigar-shaped bacteria like Escherichia coli. The Image-J plugin of the proposed method can be found online at http://fluobactracker.inrialpes.fr.
|
|
|
A.S. Coquel, Jean-Pascal Jacob, M. Primet, A. Demarez, Mariella Dimiccoli, T. Julou, et al. (2013). Localization of protein aggregation in Escherichia coli is governed by diffusion and nucleoid macromolecular crowding effect. PCB - Plos Computational Biology, 9(4).
Abstract: Aggregates of misfolded proteins are a hallmark of many age-related diseases. Recently, they have been linked to aging of Escherichia coli (E. coli) where protein aggregates accumulate at the old pole region of the aging bacterium. Because of the potential of E. coli as a model organism, elucidating aging and protein aggregation in this bacterium may pave the way to significant advances in our global understanding of aging. A first obstacle along this path is to decipher the mechanisms by which protein aggregates are targeted to specific intercellular locations. Here, using an integrated approach based on individual-based modeling, time-lapse fluorescence microscopy and automated image analysis, we show that the movement of aging-related protein aggregates in E. coli is purely diffusive (Brownian). Using single-particle tracking of protein aggregates in live E. coli cells, we estimated the average size and diffusion constant of the aggregates. Our results provide evidence that the aggregates passively diffuse within the cell, with diffusion constants that depend on their size in agreement with the Stokes-Einstein law. However, the aggregate displacements along the cell long axis are confined to a region that roughly corresponds to the nucleoid-free space in the cell pole, thus confirming the importance of increased macromolecular crowding in the nucleoids. We thus used 3D individual-based modeling to show that these three ingredients (diffusion, aggregation and diffusion hindrance in the nucleoids) are sufficient and necessary to reproduce the available experimental data on aggregate localization in the cells. Taken together, our results strongly support the hypothesis that the localization of aging-related protein aggregates in the poles of E. coli results from the coupling of passive diffusion-aggregation with spatially non-homogeneous macromolecular crowding. They further support the importance of “soft” intracellular structuring (based on macromolecular crowding) in diffusion-based protein localization in E. coli.
|
|
|
Sumit K. Banchhor, Tadashi Araki, Narendra D. Londhe, Nobutaka Ikeda, Petia Radeva, Ayman El-Baz, et al. (2016). Five multiresolution-based calcium volume measurement techniques from coronary IVUS videos: A comparative approach. CMPB - Computer Methods and Programs in Biomedicine, 134, 237–258.
Abstract: BACKGROUND AND OBJECTIVE:
Fast intravascular ultrasound (IVUS) video processing is required for calcium volume computation during the planning phase of percutaneous coronary interventional (PCI) procedures. Nonlinear multiresolution techniques are generally applied to improve the processing time by down-sampling the video frames.
METHODS:
This paper presents four different segmentation methods for calcium volume measurement, namely Threshold-based, Fuzzy c-Means (FCM), K-means, and Hidden Markov Random Field (HMRF) embedded with five different kinds of multiresolution techniques (bilinear, bicubic, wavelet, Lanczos, and Gaussian pyramid). This leads to 20 different kinds of combinations. IVUS image data sets consisting of 38,760 IVUS frames taken from 19 patients were collected using 40 MHz IVUS catheter (Atlantis® SR Pro, Boston Scientific®, pullback speed of 0.5 mm/sec.). The performance of these 20 systems is compared with and without multiresolution using the following metrics: (a) computational time; (b) calcium volume; (c) image quality degradation ratio; and (d) quality assessment ratio.
RESULTS:
Among the four segmentation methods embedded with five kinds of multiresolution techniques, FCM segmentation combined with wavelet-based multiresolution gave the best performance. FCM and wavelet experienced the highest percentage mean improvement in computational time of 77.15% and 74.07%, respectively. Wavelet interpolation experiences the highest mean precision-of-merit (PoM) of 94.06 ± 3.64% and 81.34 ± 16.29% as compared to other multiresolution techniques for volume level and frame level respectively. Wavelet multiresolution technique also experiences the highest Jaccard Index and Dice Similarity of 0.7 and 0.8, respectively. Multiresolution is a nonlinear operation which introduces bias and thus degrades the image. The proposed system also provides a bias correction approach to enrich the system, giving a better mean calcium volume similarity for all the multiresolution-based segmentation methods. After including the bias correction, bicubic interpolation gives the largest increase in mean calcium volume similarity of 4.13% compared to the rest of the multiresolution techniques. The system is automated and can be adapted in clinical settings.
CONCLUSIONS:
We demonstrated the time improvement in calcium volume computation without compromising the quality of IVUS image. Among the 20 different combinations of multiresolution with calcium volume segmentation methods, the FCM embedded with wavelet-based multiresolution gave the best performance.
|
|
|
Xavier Perez Sala, Fernando De la Torre, Laura Igual, Sergio Escalera, & Cecilio Angulo. (2017). Subspace Procrustes Analysis. IJCV - International Journal of Computer Vision, 121(3), 327–343.
Abstract: Procrustes Analysis (PA) has been a popular technique to align and build 2-D statistical models of shapes. Given a set of 2-D shapes PA is applied to remove rigid transformations. Then, a non-rigid 2-D model is computed by modeling (e.g., PCA) the residual. Although PA has been widely used, it has several limitations for modeling 2-D shapes: occluded landmarks and missing data can result in local minima solutions, and there is no guarantee that the 2-D shapes provide a uniform sampling of the 3-D space of rotations for the object. To address previous issues, this paper proposes Subspace PA (SPA). Given several
instances of a 3-D object, SPA computes the mean and a 2-D subspace that can simultaneously model all rigid and non-rigid deformations of the 3-D object. We propose a discrete (DSPA) and continuous (CSPA) formulation for SPA, assuming that 3-D samples of an object are provided. DSPA extends the traditional PA, and produces unbiased 2-D models by uniformly sampling different views of the 3-D object. CSPA provides a continuous approach to uniformly sample the space of 3-D rotations, being more efficient in space and time. Experiments using SPA to learn 2-D models of bodies from motion capture data illustrate the benefits of our approach.
|
|
|
Frederic Sampedro, Anna Domenech, Sergio Escalera, & Ignasi Carrio. (2017). Computing quantitative indicators of structural renal damage in pediatric DMSA scans. REMNIM - Revista Española de Medicina Nuclear e Imagen Molecular, 36(2), 72–77.
Abstract: OBJECTIVES:
The proposal and implementation of a computational framework for the quantification of structural renal damage from 99mTc-dimercaptosuccinic acid (DMSA) scans. The aim of this work is to propose, implement, and validate a computational framework for the quantification of structural renal damage from DMSA scans and in an observer-independent manner.
MATERIALS AND METHODS:
From a set of 16 pediatric DMSA-positive scans and 16 matched controls and using both expert-guided and automatic approaches, a set of image-derived quantitative indicators was computed based on the relative size, intensity and histogram distribution of the lesion. A correlation analysis was conducted in order to investigate the association of these indicators with other clinical data of interest in this scenario, including C-reactive protein (CRP), white cell count, vesicoureteral reflux, fever, relative perfusion, and the presence of renal sequelae in a 6-month follow-up DMSA scan.
RESULTS:
A fully automatic lesion detection and segmentation system was able to successfully classify DMSA-positive from negative scans (AUC=0.92, sensitivity=81% and specificity=94%). The image-computed relative size of the lesion correlated with the presence of fever and CRP levels (p<0.05), and a measurement derived from the distribution histogram of the lesion obtained significant performance results in the detection of permanent renal damage (AUC=0.86, sensitivity=100% and specificity=75%).
CONCLUSIONS:
The proposal and implementation of a computational framework for the quantification of structural renal damage from DMSA scans showed a promising potential to complement visual diagnosis and non-imaging indicators.
|
|
|
Mikkel Thogersen, Sergio Escalera, Jordi Gonzalez, & Thomas B. Moeslund. (2016). Segmentation of RGB-D Indoor scenes by Stacking Random Forests and Conditional Random Fields. PRL - Pattern Recognition Letters, 80, 208–215.
Abstract: This paper proposes a technique for RGB-D scene segmentation using Multi-class
Multi-scale Stacked Sequential Learning (MMSSL) paradigm. Following recent trends in state-of-the-art, a base classifier uses an initial SLIC segmentation to obtain superpixels which provide a diminution of data while retaining object boundaries. A series of color and depth features are extracted from the superpixels, and are used in a Conditional Random Field (CRF) to predict superpixel labels. Furthermore, a Random Forest (RF) classifier using random offset features is also used as an input to the CRF, acting as an initial prediction. As a stacked classifier, another Random Forest is used acting on a spatial multi-scale decomposition of the CRF confidence map to correct the erroneous labels assigned by the previous classifier. The model is tested on the popular NYU-v2 dataset.
The approach shows that simple multi-modal features with the power of the MMSSL
paradigm can achieve better performance than state of the art results on the same dataset.
|
|
|
Jose Garcia-Rodriguez, Isabelle Guyon, Sergio Escalera, Alexandra Psarrou, Andrew Lewis, & Miguel Cazorla. (2017). Editorial: Special Issue on Computational Intelligence for Vision and Robotics. Neural Computing and Applications - Neural Computing and Applications, 28(5), 853–854.
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, & Jamie Shotton. (2016). Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1489–1491.
Abstract: The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of clutter scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
|
|
|
Marc Oliu, Ciprian Corneanu, Kamal Nasrollahi, Olegs Nikisins, Sergio Escalera, Yunlian Sun, et al. (2016). Improved RGB-D-T based Face Recognition. BIO - IET Biometrics, 5(4), 297–303.
Abstract: Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of the unimodal facial recognition systems has been observed in the recent years. At the same time a multimodal facial recognition is a promising approach. This study combines the latest successes in both directions by applying deep learning convolutional neural networks (CNN) to the multimodal RGB, depth, and thermal (RGB-D-T) based facial recognition problem outperforming previously published results. Furthermore, a late fusion of the CNN-based recognition block with various hand-crafted features (local binary patterns, histograms of oriented gradients, Haar-like rectangular features, histograms of Gabor ordinal measures) is introduced, demonstrating even better recognition performance on a benchmark RGB-D-T database. The obtained results in this study show that the classical engineered features and CNN-based features can complement each other for recognition purposes.
|
|
|
Ariel Amato. (2014). Moving cast shadow detection. ELCVIA - Electronic letters on computer vision and image analysis, 13(2), 70–71.
Abstract: Motion perception is an amazing innate ability of the creatures on the planet. This adroitness entails a functional advantage that enables species to compete better in the wild. The motion perception ability is usually employed at different levels, allowing from the simplest interaction with the ’physis’ up to the most transcendental survival tasks. Among the five classical perception system , vision is the most widely used in the motion perception field. Millions years of evolution have led to a highly specialized visual system in humans, which is characterized by a tremendous accuracy as well as an extraordinary robustness. Although humans and an immense diversity of species can distinguish moving object with a seeming simplicity, it has proven to be a difficult and non trivial problem from a computational perspective. In the field of Computer Vision, the detection of moving objects is a challenging and fundamental research area. This can be referred to as the ’origin’ of vast and numerous vision-based research sub-areas. Nevertheless, from the bottom to the top of this hierarchical analysis, the foundations still relies on when and where motion has occurred in an image. Pixels corresponding to moving objects in image sequences can be identified by measuring changes in their values. However, a pixel’s value (representing a combination of color and brightness) could also vary due to other factors such as: variation in scene illumination, camera noise and nonlinear sensor responses among others. The challenge lies in detecting if the changes in pixels’ value are caused by a genuine object movement or not. An additional challenging aspect in motion detection is represented by moving cast shadows. The paradox arises because a moving object and its cast shadow share similar motion patterns. However, a moving cast shadow is not a moving object. In fact, a shadow represents a photometric illumination effect caused by the relative position of the object with respect to the light sources. Shadow detection methods are mainly divided in two domains depending on the application field. One normally consists of static images where shadows are casted by static objects, whereas the second one is referred to image sequences where shadows are casted by moving objects. For the first case, shadows can provide additional geometric and semantic cues about shape and position of its casting object as well as the localization of the light source. Although the previous information can be extracted from static images as well as video sequences, the main focus in the second area is usually change detection, scene matching or surveillance. In this context, a shadow can severely affect with the analysis and interpretation of the scene. The work done in the thesis is focused on the second case, thus it addresses the problem of detection and removal of moving cast shadows in video sequences in order to enhance the detection of moving object.
|
|