Murad Al Haj. (2013). Looking at Faces: Detection, Tracking and Pose Estimation (Jordi Gonzalez, & Xavier Roca, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Humans can effortlessly perceive faces, follow them over space and time, and decode their rich content, such as pose, identity and expression. However, despite many decades of research on automatic facial perception in areas like face detection, expression recognition, pose estimation and face recognition, and despite many successes, a complete solution remains elusive. This thesis is dedicated to three problems in automatic face perception, namely face detection, face tracking and pose estimation.
In face detection, an initial simple model is presented that uses pixel-based heuristics to segment skin locations and hand-crafted rules to determine the locations of the faces present in an image. Different colorspaces are studied to judge whether a colorspace transformation can aid skin color detection. The output of this study is used in the design of a more complex face detector that is able to successfully generalize to different scenarios.
In face tracking, a framework that combines estimation and control in a joint scheme is presented to track a face with a single pan-tilt-zoom camera. While this work is mainly motivated by tracking faces, it can be easily applied atop of any detector to track different objects. The applicability of this method is demonstrated on simulated as well as real-life scenarios.
The last and most important part of this thesis is dedicate to monocular head pose estimation. In this part, a method based on partial least squares (PLS) regression is proposed to estimate pose and solve the alignment problem simultaneously. The contributions of this work are two-fold: 1) demonstrating that the proposed method achieves better than state-of-the-art results on the estimation problem and 2) developing a technique to reduce misalignment based on the learned PLS factors that outperform multiple instance learning (MIL) without the need for any re-training or the inclusion of misaligned samples in the training process, as normally done in MIL.
|
Albert Gordo. (2013). Document Image Representation, Classification and Retrieval in Large-Scale Domains (Ernest Valveny, & Florent Perronnin, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Despite the “paperless office” ideal that started in the decade of the seventies, businesses still strive against an increasing amount of paper documentation. Companies still receive huge amounts of paper documentation that need to be analyzed and processed, mostly in a manual way. A solution for this task consists in, first, automatically scanning the incoming documents. Then, document images can be analyzed and information can be extracted from the data. Documents can also be automatically dispatched to the appropriate workflows, used to retrieve similar documents in the dataset to transfer information, etc.
Due to the nature of this “digital mailroom”, we need document representation methods to be general, i.e., able to cope with very different types of documents. We need the methods to be sound, i.e., able to cope with unexpected types of documents, noise, etc. And, we need to methods to be scalable, i.e., able to cope with thousands or millions of documents that need to be processed, stored, and consulted. Unfortunately, current techniques of document representation, classification and retrieval are not apt for this digital mailroom framework, since they do not fulfill some or all of these requirements.
Through this thesis we focus on the problem of document representation aimed at classification and retrieval tasks under this digital mailroom framework. We first propose a novel document representation based on runlength histograms, and extend it to cope with more complex documents such as multiple-page documents, or documents that contain more sources of information such as extracted OCR text. Then we focus on the scalability requirements and propose a novel binarization method which we dubbed PCAE, as well as two general asymmetric distances between binary embeddings that can significantly improve the retrieval results at a minimal extra computational cost. Finally, we note the importance of supervised learning when performing large-scale retrieval, and study several approaches that can significantly boost the results at no extra cost at query time.
|
Shida Beigpour. (2013). Illumination and object reflectance modeling (Joost Van de Weijer, & Ernest Valveny, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: More realistic and accurate models of the scene illumination and object reflectance can greatly improve the quality of many computer vision and computer graphics tasks. Using such model, a more profound knowledge about the interaction of light with object surfaces can be established which proves crucial to a variety of computer vision applications. In the current work, we investigate the various existing approaches to illumination and reflectance modeling and form an analysis on their shortcomings in capturing the complexity of real-world scenes. Based on this analysis we propose improvements to different aspects of reflectance and illumination estimation in order to more realistically model the real-world scenes in the presence of complex lighting phenomena (i.e, multiple illuminants, interreflections and shadows). Moreover, we captured our own multi-illuminant dataset which consists of complex scenes and illumination conditions both outdoor and in laboratory conditions. In addition we investigate the use of synthetic data to facilitate the construction of datasets and improve the process of obtaining ground-truth information.
|
Ferran Poveda. (2013). Computer Graphics and Vision Techniques for the Study of the Muscular Fiber Architecture of the Myocardium (Debora Gil, & Enric Marti, Eds.). Ph.D. thesis, , .
|
Naveen Onkarappa. (2013). Optical Flow in Driver Assistance Systems (Angel Sappa, Ed.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Motion perception is one of the most important attributes of the human brain. Visual motion perception consists in inferring speed and direction of elements in a scene based on visual inputs. Analogously, computer vision is assisted by motion cues in the scene. Motion detection in computer vision is useful in solving problems such as segmentation, depth from motion, structure from motion, compression, navigation and many others. These problems are common in several applications, for instance, video surveillance, robot navigation and advanced driver assistance systems (ADAS). One of the most widely used techniques for motion detection is the optical flow estimation. The work in this thesis attempts to make optical flow suitable for the requirements and conditions of driving scenarios. In this context, a novel space-variant representation called reverse log-polar representation is proposed that is shown to be better than the traditional log-polar space-variant representation for ADAS. The space-variant representations reduce the amount of data to be processed. Another major contribution in this research is related to the analysis of the influence of specific characteristics from driving scenarios on the optical flow accuracy. Characteristics such as vehicle speed and
road texture are considered in the aforementioned analysis. From this study, it is inferred that the regularization weight has to be adapted according to the required error measure and for different speeds and road textures. It is also shown that polar represented optical flow suits driving scenarios where predominant motion is translation. Due to the requirements of such a study and by the lack of needed datasets a new synthetic dataset is presented; it contains: i) sequences of different speeds and road textures in an urban scenario; ii) sequences with complex motion of an on-board camera; and iii) sequences with additional moving vehicles in the scene. The ground-truth optical flow is generated by the ray-tracing technique. Further, few applications of optical flow in ADAS are shown. Firstly, a robust RANSAC based technique to estimate horizon line is proposed. Then, an egomotion estimation is presented to compare the proposed space-variant representation with the classical one. As a final contribution, a modification in the regularization term is proposed that notably improves the results
in the ADAS applications. This adaptation is evaluated using a state of the art optical flow technique. The experiments on a public dataset (KITTI) validate the advantages of using the proposed modification.
|
David Vazquez. (2013). Domain Adaptation of Virtual and Real Worlds for Pedestrian Detection (Antonio Lopez, & Daniel Ponsa, Eds.) (Vol. 1). Ph.D. thesis, Ediciones Graficas Rey, Barcelona.
Abstract: Pedestrian detection is of paramount interest for many applications, e.g. Advanced Driver Assistance Systems, Intelligent Video Surveillance and Multimedia systems. Most promising pedestrian detectors rely on appearance-based classifiers trained with annotated data. However, the required annotation step represents an intensive and subjective task for humans, what makes worth to minimize their intervention in this process by using computational tools like realistic virtual worlds. The reason to use these kind of tools relies in the fact that they allow the automatic generation of precise and rich annotations of visual information. Nevertheless, the use of this kind of data comes with the following question: can a pedestrian appearance model learnt with virtual-world data work successfully for pedestrian detection in real-world scenarios?. To answer this question, we conduct different experiments that suggest a positive answer. However, the pedestrian classifiers trained with virtual-world data can suffer the so called dataset shift problem as real-world based classifiers does. Accordingly, we have designed different domain adaptation techniques to face this problem, all of them integrated in a same framework (V-AYLA). We have explored different methods to train a domain adapted pedestrian classifiers by collecting a few pedestrian samples from the target domain (real world) and combining them with many samples of the source domain (virtual world). The extensive experiments we present show that pedestrian detectors developed within the V-AYLA framework do achieve domain adaptation. Ideally, we would like to adapt our system without any human intervention. Therefore, as a first proof of concept we also propose an unsupervised domain adaptation technique that avoids human intervention during the adaptation process. To the best of our knowledge, this Thesis work is the first demonstrating adaptation of virtual and real worlds for developing an object detector. Last but not least, we also assessed a different strategy to avoid the dataset shift that consists in collecting real-world samples and retrain with them in such a way that no bounding boxes of real-world pedestrians have to be provided. We show that the generated classifier is competitive with respect to the counterpart trained with samples collected by manually annotating pedestrian bounding boxes. The results presented on this Thesis not only end with a proposal for adapting a virtual-world pedestrian detector to the real world, but also it goes further by pointing out a new methodology that would allow the system to adapt to different situations, which we hope will provide the foundations for future research in this unexplored area.
Keywords: Pedestrian Detection; Domain Adaptation
|
Marina Alberti. (2013). Detection and Alignment of Vascular Structures in Intravascular Ultrasound using Pattern Recognition Techniques (Simone Balocco, & Petia Radeva, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: In this thesis, several methods for the automatic analysis of Intravascular Ultrasound
(IVUS) sequences are presented, aimed at assisting physicians in the diagnosis, the assessment of the intervention and the monitoring of the patients with coronary disease.
The basis for the developed frameworks are machine learning, pattern recognition and
image processing techniques.
First, a novel approach for the automatic detection of vascular bifurcations in
IVUS is presented. The task is addressed as a binary classication problem (identifying bifurcation and non-bifurcation angular sectors in the sequence images). The
multiscale stacked sequential learning algorithm is applied, to take into account the
spatial and temporal context in IVUS sequences, and the results are rened using
a-priori information about branching dimensions and geometry. The achieved performance is comparable to intra- and inter-observer variability.
Then, we propose a novel method for the automatic non-rigid alignment of IVUS
sequences of the same patient, acquired at dierent moments (before and after percutaneous coronary intervention, or at baseline and follow-up examinations). The
method is based on the description of the morphological content of the vessel, obtained by extracting temporal morphological proles from the IVUS acquisitions, by
means of methods for segmentation, characterization and detection in IVUS. A technique for non-rigid sequence alignment – the Dynamic Time Warping algorithm -
is applied to the proles and adapted to the specic clinical problem. Two dierent robust strategies are proposed to address the partial overlapping between frames
of corresponding sequences, and a regularization term is introduced to compensate
for possible errors in the prole extraction. The benets of the proposed strategy
are demonstrated by extensive validation on synthetic and in-vivo data. The results
show the interest of the proposed non-linear alignment and the clinical value of the
method.
Finally, a novel automatic approach for the extraction of the luminal border in
IVUS images is presented. The method applies the multiscale stacked sequential
learning algorithm and extends it to 2-D+T, in a rst classication phase (the identi-
cation of lumen and non-lumen regions of the images), while an active contour model
is used in a second phase, to identify the lumen contour. The method is extended
to the longitudinal dimension of the sequences and it is validated on a challenging
data-set.
|
A.S. Coquel, Jean-Pascal Jacob, M. Primet, A. Demarez, Mariella Dimiccoli, T. Julou, et al. (2013). Localization of protein aggregation in Escherichia coli is governed by diffusion and nucleoid macromolecular crowding effect. PCB - Plos Computational Biology, 9(4).
Abstract: Aggregates of misfolded proteins are a hallmark of many age-related diseases. Recently, they have been linked to aging of Escherichia coli (E. coli) where protein aggregates accumulate at the old pole region of the aging bacterium. Because of the potential of E. coli as a model organism, elucidating aging and protein aggregation in this bacterium may pave the way to significant advances in our global understanding of aging. A first obstacle along this path is to decipher the mechanisms by which protein aggregates are targeted to specific intercellular locations. Here, using an integrated approach based on individual-based modeling, time-lapse fluorescence microscopy and automated image analysis, we show that the movement of aging-related protein aggregates in E. coli is purely diffusive (Brownian). Using single-particle tracking of protein aggregates in live E. coli cells, we estimated the average size and diffusion constant of the aggregates. Our results provide evidence that the aggregates passively diffuse within the cell, with diffusion constants that depend on their size in agreement with the Stokes-Einstein law. However, the aggregate displacements along the cell long axis are confined to a region that roughly corresponds to the nucleoid-free space in the cell pole, thus confirming the importance of increased macromolecular crowding in the nucleoids. We thus used 3D individual-based modeling to show that these three ingredients (diffusion, aggregation and diffusion hindrance in the nucleoids) are sufficient and necessary to reproduce the available experimental data on aggregate localization in the cells. Taken together, our results strongly support the hypothesis that the localization of aging-related protein aggregates in the poles of E. coli results from the coupling of passive diffusion-aggregation with spatially non-homogeneous macromolecular crowding. They further support the importance of “soft” intracellular structuring (based on macromolecular crowding) in diffusion-based protein localization in E. coli.
|
Olivier Penacchio, Xavier Otazu, & Laura Dempere-Marco. (2013). A Neurodynamical Model of Brightness Induction in V1. Plos - PloS ONE, 8(5), e64086.
Abstract: Brightness induction is the modulation of the perceived intensity of an area by the luminance of surrounding areas. Recent neurophysiological evidence suggests that brightness information might be explicitly represented in V1, in contrast to the more common assumption that the striate cortex is an area mostly responsive to sensory information. Here we investigate possible neural mechanisms that offer a plausible explanation for such phenomenon. To this end, a neurodynamical model which is based on neurophysiological evidence and focuses on the part of V1 responsible for contextual influences is presented. The proposed computational model successfully accounts for well known psychophysical effects for static contexts and also for brightness induction in dynamic contexts defined by modulating the luminance of surrounding areas. This work suggests that intra-cortical interactions in V1 could, at least partially, explain brightness induction effects and reveals how a common general architecture may account for several different fundamental processes, such as visual saliency and brightness induction, which emerge early in the visual processing pathway.
|
German Ros, J. Guerrero, Angel Sappa, & Antonio Lopez. (2013). VSLAM pose initialization via Lie groups and Lie algebras optimization. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 5740–5747).
Abstract: We present a novel technique for estimating initial 3D poses in the context of localization and Visual SLAM problems. The presented approach can deal with noise, outliers and a large amount of input data and still performs in real time in a standard CPU. Our method produces solutions with an accuracy comparable to those produced by RANSAC but can be much faster when the percentage of outliers is high or for large amounts of input data. On the current work we propose to formulate the pose estimation as an optimization problem on Lie groups, considering their manifold structure as well as their associated Lie algebras. This allows us to perform a fast and simple optimization at the same time that conserve all the constraints imposed by the Lie group SE(3). Additionally, we present several key design concepts related with the cost function and its Jacobian; aspects that are critical for the good performance of the algorithm.
Keywords: SLAM
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Apostolos Antonacopoulos, & Josep Llados. (2013). An interactive appearance-based document retrieval system for historical newspapers. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 84–87).
Abstract: In this paper we present a retrieval-based application aimed at assisting a user to semi-automatically segment an incoming flow of historical newspaper images by automatically detecting a particular type of pages based on their appearance. A visual descriptor is used to assess page similarity while a relevance feedback process allow refining the results iteratively. The application is tested on a large dataset of digitised historic newspapers.
|
Christophe Rigaud, Dimosthenis Karatzas, Joost Van de Weijer, Jean-Christophe Burie, & Jean-Marc Ogier. (2013). Automatic text localisation in scanned comic books. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 814–819).
Abstract: Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enable direct content-based search as opposed to metadata only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for the automatic text localization in scanned comics book pages, an essential step towards a fully automatic comics book understanding. We focus on speech text as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing methods of text localization found in the literature and results are presented.
Keywords: Text localization; comics; text/graphic separation; complex background; unstructured document
|
Carles Sanchez, Debora Gil, Antoni Rosell, Albert Andaluz, & F. Javier Sanchez. (2013). Segmentation of Tracheal Rings in Videobronchoscopy combining Geometry and Appearance. In Sebastiano Battiato and José Braz (Ed.), Proceedings of the International Conference on Computer Vision Theory and Applications (Vol. 1, pp. 153–161). LNCS. Portugal: SciTePress.
Abstract: Videobronchoscopy is a medical imaging technique that allows interactive navigation inside the respiratory pathways and minimal invasive interventions. Tracheal procedures are ordinary interventions that require measurement of the percentage of obstructed pathway for injury (stenosis) assessment. Visual assessment of stenosis in videobronchoscopic sequences requires high expertise of trachea anatomy and is prone to human error. Accurate detection of tracheal rings is the basis for automated estimation of the size of stenosed trachea. Processing of videobronchoscopic images acquired at the operating room is a challenging task due to the wide range of artifacts and acquisition conditions. We present a model of the geometric-appearance of tracheal rings for its detection in videobronchoscopic videos. Experiments on sequences acquired at the operating room, show a performance close to inter-observer variability
Keywords: Video-bronchoscopy, tracheal ring segmentation, trachea geometric and appearance model
|
Joan M. Nuñez, Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2013). Blood Vessel Characterization in Colonoscopy Images to Improve Polyp Localization. In Proceedings of the International Conference on Computer Vision Theory and Applications (Vol. 1, pp. 162–171). SciTePress.
Abstract: This paper presents an approach to mitigate the contribution of blood vessels to the energy image used at different tasks of automatic colonoscopy image analysis. This goal is achieved by introducing a characterization of endoluminal scene objects which allows us to differentiate between the trace of 2-dimensional visual objects,such as vessels, and shades from 3-dimensional visual objects, such as folds. The proposed characterization is based on the influence that the object shape has in the resulting visual feature, and it leads to the development of a blood vessel attenuation algorithm. A database consisting of manually labelled masks was built in order to test the performance of our method, which shows an encouraging success in blood vessel mitigation while keeping other structures intact. Moreover, by extending our method to the only available polyp localization
algorithm tested on a public database, blood vessel mitigation proved to have a positive influence on the overall performance.
Keywords: Colonoscopy; Blood vessel; Linear features; Valley detection
|
Mirko Arnold, Anarta Ghosh, Glen Doherty, Hugh Mulcahy, Stephen Patchett, & Gerard Lacey. (2013). Towards Automatic Direct Observation of Procedure and Skill (DOPS) in Colonoscopy. In Proceedings of the International Conference on Computer Vision Theory and Applications (pp. 48–53).
|