Author Michal Drozdzal
  Title Sequential image analysis for computer-aided wireless endoscopy Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Wireless Capsule Endoscopy (WCE) is a technique for inner visualization of the entire small intestine and, thus, offers an interesting perspective on intestinal motility. The two major drawbacks of this technique are: 1) the huge amount of data acquired by WCE makes the motility analysis tedious and 2) since the capsule is the first tool that offers complete inner visualization of the small intestine, the exact importance of the observed events is still an open issue. Therefore, in this thesis, a novel computer-aided system for intestinal motility analysis is presented. The goal of the system is to provide an easily comprehensible visual description of motility-related intestinal events to a physician. In order to do so, several tools based either on computer vision concepts or on machine learning techniques are presented. A method for transforming the 3D video signal into a holistic image of intestinal motility, called a motility bar, is proposed. The method calculates the optimal mapping from video into image from the intestinal motility point of view.
To characterize intestinal motility, methods for automatic extraction of motility information from WCE are presented. Two of them are based on the motility bar and two of them are based on frame-per-frame analysis. In particular, four algorithms dealing with the problems of intestinal contraction detection, lumen size estimation, intestinal content characterization and wrinkle frame detection are proposed and validated. The results of the algorithms are converted into sequential features using an online statistical test. This test is designed to work with multivariate data streams. To this end, we propose a novel formulation of a concentration inequality that is introduced into a robust adaptive windowing algorithm for multivariate data streams (a minimal adaptive windowing sketch follows this record). The algorithm is used to obtain a robust representation of segments with constant intestinal motility activity. The obtained sequential features are shown to be discriminative in the problem of abnormal motility characterization.
Finally, we tackle the problem of efficient labeling. To this end, we incorporate active learning concepts into the problems present in WCE data and propose two approaches. The first one is based on the concepts of sequential learning and the second one adapts partition-based active learning to an error-free labeling scheme. All these steps are sufficient to provide an extensive visual description of intestinal motility that can be used by an expert as a decision support system.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Petia Radeva
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-3-3 Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ Dro2014 Serial 2486  
Permanent link to this record
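The adaptive windowing test mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis algorithm: the multivariate concentration inequality proposed there is replaced here by a generic Hoeffding-style bound on the difference of window means, and the window-splitting scheme is a textbook ADWIN-like simplification.

import numpy as np

def hoeffding_bound(n0, n1, delta=0.05):
    # Hoeffding-style threshold for the difference of two sample means of
    # values bounded in [0, 1]; a stand-in for the multivariate
    # concentration inequality proposed in the thesis.
    m = 1.0 / (1.0 / n0 + 1.0 / n1)
    return np.sqrt(np.log(4.0 / delta) / (2.0 * m))

def adaptive_window(stream, delta=0.05, min_block=10):
    """Yield (sample index, dropped length) each time a change is detected."""
    window = []
    for i, x in enumerate(stream):
        window.append(np.asarray(x, dtype=float))
        n = len(window)
        # try every split of the window into an "old" and a "recent" part
        for cut in range(min_block, n - min_block):
            old, recent = np.array(window[:cut]), np.array(window[cut:])
            gap = np.linalg.norm(old.mean(axis=0) - recent.mean(axis=0))
            if gap > hoeffding_bound(len(old), len(recent), delta):
                window = window[cut:]      # drop the outdated segment
                yield i, cut
                break

# toy usage: a 3-dimensional stream whose mean shifts halfway through
rng = np.random.default_rng(0)
stream = np.vstack([rng.uniform(0.0, 0.4, (200, 3)),
                    rng.uniform(0.6, 1.0, (200, 3))])
for idx, dropped in adaptive_window(stream):
    print(f"change detected at sample {idx}, dropped {dropped} old samples")

A change is reported whenever the means of an older and a newer sub-window differ by more than the bound, at which point the outdated samples are discarded.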
 

 
Author Antonio Clavelli
  Title A computational model of eye guidance, searching for text in real scene images Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Searching for text objects in real scene images is an open problem and a very active computer vision research area. A large number of methods have been proposed tackling text search as an extension of methods from the document analysis field or inspired by general-purpose object detection methods. However, the general problem of object search in real scene images remains extremely challenging due to the huge variability in object appearance. This thesis builds on top of the most recent findings in the visual attention literature, presenting a novel computational model of eye guidance that aims to better describe text object search in real scene images.
First, the relevant state-of-the-art results from the visual attention literature regarding eye movements and visual search are presented. Relevant models of attention are discussed and integrated with recent observations on the role of top-down constraints and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is then explained by the interaction of several modulating factors, such as objects, value, plans and saliency. Then we introduce our probabilistic formulation of attention deployment in real scenes. The model is based on the rationale that oculomotor control depends on two interacting but distinct processes: an attentional process that assigns value to the sources of information and a motor process that flexibly links information with action.
In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the reward of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects (a minimal sketch of such a value-weighted choice follows this record).
In the experimental section the model is tested in laboratory conditions, comparing model simulations with data from eye-tracking experiments. The comparison is qualitative in terms of observable scan paths and quantitative in terms of statistical similarity of gaze shift amplitude. Experiments are performed using eye-tracking data from both a publicly available dataset of faces and text and newly performed eye-tracking experiments on a dataset of street-view pictures containing text. The last part of this thesis is dedicated to studying the extent to which the proposed model can account for human eye movements in a low-constrained setting. We used a mobile eye-tracking device and an ad hoc methodology to compare model-simulated eye data with human eye data from mobile eye-tracking recordings. Such a setting allows testing the model in an incomplete visual information condition, reproducing a close-to-real-life search task.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Giuseppe Boccignone;Josep Llados
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-6-4 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Cla2014 Serial 2571  
Permanent link to this record
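To make the value-driven choice of the next fixation concrete, here is a minimal sketch. The softmax choice rule, the distance penalty and the per-patch values are illustrative assumptions; they stand in for, but do not reproduce, the probabilistic model described in the abstract above.

import numpy as np

def next_fixation(patch_centers, patch_values, current_xy, beta=2.0, rho=0.02, rng=None):
    """Sample the next fixation among candidate proto-object patches.

    The probability of gazing at a patch grows with its task value and
    decays with its distance from the current fixation (a crude proxy
    for the cost of a large gaze shift)."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(patch_centers - current_xy, axis=1)
    utility = beta * np.asarray(patch_values) - rho * d
    p = np.exp(utility - utility.max())
    p /= p.sum()
    return rng.choice(len(patch_centers), p=p)

# toy scene: three candidate patches, the second one looks like text
centers = np.array([[50.0, 40.0], [300.0, 120.0], [180.0, 400.0]])
values = [0.1, 0.9, 0.3]          # task value of each proto-object
fix = np.array([200.0, 150.0])    # current fixation
for _ in range(5):
    print("next fixation -> patch", next_fixation(centers, values, fix))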
 

 
Author Jon Almazan
  Title Learning to Represent Handwritten Shapes and Words for Matching and Recognition Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Writing is one of the most important forms of communication and, for centuries, handwriting was the most reliable way to preserve knowledge. However, despite the development of printing houses and electronic devices, handwriting is still broadly used for taking notes, making annotations, or sketching ideas.
Transferring the ability to understand handwritten text or recognize handwritten shapes to computers has been the goal of much research due to its huge importance for many different fields. However, designing good representations to deal with handwritten shapes, e.g. symbols or words, is a very challenging problem due to the large variability of these kinds of shapes. One of the consequences of working with handwritten shapes is that we need representations to be robust, i.e., able to adapt to large intra-class variability. We need representations to be discriminative, i.e., able to learn what the differences between classes are. And we need representations to be efficient, i.e., able to be rapidly computed and compared. Unfortunately, current techniques of handwritten shape representation for matching and recognition do not fulfill some or all of these requirements.
Throughout this thesis we focus on the problem of learning to represent handwritten shapes aimed at retrieval and recognition tasks. Concretely, in the first part of the thesis, we focus on the general problem of representing any kind of handwritten shape. We first present a novel shape descriptor based on a deformable grid that deals with large deformations by adapting to the shape and where the cells of the grid can be used to extract different features. Then, we propose to use this descriptor to learn statistical models, based on the Active Appearance Model, that jointly learn the variability in structure and texture of a given class. In the second part, we focus on a concrete application, the problem of representing handwritten words, for the tasks of word spotting, where the goal is to find all instances of a query word in a dataset of images, and recognition. First, we address the segmentation-free problem and propose an unsupervised, sliding-window-based approach that achieves state-of-the-art results on two public datasets. Second, we address the more challenging multi-writer problem, where the variability in words increases exponentially. We describe an approach in which both word images and text strings are embedded in a common vectorial subspace, and where those that represent the same word are close together. This is achieved by a combination of label embedding and attributes learning, and a common subspace regression (a minimal common-subspace sketch follows this record). This leads to a low-dimensional, unified representation of word images and strings, resulting in a method that allows one to perform either image or text searches, as well as image transcription, in a unified framework. We evaluate our methods on different public datasets of both handwritten documents and natural images, showing results comparable to or better than the state of the art on spotting and recognition tasks.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny;Alicia Fornes
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Alm2014 Serial 2572  
Permanent link to this record
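A minimal sketch of the common-subspace idea described above: text strings are embedded with a simplified PHOC-like attribute vector and image descriptors are mapped onto that space with ridge regression, so that query-by-string reduces to a cosine-similarity search. The random "image descriptors" and the plain ridge mapping are assumptions; the thesis combines label embedding, attribute learning and common subspace regression.

import numpy as np
from string import ascii_lowercase

def string_embedding(word, levels=(2, 3)):
    """PHOC-like binary attribute vector: which letters occur in which
    horizontal fractions of the word (a simplified pyramidal histogram
    of characters)."""
    word = word.lower()
    vec = []
    for L in levels:
        for part in range(L):
            lo, hi = part / L, (part + 1) / L
            chars = {c for i, c in enumerate(word)
                     if lo <= (i + 0.5) / len(word) < hi}
            vec.extend(1.0 if c in chars else 0.0 for c in ascii_lowercase)
    return np.array(vec)

def fit_common_subspace(img_feats, words, lam=1e-3):
    """Ridge regression mapping image features onto the string-attribute
    space, so images and strings become directly comparable."""
    Y = np.stack([string_embedding(w) for w in words])
    X = np.asarray(img_feats)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# toy usage with random "image descriptors" standing in for real features
rng = np.random.default_rng(1)
words = ["street", "garden", "house"]
img_feats = rng.normal(size=(3, 64))
W = fit_common_subspace(img_feats, words)
query = string_embedding("garden")              # query-by-string
proj = img_feats @ W                            # images in the same space
scores = proj @ query / (np.linalg.norm(proj, axis=1) * np.linalg.norm(query))
print("best match:", words[int(np.argmax(scores))])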
 

 
Author David Fernandez
  Title Contextual Word Spotting in Historical Handwritten Documents Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among Document Analysis researchers and practitioners.
There is an increasing interest in digitally preserving and providing access to these kinds of documents. But digitization alone is not enough for the researchers. The extraction and/or indexing of the information in these documents has attracted increasing interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely difficult due to their inherent deficiencies: poor physical preservation, different writing styles, obsolete languages, etc. Word spotting has become a popular and efficient alternative to full transcription. The images involved inherently present a high level of degradation. The search for words is holistically formulated as a visual search for a given query shape in a larger image, instead of recognising the input text and searching for the query word with an ASCII string comparison. But the performance of classical word spotting approaches depends on the degradation level of the images and is unacceptable in many cases. In this thesis we propose a novel paradigm, called contextual word spotting, that uses contextual/semantic information to achieve acceptable results where classical word spotting cannot. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an efficient word segmentation is needed. Historical handwritten documents present some common difficulties that can complicate the extraction of the words. We propose a line segmentation approach that formulates the problem as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem: a path-finding algorithm is used to find the optimal path in a previously computed graph between the text lines (a minimal path-finding sketch follows this record). Once the text lines are extracted, words are localized inside the text lines using a word segmentation technique from the state of the art.
Classical word spotting approaches can be improved using the contextual information of the documents. We introduce a new framework, oriented to handwritten documents with a highly structured layout, to extract information making use of context. The framework is an efficient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches. The contextual information is automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all of them. The experimental results achieved in this thesis outperform classical word spotting approaches, demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Alicia Fornes
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-7-1 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Fer2014 Serial 2573  
Permanent link to this record
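The line segmentation formulated above as a path-finding problem between two consecutive text lines can be sketched as a minimal dynamic-programming seam through an ink map. The cost definition (penalising ink crossings) and the column-by-column moves are assumptions; the thesis uses a path-finding algorithm on an explicitly built graph.

import numpy as np

def separating_path(ink, row_lo, row_hi):
    """Cheapest left-to-right path through rows [row_lo, row_hi) of a
    binary ink map (1 = ink), moving one column at a time and at most one
    row up or down per step. Crossing ink is penalised, so the path
    follows the blank corridor between two consecutive text lines."""
    band = ink[row_lo:row_hi].astype(float)
    h, w = band.shape
    cost = np.full((h, w), np.inf)
    cost[:, 0] = band[:, 0]
    back = np.zeros((h, w), dtype=int)
    for c in range(1, w):
        for r in range(h):
            prev = [(cost[r + dr, c - 1], dr) for dr in (-1, 0, 1)
                    if 0 <= r + dr < h]
            best, dr = min(prev)
            cost[r, c] = band[r, c] + best
            back[r, c] = dr
    # backtrack from the cheapest end row
    r = int(np.argmin(cost[:, -1]))
    path = [r]
    for c in range(w - 1, 0, -1):
        r += back[r, c]
        path.append(r)
    return [row_lo + r for r in reversed(path)]

# toy page: two "text lines" of ink with a gap between them
page = np.zeros((30, 40), dtype=int)
page[5:10, :] = 1     # first text line
page[18:23, :] = 1    # second text line
print(separating_path(page, 10, 18))   # path stays inside the blank band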
 

 
Author Lluis Pere de las Heras
  Title Relational Models for Visual Understanding of Graphical Documents. Application to Architectural Drawings. Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations between symbols) that articulate a semantic meaning in a certain context. Therefore, the automatic interpretation of this sort of document by computers entails three main steps: the detection of the symbols, the extraction of the structural relations between these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Different domains of graphical documents include architectural and engineering drawings, maps, flowcharts, etc.
Graphics Recognition in particular and Document Image Analysis in general were born from the industrial need to interpret a massive amount of digitized documents after the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems to be far from solved. The main reason is that the vast majority of the systems in the literature focus on very specific problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is difficult to reuse these strategies on different data and in different contexts, hindering the natural progress of the field.
In this thesis, we face the graphical document understanding problem by proposing several relational models at different levels that are designed from a generic perspective. Firstly, we introduce three different strategies for the detection of symbols. The first method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the large variability of the problem. The third method is a combination of the previous two that inherits their respective strengths, i.e. it copes with the large variability and does not need annotated data. Secondly, we present two relational strategies that tackle the problem of visual context extraction. The first one is a fully bottom-up method that heuristically searches a graph representation for the contextual relations between symbols. Contrarily, the second is a syntactic method that probabilistically models the structure of the documents. It automatically learns the model, which guides the inference algorithm to find the best structural representation for a given input. Finally, we construct a knowledge-based model consisting of an ontological definition of the domain and real data. This model permits performing contextual reasoning and detecting semantic inconsistencies within the data (a minimal consistency-check sketch follows this record). We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard for the modeling of these documents, there exists an enormous notation variability from plan to plan in terms of vocabulary and syntax. Therefore, floor plan interpretation is a relevant task in the graphical document understanding problem. It is also worth mentioning that we make freely available all the resources used in this thesis (the data, the tool used to generate the data, and the evaluation scripts) with the aim of fostering research in the graphical document understanding task.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Gemma Sanchez
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-8-8 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Her2014 Serial 2574  
Permanent link to this record
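The contextual reasoning and semantic inconsistency detection mentioned above can be illustrated with a toy rule check over detected symbols and their structural relations. The symbols, relations and the single rule ("a door must be attached to a wall") are purely illustrative assumptions; the thesis relies on an ontological definition of the domain.

# Minimal sketch: symbols detected in a floor plan plus the structural
# relations extracted between them, checked against one domain rule.
symbols = {
    "w1": "wall", "w2": "wall",
    "d1": "door", "d2": "door",
    "t1": "table",
}
# structural relations: (symbol, relation, symbol or region)
relations = [
    ("d1", "attached-to", "w1"),
    ("t1", "inside", "room-1"),
    # note: d2 has no attachment relation at all
]

def inconsistent_doors(symbols, relations):
    """Return doors that violate the rule 'a door is attached to a wall'."""
    attached = {s for s, rel, t in relations
                if rel == "attached-to" and symbols.get(t) == "wall"}
    return [s for s, kind in symbols.items()
            if kind == "door" and s not in attached]

print("semantic inconsistencies:", inconsistent_doors(symbols, relations))
# -> ['d2'], i.e. a detected door that is not attached to any wall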
 

 
Author Carles Sanchez
  Title Tracheal Structure Characterization using Geometric and Appearance Models for Efficient Assessment of Stenosis in Videobronchoscopy Type Book Whole
  Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Recent advances in endoscopic devices have increased their use for minimally invasive diagnostic and intervention procedures. Among all endoscopic modalities, bronchoscopy is one of the most frequent, with around 261 million procedures per year. Although the use of bronchoscopy is widespread among clinical facilities, it presents some drawbacks, the most prevalent being the reliance on visual inspection for the assessment of anatomical measurements. In particular, inaccuracies in the estimation of the degree of stenosis (the percentage of obstructed airway) decrease its diagnostic yield and might lead to erroneous treatments. An objective computation of tracheal stenosis in bronchoscopy videos would constitute a breakthrough for this non-invasive technique and a reduction in treatment cost.
This thesis settles the first steps towards reliable on-line extraction of anatomical information from videobronchoscopy for the computation of objective measures. In particular, we focus on the computation of the degree of stenosis, which is obtained by comparing the area delimited by a healthy tracheal ring and the stenosed lumen (a minimal area-ratio sketch follows this record). Reliable extraction of airway structures in interventional videobronchoscopy is a challenging task. This is mainly due to the large variety of acquisition conditions (positions and illumination), devices (different digitalizations) and, in videos acquired in the operating room, the unpredictable presence of surgical devices (such as probe ends). This thesis contributes to on-line stenosis assessment in several ways. We propose a parametric strategy for the extraction of lumen and tracheal ring regions based on the characterization of their geometry and appearance, which guides a deformable model. The geometric and appearance characterization is based on a physical model describing the way bronchoscopy images are obtained and includes local and global descriptions. In order to ensure systematic applicability, we present a statistical framework to select the optimal parameters of our method. Experiments performed on the first public annotated database show that the performance of our method is comparable to that provided by clinicians and that its computation time allows for an on-line implementation in the operating room.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor F. Javier Sanchez;Debora Gil;Jorge Bernal
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-9-5 Medium  
  Area Expedition Conference  
  Notes IAM; 600.075 Approved no  
  Call Number Admin @ si @ San2014 Serial 2575  
Permanent link to this record
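The degree of stenosis described above reduces to an area ratio once the healthy tracheal ring and the stenosed lumen have been delineated. A minimal sketch, assuming both regions are available as binary masks (in the thesis they are obtained by geometry- and appearance-guided deformable models):

import numpy as np

def stenosis_degree(ring_mask, lumen_mask):
    """Percentage of airway obstruction: 100 * (1 - lumen area / ring area)."""
    ring_area = float(np.count_nonzero(ring_mask))
    lumen_area = float(np.count_nonzero(lumen_mask))
    return 100.0 * (1.0 - lumen_area / ring_area)

# toy masks: a circular healthy ring region and a smaller stenosed lumen
yy, xx = np.mgrid[:200, :200]
ring = (xx - 100) ** 2 + (yy - 100) ** 2 < 80 ** 2
lumen = (xx - 100) ** 2 + (yy - 100) ** 2 < 45 ** 2
print(f"stenosis: {stenosis_degree(ring, lumen):.1f}%")   # about 68%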
 

 
Author Antonio Hernandez
  Title From pixels to gestures: learning visual representations for human analysis in color and depth data sequences Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others.

In this dissertation we are interested in learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem from three different research lines, at different levels of abstraction; from pixels to gestures: human segmentation, human pose estimation and gesture recognition.

First, we show how binary segmentation (object vs. background) of the human body in image sequences is helpful to remove all the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Secondly, we present a framework for multi-label segmentation for obtaining much more detailed segmentation masks: instead of just obtaining a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts.

At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets.

Finally, we propose a gesture recognition approach based on the bag-of-visual-words framework. We leverage the benefits of RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences (a sketch of the underlying DTW baseline follows this record). A Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both the spatial and temporal domains.
 
  Address January 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Stan Sclaroff
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-940902-0-2 Medium  
  Area Expedition Conference  
  Notes HuPBA;MILAB Approved no  
  Call Number Admin @ si @ Her2015 Serial 2576  
Permanent link to this record
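The thesis proposes a probabilistic, Gaussian-based reformulation of Dynamic Time Warping (DTW); as a reference point, the sketch below shows only the classical DTW distance between two feature sequences that such a reformulation starts from.

import numpy as np

def dtw(seq_a, seq_b):
    """Classical Dynamic Time Warping distance between two sequences of
    feature vectors (the thesis replaces the fixed template with a
    Gaussian-based probabilistic gesture model)."""
    a, b = np.asarray(seq_a, float), np.asarray(seq_b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy usage: the same gesture performed at two different speeds
t = np.linspace(0, 1, 30)
gesture = np.c_[np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
slower = np.repeat(gesture, 2, axis=0)       # time-stretched version
print("DTW distance:", round(dtw(gesture, slower), 3))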
 

 
Author Hongxing Gao
  Title Focused Structural Document Image Retrieval in Digital Mailroom Applications Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios, such as searching for full-page matches or retrieving the counterparts of specific document areas, focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on the spatial indexing technique, we propose to search for matches of local key-region pairs carrying both structural and visual information from the collection, and we present a scheme that allows adjusting the relative contribution of structural and visual similarity.
Based on the fact that the structure of documents is tightly linked to the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results compared with state-of-the-art methods while far fewer key-regions are employed.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram, where the inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation (a minimal pair-wise pooling sketch follows this record). The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves a remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structural and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We first test the proposed framework in a full-page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific areas of interest within the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries, while it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
Besides, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups; line verification is then performed within each group while more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall with a slight loss in precision on specific queries.
 
  Address January 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Josep Llados;Dimosthenis Karatzas;Marçal Rusiñol
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-0-7 Medium  
  Area Expedition Conference  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ Gao2015 Serial 2577  
Permanent link to this record
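The pair-wise Bag of Words pooling described above can be sketched as a histogram over (child word, parent word) pairs, with one vote per inclusion edge of the dendrogram. The toy quantisation and the flat pair binning are assumptions; they only illustrate how the explicit structure ends up encoded in the representation.

import numpy as np

def pairwise_bow(region_words, parent_of, vocab_size):
    """Histogram over (child word, parent word) pairs: each edge of the
    inclusion dendrogram votes for one bin, so the explicit structure is
    encoded in the representation."""
    hist = np.zeros(vocab_size * vocab_size)
    for child, parent in parent_of.items():
        hist[region_words[child] * vocab_size + region_words[parent]] += 1
    return hist / max(hist.sum(), 1)

# toy document: 5 key-regions quantised into a vocabulary of 3 visual words,
# with a dendrogram where regions 1-4 (letters/words) hang from region 0 (a paragraph)
region_words = {0: 2, 1: 0, 2: 0, 3: 1, 4: 1}
parent_of = {1: 0, 2: 0, 3: 0, 4: 0}
print(pairwise_bow(region_words, parent_of, vocab_size=3))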
 

 
Author David Roche
  Title A Statistical Framework for Terminating Evolutionary Algorithms at their Steady State Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract As with any iterative technique, a stop criterion is a necessary condition for terminating Evolutionary Algorithms (EA). In the case of optimization methods, the algorithm should stop once it has reached a steady state and cannot improve its results anymore. Assessing the reliability of termination conditions for EAs is of prime importance. A wrong or weak stop criterion can negatively affect both the computational effort and the final result.
In this thesis, we introduce a statistical framework for assessing whether a termination condition is able to stop an EA at its steady state. On the one hand, a numeric approximation to steady states that detects the point at which the EA population has lost its diversity is presented for EA termination (a minimal diversity-based stopping sketch follows this record). This approximation has been applied to different EA paradigms based on diversity and to a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape, and Differential Evolution (DE) arises as the best paradigm. On the other hand, we use a regression model in order to determine the requirements ensuring that a measure derived from the evolving EA population is related to the distance to the optimum in x-space.
Our theoretical framework is analyzed across several benchmark test functions and two standard termination criteria based on function improvement in f-space and the EA population x-space distribution for the DE paradigm. Results validate our statistical framework as a powerful tool for determining the capability of a measure for terminating an EA and select the x-space distribution as the best suited for accurately stopping DE in real-world applications.
 
  Address July 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Jesus Giraldo
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes IAM; 600.075 Approved no  
  Call Number Admin @ si @ Roc2015 Serial 2686  
Permanent link to this record
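A minimal sketch of stopping an EA when its population has lost diversity, using textbook DE/rand/1/bin and a simple threshold on the population's x-space standard deviation. The threshold rule stands in for, and is much cruder than, the statistical framework of the thesis; the test function and all parameters are illustrative.

import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def de_until_steady(f, dim=5, pop_size=30, F=0.6, CR=0.9, eps=1e-5, max_gen=2000):
    """Textbook DE/rand/1/bin that stops when the population's x-space
    standard deviation drops below eps, i.e. when diversity indicates a
    steady state (a crude stand-in for a proper statistical test)."""
    rng = np.random.default_rng(0)
    pop = rng.uniform(-5.0, 5.0, (pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for gen in range(max_gen):
        if pop.std(axis=0).max() < eps:          # diversity exhausted -> stop
            break
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True      # at least one mutated gene
            trial = np.where(cross, a + F * (b - c), pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    return gen, pop[fit.argmin()], fit.min()

gen, best, val = de_until_steady(sphere)
print(f"stopped at generation {gen} with best value {val:.2e}")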
 

 
Author Patricia Marquez
  Title A Confidence Framework for the Assessment of Optical Flow Performance Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Optical Flow (OF) is the input of a wide range of decision support systems such as car driver assistance, UAV guiding or medical diagnosis. In these real situations, the absence of ground truth forces us to assess OF quality using quantities computed from either the sequences or the computed optical flow itself. These quantities are generally known as Confidence Measures (CM). Even if we have a proper confidence measure, we still need a way to evaluate its ability to discard pixels whose OF is prone to have a large error. Current approaches only provide a descriptive evaluation of CM performance, but such approaches are not capable of fairly comparing different confidence measures and optical flow algorithms. Thus, it is of prime importance to define a framework and a general road map for the evaluation of optical flow performance.

This thesis provides a framework able to decide which pairs “optical flow – confidence measure” (OF-CM) are best suited for optical flow error bounding given a confidence level determined by a decision support system. To design this framework we cover the following points:

Descriptive scores. As a first step, we summarize and analyze the sources of inaccuracy in the output of optical flow algorithms. Second, we present several descriptive plots that visually assess CM capabilities for OF error bounding (a minimal error-bounding profile sketch follows this record). In addition to the descriptive plots, given a plot representing OF-CM capabilities to bound the error, we provide a numeric score that categorizes the plot according to its decreasing profile, that is, a score assessing CM performance.
Statistical framework. We provide a comparison framework that assesses the best suited OF-CM pair for error bounding using a two-stage cascade process. First of all, we assess the predictive value of the confidence measures by means of a descriptive plot. Then, for a sample of descriptive plots computed over training frames, we obtain a generic curve that will be used for sequences with no ground truth. As a second step, we evaluate the obtained general curve and its capability to really reflect the predictive value of a confidence measure, using the variability across training frames by means of ANOVA.

The presented framework has shown its potential in application to clinical decision support systems. In particular, we have analyzed the impact of different image artifacts, such as noise and decay, on the output of optical flow in a cardiac diagnosis system, and we have improved navigation inside the bronchial tree in bronchoscopy.
 
  Address July 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Aura Hernandez
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-2-1 Medium  
  Area Expedition Conference  
  Notes IAM; 600.075 Approved no  
  Call Number Admin @ si @ Mar2015 Serial 2687  
Permanent link to this record
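The descriptive plots that assess whether a confidence measure bounds optical-flow error can be sketched as an error-bounding profile: keep only the most confident pixels and record the worst end-point error among them. The synthetic confidence and error values below are assumptions used purely for illustration, not data from the thesis.

import numpy as np

def error_bounding_profile(confidence, epe, steps=20):
    """For each retained fraction of pixels (most confident first), the
    worst end-point error (EPE) among them. A good confidence measure
    yields a profile that grows slowly as more pixels are kept."""
    order = np.argsort(confidence)[::-1]          # most confident first
    epe_sorted = epe[order]
    fractions = np.linspace(0.05, 1.0, steps)
    return [(f, epe_sorted[: max(1, int(f * len(epe)))].max()) for f in fractions]

# synthetic example: error loosely anti-correlated with confidence
rng = np.random.default_rng(3)
conf = rng.random(10_000)
err = (1.0 - conf) * 3.0 + rng.random(10_000) * 0.2
for frac, bound in error_bounding_profile(conf, err)[::5]:
    print(f"keeping the {frac:4.0%} most confident pixels -> max EPE {bound:.2f}")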
 

 
Author Marc Serra
  Title Modeling, estimation and evaluation of intrinsic images considering color information Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Image values are the result of a combination of visual information coming from multiple sources. Recovering information about the multiple factors that produced an image seems a hard and ill-posed problem. However, it is important to observe that humans develop the ability to interpret images and to recognize and isolate specific physical properties of the scene.

Images describing a single physical characteristic of a scene are called intrinsic images. These images would benefit most computer vision tasks, which are often affected by the multiple complex effects that are usually found in natural images (e.g. cast shadows, specularities, interreflections...).

In this thesis we analyze the problem of intrinsic image estimation from different perspectives, including the theoretical formulation of the problem, the visual cues that can be used to estimate the intrinsic components and the evaluation mechanisms of the problem.
 
  Address September 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Robert Benavente;Olivier Penacchio
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-4-5 Medium  
  Area Expedition Conference  
  Notes CIC; 600.074 Approved no  
  Call Number Admin @ si @ Ser2015 Serial 2688  
Permanent link to this record
 

 
Author Alejandro Gonzalez Alzate
  Title Multi-modal Pedestrian Detection Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Pedestrian detection continues to be an extremely challenging problem in real scenarios, in which situations like illumination changes, noisy images, unexpected objects, uncontrolled scenarios and the varying appearance of objects occur constantly. All these problems force the development of more robust detectors for relevant applications like vision-based autonomous vehicles, intelligent surveillance, and pedestrian tracking for behavior analysis. Most reliable vision-based pedestrian detectors base their decision on features extracted using a single sensor capturing complementary features, e.g., appearance and texture. These features are usually extracted from the current frame, ignoring temporal information, or include it in a post-processing step, e.g., tracking or temporal coherence. Taking into account these issues, we formulate the following question: can we generate more robust pedestrian detectors by introducing new information sources in the feature extraction step?
In order to answer this question we develop different approaches for introducing new information sources into well-known pedestrian detectors. We start with the inclusion of temporal information following the Stacked Sequential Learning (SSL) paradigm, which suggests that information extracted from the neighboring samples in a sequence can improve the accuracy of a base classifier.
We then focus on the inclusion of complementary information from different sensors, such as 3D point clouds (LIDAR – depth), far infrared images (FIR), or disparity maps (stereo pair cameras). To this end we develop a multi-modal framework in which information from different sensors is used to increase detection accuracy by increasing information redundancy (a minimal late-fusion sketch follows this record). Finally we propose a multi-view pedestrian detector; this multi-view approach splits the detection problem into n sub-problems.
Each sub-problem detects objects in a given specific view, reducing in that way the variability problem faced when a single detector is used for the whole problem. We show that these approaches obtain results competitive with other state-of-the-art methods, but instead of designing new features, we reuse existing ones, boosting their performance.
 
  Address November 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor David Vazquez;Antonio Lopez
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-7-6 Medium  
  Area Expedition Conference  
  Notes ADAS; 600.076 Approved no  
  Call Number Admin @ si @ Gon2015 Serial 2706  
Permanent link to this record
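A minimal sketch of score-level late fusion across modalities: RGB and FIR detections that overlap sufficiently have their scores averaged, while unmatched detections keep a down-weighted single-modality score. The IoU threshold and the fusion weights are illustrative assumptions, not the configuration used in the thesis.

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def late_fusion(rgb_dets, fir_dets, w_rgb=0.6, w_fir=0.4, iou_thr=0.5):
    """Average the scores of RGB and FIR detections that overlap enough;
    unmatched detections keep their (down-weighted) single-modality score."""
    fused, used = [], set()
    for box_r, s_r in rgb_dets:
        match = next((j for j, (box_f, _) in enumerate(fir_dets)
                      if j not in used and iou(box_r, box_f) >= iou_thr), None)
        if match is None:
            fused.append((box_r, w_rgb * s_r))
        else:
            used.add(match)
            fused.append((box_r, w_rgb * s_r + w_fir * fir_dets[match][1]))
    fused += [(b, w_fir * s) for j, (b, s) in enumerate(fir_dets) if j not in used]
    return fused

# toy usage: two RGB detections, one of which is confirmed by an FIR detection
rgb = [((10, 10, 60, 120), 0.7), ((200, 30, 240, 130), 0.4)]
fir = [((12, 12, 58, 118), 0.9)]
print(late_fusion(rgb, fir))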
 

 
Author Adriana Romero
  Title Assisting the training of deep neural networks with applications to computer vision Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Deep learning has recently been enjoying increasing popularity due to its success in solving challenging tasks. In particular, deep learning has proven to be effective in a large variety of computer vision tasks, such as image classification, object recognition and image parsing. Contrary to previous research, which required engineered feature representations designed by experts in order to succeed, deep learning attempts to learn representation hierarchies automatically from data. More recently, the trend has been to go deeper with representation hierarchies.
Learning (very) deep representation hierarchies is a challenging task, which involves the optimization of highly non-convex functions. Therefore, the search for algorithms to ease the learning of (very) deep representation hierarchies from data is extensive and ongoing.
In this thesis, we tackle the challenging problem of easing the learning of (very) deep representation hierarchies. We present a hyper-parameter free, off-the-shelf, simple and fast unsupervised algorithm to discover hidden structure from the input data by enforcing a very strong form of sparsity. We study the applicability and potential of the algorithm to learn representations of varying depth in a handful of applications and domains, highlighting the ability of the algorithm to provide discriminative feature representations that are able to achieve top performance.
Yet, while emphasizing the great value of unsupervised learning methods when labeled data is scarce, the recent industrial success of deep learning has revolved around supervised learning. Supervised learning is currently the focus of many recent research advances, which have been shown to excel at many computer vision tasks. Top performing systems often involve very large and deep models, which are not well suited for applications with time or memory limitations. More in line with current trends, we engage in making top performing models more efficient by designing very deep and thin models. Since training such very deep models still appears to be a challenging task, we introduce a novel algorithm that guides the training of very thin and deep models by hinting their intermediate representations (a minimal hint-loss sketch follows this record).
Very deep and thin models trained by the proposed algorithm end up extracting feature representations that are comparable to or even better performing than the ones extracted by large state-of-the-art models, while compellingly reducing the time and memory consumption of the model.
 
  Address October 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Carlo Gatta;Petia Radeva
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number Admin @ si @ Rom2015 Serial 2707  
Permanent link to this record
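The hint-based guidance of thin and deep models described above amounts to adding a regression term between the teacher's intermediate representation and a mapped version of the student's guided layer. A minimal sketch of such a combined objective, with toy numpy arrays standing in for real network activations; the layer widths, the linear regressor and the weighting are illustrative assumptions, not the thesis implementation.

import numpy as np

def hint_loss(student_hidden, teacher_hint, regressor_W):
    """L2 mismatch between the teacher's hint layer and a learned linear
    regressor applied to the student's (thinner) guided layer."""
    return 0.5 * np.sum((teacher_hint - student_hidden @ regressor_W) ** 2)

def total_loss(student_logits, labels, student_hidden, teacher_hint,
               regressor_W, lam=0.5):
    """Task loss (cross-entropy) plus lambda * hint loss, i.e. the kind of
    objective used to guide the training of very thin and deep models."""
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(labels)), labels].mean()
    return ce + lam * hint_loss(student_hidden, teacher_hint, regressor_W)

# toy shapes: batch of 4, student hidden width 16, teacher hint width 64
rng = np.random.default_rng(2)
logits = rng.normal(size=(4, 10))
labels = np.array([3, 1, 7, 0])
s_hidden = rng.normal(size=(4, 16))
t_hint = rng.normal(size=(4, 64))
W = rng.normal(scale=0.1, size=(16, 64))
print("guided training objective:", round(total_loss(logits, labels, s_hidden, t_hint, W), 3))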
 

 
Author Sergio Vera
  Title Anatomic Registration based on Medial Axis Parametrizations Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Image registration has been for many years the gold standard method to bring two images into correspondence. It has been used extensively in the field of medical imaging in order to put images of different patients into a common overlapping spatial position. However, medical image registration is a slow, iterative optimization process that involves many variables and is prone to falling into the trap of local minima.
A coordinate system parameterizing the interior of organs is a powerful tool for the systematic localization of injured tissue. If the same coordinate values are assigned to specific anatomical sites, parameterizations ensure integration of data across different medical image modalities. Harmonic mappings have been used to produce parametric meshes over the surface of anatomical shapes, given their ability to set values at specific locations through boundary conditions (a minimal harmonic-coordinate sketch follows this record). However, most of the existing implementations in medical imaging are restricted to either anatomical surfaces, or a depth coordinate whose boundary conditions are given at discrete sites of limited geometric diversity.
The medial surface of the shape can be used to provide a continuous basis for the definition of a depth coordinate. However, given that different methods for the generation of medial surfaces produce different manifolds, not all of them are equally suited to be the basis of a radial coordinate for a parameterization. It would be desirable that the medial surface be smooth and robust to surface shape noise, with a low number of spurious branches or surfaces.
In this thesis we present methods for the computation of smooth medial manifolds and apply them to the generation of anatomical volumetric parameterizations that extend current harmonic parameterizations to the interior anatomy using information provided by the volume medial surface. This reference system sets a solid base for creating models of anatomical shapes and allows comparing several patients in a common frame of reference.
 
  Address November 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil;Miguel Angel Gonzalez Ballester
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-8-3 Medium  
  Area Expedition Conference  
  Notes IAM; 600.075 Approved no  
  Call Number Admin @ si @ Ver2015 Serial 2708  
Permanent link to this record
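The harmonic depth coordinate between an outer anatomical surface and the medial surface can be sketched in 2D by solving Laplace's equation with Dirichlet boundary conditions (0 on the outer boundary, 1 on the medial axis). The toy geometry and the plain Jacobi iteration are assumptions; the thesis works on volumetric anatomy with medial manifolds computed from the shape itself.

import numpy as np

def harmonic_depth(outer, medial, iters=2000):
    """Solve Laplace's equation on a grid with u = 0 on the outer boundary
    and u = 1 on the medial surface; the harmonic solution provides a
    smooth depth coordinate between the two (2D toy version)."""
    u = np.zeros(outer.shape)
    u[medial] = 1.0
    for _ in range(iters):
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
        u[outer] = 0.0      # re-impose the boundary conditions
        u[medial] = 1.0
    return u

# toy shape: a 2D band whose outer boundary is the image border and whose
# "medial surface" is the central row
h, w = 40, 80
outer = np.zeros((h, w), bool)
outer[0, :] = outer[-1, :] = outer[:, 0] = outer[:, -1] = True
medial = np.zeros((h, w), bool)
medial[h // 2, 5:-5] = True
depth = harmonic_depth(outer, medial)
print("depth range:", depth.min().round(2), "-", depth.max().round(2))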
 

 
Author Joan M. Nuñez
  Title Vascular Pattern Characterization in Colonoscopy Images Type Book Whole
  Year 2015 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Colorectal cancer is the third most common cancer worldwide and the second most common malignant tumor in Europe. Screening tests have been shown to be very effective in increasing survival rates since they allow an early detection of polyps. Among the different screening techniques, colonoscopy is considered the gold standard, although clinical studies mention several problems that have an impact on the quality of the procedure. The navigation through the rectum and colon track can be challenging for the physicians, which can increase polyp miss rates. The thorough visualization of the colon track must be ensured so that the chances of missing lesions are minimized. The visual analysis of colonoscopy images can provide important information to the physicians and support their navigation during the procedure.
Blood vessels and their branching patterns can provide the descriptive power to potentially develop biometric markers. Anatomical markers based on blood vessel patterns could be used to identify a particular scene in colonoscopy videos and to support endoscope navigation by generating a sequence of ordered scenes through the different colon sections. By verifying the presence of vascular content in the endoluminal scene it is also possible to certify a proper inspection of the colon mucosa and to improve polyp localization. Considering the potential uses of blood vessel description, this contribution studies the characterization of the vascular content and the analysis of the descriptive power of its branching patterns.
Blood vessel characterization in colonoscopy images is shown to be a challenging task. The endoluminal scene is composed of several elements whose similar characteristics hinder the development of particular models for each of them. To overcome such difficulties we propose the use of the blood vessel branching characteristics as key features for pattern description. We present a model to characterize junctions in binary patterns (a minimal junction-detection sketch follows this record). The implementation of the junction model allows us to develop a junction localization method. We created two data sets including manually labeled vessel information as well as manual ground truths of two types of keypoint landmarks: junctions and endpoints. The proposed method outperforms the available algorithms in the literature in experiments on both our newly created colon vessel data set and the DRIVE retinal fundus image data set. In the latter case, we created a manual ground truth of junction coordinates. Since we want to explore the descriptive potential of junctions and vessels, we propose a graph-based approach to create anatomical markers. In the context of polyp localization, we present a new method to inhibit the influence of blood vessels in the extraction of valley-profile information. The results show that our methodology decreases vessel influence, increases polyp information and leads to an improvement in state-of-the-art polyp localization performance. We also propose a polyp-specific segmentation method that outperforms other general and specific approaches.
 
  Address November 2015  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Fernando Vilariño
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-943427-6-9 Medium  
  Area Expedition Conference  
  Notes MV Approved no  
  Call Number Admin @ si @ Nuñ2015 Serial 2709  
Permanent link to this record
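Junctions and endpoints in a skeletonised binary vessel pattern can be localised with the classical crossing-number rule (count the 0-to-1 transitions around each foreground pixel). This is only a stand-in to make the idea concrete; the thesis proposes its own junction model rather than this rule.

import numpy as np

def keypoints(skel):
    """Classify skeleton pixels by their crossing number: one neighbour run
    -> endpoint, three or more -> junction (classical rule; the thesis
    uses its own junction model)."""
    skel = skel.astype(int)
    ends, juncs = [], []
    # 8-neighbourhood visited in circular order
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, skel.shape[0] - 1):
        for x in range(1, skel.shape[1] - 1):
            if not skel[y, x]:
                continue
            vals = [skel[y + dy, x + dx] for dy, dx in ring]
            transitions = sum(vals[i] == 0 and vals[(i + 1) % 8] == 1
                              for i in range(8))
            if transitions == 1:
                ends.append((y, x))
            elif transitions >= 3:
                juncs.append((y, x))
    return ends, juncs

# toy vessel skeleton shaped like a 'T'
skel = np.zeros((9, 9), int)
skel[4, 1:8] = 1     # horizontal branch
skel[1:5, 4] = 1     # vertical branch meeting it at (4, 4)
ends, juncs = keypoints(skel)
print("endpoints:", ends, "junctions:", juncs)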