|   | 
Details
   web
Records
Author Marçal Rusiñol; Josep Llados
Title (down) Symbol Spotting in Digital Libraries:Focused Retrieval over Graphic-rich Document Collections Type Book Whole
Year 2010 Publication Symbol Spotting in Digital Libraries:Focused Retrieval over Graphic-rich Document Collections Abbreviated Journal
Volume Issue Pages
Keywords Focused Retrieval , Graphical Pattern Indexation,Graphics Recognition ,Pattern Recognition , Performance Evaluation , Symbol Description ,Symbol Spotting
Abstract The specific problem of symbol recognition in graphical documents requires additional techniques to those developed for character recognition. The most well-known obstacle is the so-called Sayre paradox: Correct recognition requires good segmentation, yet improvement in segmentation is achieved using information provided by the recognition process. This dilemma can be avoided by techniques that identify sets of regions containing useful information. Such symbol-spotting methods allow the detection of symbols in maps or technical drawings without having to fully segment or fully recognize the entire content.

This unique text/reference provides a complete, integrated and large-scale solution to the challenge of designing a robust symbol-spotting method for collections of graphic-rich documents. The book examines a number of features and descriptors, from basic photometric descriptors commonly used in computer vision techniques to those specific to graphical shapes, presenting a methodology which can be used in a wide variety of applications. Additionally, readers are supplied with an insight into the problem of performance evaluation of spotting methods. Some very basic knowledge of pattern recognition, document image analysis and graphics recognition is assumed.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-84996-208-7 Medium
Area Expedition Conference
Notes DAG Approved no
Call Number DAG @ dag @ RuL2010a Serial 1292
Permanent link to this record
 

 
Author Jaume Garcia
Title (down) Statistical Models of the Architecture and Function of the Left Ventricle Type Book Whole
Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Cardiovascular Diseases, specially those affecting the Left Ventricle (LV), are the leading cause of death in developed countries with approximately a 30% of all global deaths. In order to address this public health concern, physicians focus on diagnosis and therapy planning. On one hand, early and accurate detection of Regional Wall Motion Abnormalities (RWMA) significantly contributes to a quick diagnosis and prevents the patient to reach more severe stages. On the other hand, a thouroughly knowledge of the normal gross anatomy of the LV, as well as, the distribution of its muscular fibers is crucial for designing specific interventions and therapies (such as pacemaker implanction). Statistical models obtained from the analysis of different imaging modalities allow the computation of the normal ranges of variation within a given population. Normality models are a valuable tool for the definition of objective criterions quantifying the degree of (anomalous) deviation of the LV function and anatomy for a given subject. The creation of statistical models involve addressing three main issues: extraction of data from images, definition of a common domain for comparison of data across patients and designing appropriate statistical analysis schemes. In this PhD thesis we present generic image processing tools for the creation of statistical models of the LV anatomy and function. On one hand, we use differential geometry concepts to define a computational framework (the Normalized Parametric Domain, NPD) suitable for the comparison and fusion of several clinical scores obtained over the LV. On the other hand, we present a variational approach (the Harmonic Phase Flow, HPF) for the estimation of myocardial motion that provides dense and continuous vector fields without overestimating motion at injured areas. These tools are used for the creation of statistical models. Regarding anatomy, we obtain an atlas jointly modelling, both, LV gross anatomy and fiber architecture. Regarding function, we compute normality patterns of scores characterizing the (global and local) LV function and explore, for the first time, the configuration of local scores better suited for RWMA detection.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Debora Gil
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM Approved no
Call Number IAM @ iam @ Gar2009a Serial 1499
Permanent link to this record
 

 
Author David Guillamet
Title (down) Statistical Local Appearance Models for Object Recognition Type Book Whole
Year 2004 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Bellaterra
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Jordi Vitria
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Gui2004 Serial 444
Permanent link to this record
 

 
Author Jose Antonio Rodriguez
Title (down) Statistical frameworks and prior information modeling in handwritten word-spotting Type Book Whole
Year 2009 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Handwritten word-spotting (HWS) is the pattern analysis task that consists in finding keywords in handwritten document images. So far, HWS has been applied mostly to historical documents in order to build search engines for such image collections. This thesis addresses the problem of word-spotting for detecting important keywords in business documents. This is a first step towards the process of automatic routing of correspondence based on content.

However, the application of traditional HWS techniques fails for this type of documents. As opposed to historical documents, real business documents present a very high variability in terms of writing styles, spontaneous writing, crossed-out words, spelling mistakes, etc. The main goal of this thesis is the development of pattern recognition techniques that lead to a high-performance HWS system for this challenging type of data.

We develop a statistical framework in which word models are expressed in terms of hidden Markov models and the a priori information is encoded in a universal vocabulary of Gaussian codewords. This systems leads to a very robust performance in word-spotting task. We also find that by constraining the word models to the universal vocabulary, the a priori information of the problem of interest can be exploited for developing new contributions. These include a novel writer adaptation method, a system for searching handwritten words by generating typed text images, and a novel model-based similarity between feature vector sequences.
Address Barcelona (Spain)
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Gemma Sanchez;Josep Llados;Florent Perronnin
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Rod2009 Serial 1266
Permanent link to this record
 

 
Author Xavier Soria
Title (down) Single sensor multi-spectral imaging Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The image sensor, nowadays, is rolling the smartphone industry. While some phone brands explore equipping more image sensors, others, like Google, maintain their smartphones with just one sensor; but this sensor is equipped with Deep Learning to enhance the image quality. However, what all brands agree on is the need to research new image sensors; for instance, in 2015 Omnivision and PixelTeq presented new CMOS based image sensors defined as multispectral Single Sensor Camera (SSC), which are capable of capturing multispectral bands. This dissertation presents the benefits of using a multispectral SSCs that, as aforementioned, simultaneously acquires images in the visible and near-infrared (NIR) bands. The principal benefits while addressing problems related to image bands in the spectral range of 400 to 1100 nanometers, there are cost reductions in the hardware and software setup because only one SSC is needed instead of two, and the images alignment are not required any more. Concerning to the NIR spectrum, many works in literature have proven the benefits of working with NIR to enhance RGB images (e.g., image enhancement, remove shadows, dehazing, etc.). In spite of the advantage of using SSC (e.g., low latency), there are some drawback to be solved. One of this drawback corresponds to the nature of the silicon-based sensor, which in addition to capture the RGB image, when the infrared cut off filter is not installed it also acquires NIR information into the visible image. This phenomenon is called RGB and NIR crosstalking. This thesis firstly faces this problem in challenging images and then it shows the benefit of using multispectral images in the edge detection task.
The RGB color restoration from RGBN image is the topic tackled in RGB and NIR crosstalking. Even though in the literature a set of processes have been proposed to face this issue, in this thesis novel approaches, based on DL, are proposed to subtract the additional NIR included in the RGB channel. More precisely, an Artificial Neural Network (NN) and two Convolutional Neural Network (CNN) models are proposed. As the DL based models need a dataset with a large collection of image pairs, a large dataset is collected to address the color restoration. The collected images are from challenging scenes where the sunlight radiation is sufficient to give absorption/reflectance properties to the considered scenes. An extensive evaluation has been conducted on the CNN models, differences from most of the restored images are almost imperceptible to the human eye. The next proposal of the thesis is the validation of the usage of SSC images in the edge detection task. Three methods based on CNN have been proposed. While the first one is based on the most used model, holistically-nested edge detection (HED) termed as multispectral HED (MS-HED), the other two have been proposed observing the drawbacks of MS-HED. These two novel architectures have been designed from scratch (training from scratch); after the first architecture is validated in the visible domain a slight redesign is proposed to tackle the multispectral domain. Again, another dataset is collected to face this problem with SSCs. Even though edge detection is confronted in the multispectral domain, its qualitative and quantitative evaluation demonstrates the generalization in other datasets used for edge detection, improving state-of-the-art results.
Address September 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-9-7 Medium
Area Expedition Conference
Notes MSIAU; 600.122 Approved no
Call Number Admin @ si @ Sor2019 Serial 3391
Permanent link to this record
 

 
Author Mohammad Rouhani
Title (down) Shape Representation and Registration using Implicit Functions Type Book Whole
Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Shape representation and registration are two important problems in computer vision and graphics. Representing the given cloud of points through an implicit function provides a higher level information describing the data. This representation can be more compact more robust to noise and outliers, hence it can be exploited in different computer vision application. In the first part of this thesis implicit shape representations, including both implicit B-spline and polynomial, are tackled. First, an approximation of a geometric distance is proposed to measure the closeness of the given cloud of points and the implicit surface. The analysis of the proposed distance shows an accurate estimation with smooth behavior. The distance by itself is used in a RANSAC based quadratic fitting method. Moreover, since the gradient information of the distance with respect to the surface parameters can be analytically computed, it is used in Levenberg-Marquadt algorithm to refine the surface parameters. In a different approach, an algebraic fitting method is used to represent an object through implicit B-splines. The outcome is a smooth flexible surface and can be represented in different levels from coarse to fine. This property has been exploited to solve the registration problem in the second part of the thesis. In the proposed registration technique the model set is replaced with an implicit representation provided in the first part; then, the point-to-point registration is converted to a point-to-model one in a higher level. This registration error can benefit from different distance estimations to speed up the registration process even without need of correspondence search. Finally, the non-rigid registration problem is tackled through a quadratic distance approximation that is based on the curvature information of the model set. This approximation is used in a free form deformation model to update its control lattice. Then it is shown how an accurate distance approximation can benefit non-rigid registration problems.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Rou2012 Serial 2205
Permanent link to this record
 

 
Author Josep Llados; W. Liu; Jean-Marc Ogier
Title (down) Seventh IAPR International Workshop on Graphics Recognition GREC 2007 Type Book Whole
Year 2007 Publication Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Curitiba (Brazil)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number DAG @ dag @ LLO2007 Serial 835
Permanent link to this record
 

 
Author Michal Drozdzal
Title (down) Sequential image analysis for computer-aided wireless endoscopy Type Book Whole
Year 2014 Publication PhD Thesis, Universitat de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Wireless Capsule Endoscopy (WCE) is a technique for inner-visualization of the entire small intestine and, thus, offers an interesting perspective on intestinal motility. The two major drawbacks of this technique are: 1) huge amount of data acquired by WCE makes the motility analysis tedious and 2) since the capsule is the first tool that offers complete inner-visualization of the small intestine,the exact importance of the observed events is still an open issue. Therefore, in this thesis, a novel computer-aided system for intestinal motility analysis is presented. The goal of the system is to provide an easily-comprehensible visual description of motility-related intestinal events to a physician. In order to do so, several tools based either on computer vision concepts or on machine learning techniques are presented. A method for transforming 3D video signal to a holistic image of intestinal motility, called motility bar, is proposed. The method calculates the optimal mapping from video into image from the intestinal motility point of view.
To characterize intestinal motility, methods for automatic extraction of motility information from WCE are presented. Two of them are based on the motility bar and two of them are based on frame-per-frame analysis. In particular, four algorithms dealing with the problems of intestinal contraction detection, lumen size estimation, intestinal content characterization and wrinkle frame detection are proposed and validated. The results of the algorithms are converted into sequential features using an online statistical test. This test is designed to work with multivariate data streams. To this end, we propose a novel formulation of concentration inequality that is introduced into a robust adaptive windowing algorithm for multivariate data streams. The algorithm is used to obtain robust representation of segments with constant intestinal motility activity. The obtained sequential features are shown to be discriminative in the problem of abnormal motility characterization.
Finally, we tackle the problem of efficient labeling. To this end, we incorporate active learning concepts to the problems present in WCE data and propose two approaches. The first one is based the concepts of sequential learning and the second one adapts the partition-based active learning to an error-free labeling scheme. All these steps are sufficient to provide an extensive visual description of intestinal motility that can be used by an expert as decision support system.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Petia Radeva
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-3-3 Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @ Dro2014 Serial 2486
Permanent link to this record
 

 
Author Lu Yu
Title (down) Semantic Representation: From Color to Deep Embeddings Type Book Whole
Year 2019 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings.
In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists.
In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Address November 2019
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Joost Van de Weijer;Yongmei Cheng
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-121011-3-3 Medium
Area Expedition Conference
Notes LAMP; 600.120 Approved no
Call Number Admin @ si @ Yu2019 Serial 3394
Permanent link to this record
 

 
Author Aitor Alvarez-Gila
Title (down) Self-supervised learning for image-to-image translation in the small data regime Type Book Whole
Year 2022 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords Computer vision; Neural networks; Self-supervised learning; Image-to-image mapping; Probabilistic programming
Abstract The mass irruption of Deep Convolutional Neural Networks (CNNs) in computer vision since 2012 led to a dominance of the image understanding paradigm consisting in an end-to-end fully supervised learning workflow over large-scale annotated datasets. This approach proved to be extremely useful at solving a myriad of classic and new computer vision tasks with unprecedented performance —often, surpassing that of humans—, at the expense of vast amounts of human-labeled data, extensive computational resources and the disposal of all of our prior knowledge on the task at hand. Even though simple transfer learning methods, such as fine-tuning, have achieved remarkable impact, their success when the amount of labeled data in the target domain is small is limited. Furthermore, the non-static nature of data generation sources will often derive in data distribution shifts that degrade the performance of deployed models. As a consequence, there is a growing demand for methods that can exploit elements of prior knowledge and sources of information other than the manually generated ground truth annotations of the images during the network training process, so that they can adapt to new domains that constitute, if not a small data regime, at least a small labeled data regime. This thesis targets such few or no labeled data scenario in three distinct image-to-image mapping learning problems. It contributes with various approaches that leverage our previous knowledge of different elements of the image formation process: We first present a data-efficient framework for both defocus and motion blur detection, based on a model able to produce realistic synthetic local degradations. The framework comprises a self-supervised, a weakly-supervised and a semi-supervised instantiation, depending on the absence or availability and the nature of human annotations, and outperforms fully-supervised counterparts in a variety of settings. Our knowledge on color image formation is then used to gather input and target ground truth image pairs for the RGB to hyperspectral image reconstruction task. We make use of a CNN to tackle this problem, which, for the first time, allows us to exploit spatial context and achieve state-of-the-art results given a limited hyperspectral image set. In our last contribution to the subfield of data-efficient image-to-image transformation problems, we present the novel semi-supervised task of zero-pair cross-view semantic segmentation: we consider the case of relocation of the camera in an end-to-end trained and deployed monocular, fixed-view semantic segmentation system often found in industry. Under the assumption that we are allowed to obtain an additional set of synchronized but unlabeled image pairs of new scenes from both original and new camera poses, we present ZPCVNet, a model and training procedure that enables the production of dense semantic predictions in either source or target views at inference time. The lack of existing suitable public datasets to develop this approach led us to the creation of MVMO, a large-scale Multi-View, Multi-Object path-traced dataset with per-view semantic segmentation annotations. We expect MVMO to propel future research in the exciting under-developed fields of cross-view and multi-view semantic segmentation. Last, in a piece of applied research of direct application in the context of process monitoring of an Electric Arc Furnace (EAF) in a steelmaking plant, we also consider the problem of simultaneously estimating the temperature and spectral emissivity of distant hot emissive samples. To that end, we design our own capturing device, which integrates three point spectrometers covering a wide range of the Ultra-Violet, visible, and Infra-Red spectra and is capable of registering the radiance signal incoming from an 8cm diameter spot located up to 20m away. We then define a physically accurate radiative transfer model that comprises the effects of atmospheric absorbance, of the optical system transfer function, and of the sample temperature and spectral emissivity themselves. We solve this inverse problem without the need for annotated data using a probabilistic programming-based Bayesian approach, which yields full posterior distribution estimates of the involved variables that are consistent with laboratory-grade measurements.
Address Julu, 2019
Corporate Author Thesis Ph.D. thesis
Publisher Place of Publication Editor Joost Van de Weijer; Estibaliz Garrote
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP Approved no
Call Number Admin @ si @ Alv2022 Serial 3716
Permanent link to this record
 

 
Author Mario Hernandez; Joao Sanchez; Jordi Vitria
Title (down) Selected papers from Iberian Conference on Pattern Recognition and Image Analysis Type Book Whole
Year 2012 Publication Pattern Recognition Abbreviated Journal
Volume 45 Issue 9 Pages 3047-3582
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes OR;MV Approved no
Call Number Admin @ si @ HSV2012 Serial 2069
Permanent link to this record
 

 
Author Felipe Lumbreras
Title (down) Segmentation, classification and modelization of textures by means of multiresolution decomposition techniques. Type Book Whole
Year 2001 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number ADAS @ adas @ Lum2001 Serial 188
Permanent link to this record
 

 
Author Lei Kang
Title (down) Robust Handwritten Text Recognition in Scarce Labeling Scenarios: Disentanglement, Adaptation and Generation Type Book Whole
Year 2020 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Handwritten documents are not only preserved in historical archives but also widely used in administrative documents such as cheques and claims. With the rise of the deep learning era, many state-of-the-art approaches have achieved good performance on specific datasets for Handwritten Text Recognition (HTR). However, it is still challenging to solve real use cases because of the varied handwriting styles across different writers and the limited labeled data. Thus, both explorin a more robust handwriting recognition architectures and proposing methods to diminish the gap between the source and target data in an unsupervised way are
demanded.
In this thesis, firstly, we explore novel architectures for HTR, from Sequence-to-Sequence (Seq2Seq) method with attention mechanism to non-recurrent Transformer-based method. Secondly, we focus on diminishing the performance gap between source and target data in an unsupervised way. Finally, we propose a group of generative methods for handwritten text images, which could be utilized to increase the training set to obtain a more robust recognizer. In addition, by simply modifying the generative method and joining it with a recognizer, we end up with an effective disentanglement method to distill textual content from handwriting styles so as to achieve a generalized recognition performance.
We outperform state-of-the-art HTR performances in the experimental results among different scientific and industrial datasets, which prove the effectiveness of the proposed methods. To the best of our knowledge, the non-recurrent recognizer and the disentanglement method are the first contributions in the handwriting recognition field. Furthermore, we have outlined the potential research lines, which would be interesting to explore in the future.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Alicia Fornes;Marçal Rusiñol;Mauricio Villegas
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-122714-0-9 Medium
Area Expedition Conference
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ Kan20 Serial 3482
Permanent link to this record
 

 
Author Susana Alvarez
Title (down) Revisión de la teoría de los Textons Enfoque computacional en color Type Book Whole
Year 2012 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract El color y la textura son dos estímulos visuales importantes para la interpretación de las imágenes. La definición de descriptores computacionales que combinan estas dos características es aún un problema abierto. La dificultad se deriva esencialmente de la propia naturaleza de ambas, mientras que la textura es una propiedad de una región, el color es una propiedad de un punto.

Hasta ahora se han utilizado tres los tipos de aproximaciones para la combinación, (a) se describe la textura directamente en cada uno de los canales color, (b) se describen textura y color por separado y se combinan al final, y (c) la combinación se realiza con técnicas de aprendizaje automático. Considerando que este problema se resuelve en el sistema visual humano en niveles muy tempranos, en esta tesis se propone estudiar el problema a partir de la implementación directa de una teoría perceptual, la teoría de los textons, y explorar así su extensión a color.

Puesto que la teoría de los textons se basa en la descripción de la textura a partir de las densidades de los atributos locales, esto se adapta perfectamente al marco de trabajo de los descriptores holísticos (bag-of-words). Se han estudiado diversos descriptores basados en diferentes espacios de textons, y diferentes representaciones de las imágenes. Asimismo se ha estudiado la viabilidad de estos descriptores en una representación conceptual de nivel intermedio.

Los descriptores propuestos han demostrado ser muy eficientes en aplicaciones de recuperación y clasificación de imágenes, presentando ventajas en la generación de vocabularios. Los vocabularios se obtienen cuantificando directamente espacios de baja dimensión y la perceptualidad de estos espacios permite asociar semántica de bajo nivel a las palabras visuales. El estudio de los resultados permite concluir que si bien la aproximación holística es muy eficiente, la introducción de co-ocurrencia espacial de las propiedades de forma y color de los blobs de la imagen es un elemento clave para su combinación, hecho que no contradice las evidencias en percepción
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Maria Vanrell;Xavier Otazu
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes CIC Approved no
Call Number Alv2012b Serial 2216
Permanent link to this record
 

 
Author Lluis Pere de las Heras
Title (down) Relational Models for Visual Understanding of Graphical Documents. Application to Architectural Drawings. Type Book Whole
Year 2014 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations between symbols) that articulate a semantic meaning in a certain context. Therefore, the automatic interpretation by computers of these sort of documents entails three main steps: the detection of the symbols, the extraction of the structural relations between these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Di erent domains in graphical documents include: architectural and engineering drawings, maps, owcharts, etc.
Graphics Recognition in particular and Document Image Analysis in general are
born from the industrial need of interpreting a massive amount of digitalized documents after the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems to be far from being solved. The main reason is that the vast majority of the systems in the literature focus on very speci c problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is dicult to reuse these strategies on di erent data and on di erent contexts, hindering thus the natural progress in the eld.
In this thesis, we face the graphical document understanding problem by proposing several relational models at di erent levels that are designed from a generic perspective. Firstly, we introduce three di erent strategies for the detection of symbols. The fi rst method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the big variability of the problem. The third method is a combination of the previous two methods that inherits their respective strengths, i.e. copes the big variability and does not need annotated data. Secondly, we present two relational strategies that tackle the problem of the visual context extraction. The fi rst one is a full bottom up method that heuristically searches in a graph representation the contextual relations between symbols. Contrarily, the second is syntactic method that models probabilistically the structure of the documents. It automatically learns the model, which guides the inference algorithm to encounter the best structural representation for a given input. Finally, we construct a knowledge-based model consisting of an ontological de nition of the domain and real data. This model permits to perform contextual reasoning and to detect semantic inconsistencies within the data. We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard in the modeling of these documents there exists an enormous notation variability from plan to plan in terms of vocabulary and syntax. Therefore, floor plan interpretation is a relevant task in the graphical document understanding problem. It is also worth to mention that we make freely available all the resources used in this thesis {the data, the tool used to generate the data, and the evaluation scripts{ with the aim of fostering research in the graphical document understanding task.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Gemma Sanchez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-940902-8-8 Medium
Area Expedition Conference
Notes DAG; 600.077 Approved no
Call Number Admin @ si @ Her2014 Serial 2574
Permanent link to this record