toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Victor Ponce edit  url
openurl 
  Title Evolutionary Bags of Space-Time Features for Human Analysis Type Book Whole
  Year 2016 Publication PhD Thesis Universitat de Barcelona, UOC and CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords Computer algorithms; Digital image processing; Digital video; Analysis of variance; Dynamic programming; Evolutionary computation; Gesture  
  Abstract The representation (or feature) learning has been an emerging concept in the last years, since it collects a set of techniques that are present in any theoretical or practical methodology referring to artificial intelligence. In computer vision, a very common representation has adopted the form of the well-known Bag of Visual Words. This representation appears implicitly in most approaches where images are described, and is also present in a huge number of areas and domains: image content retrieval, pedestrian detection, human-computer interaction, surveillance, e-health, and social computing, amongst others. The early stages of this dissertation provide an approach for learning visual representations inside evolutionary algorithms, which consists of evolving weighting schemes to improve the BoVW representations for the task of recognizing categories of videos and images. Thus, we demonstrate the applicability of the most common weighting schemes, which are often used in text mining but are less frequently found in computer vision tasks. Beyond learning these visual representations, we provide an approach based on fusion strategies for learning spatiotemporal representations, from multimodal data obtained by depth sensors. Besides, we specially aim at the evolutionary and dynamic modelling, where the temporal factor is present in the nature of the data, such as video sequences of gestures and actions. Indeed, we explore the effects of probabilistic modelling for those approaches based on dynamic programming, so as to handle the temporal deformation and variance amongst video sequences of different categories. Finally, we integrate dynamic programming and generative models into an evolutionary computation framework, with the aim of learning Bags of SubGestures (BoSG) representations and hence to improve the generalization capability of standard gesture recognition approaches. The results obtained in the experimentation demonstrate, first, that evolutionary algorithms are useful for improving the representation of BoVW approaches in several datasets for recognizing categories in still images and video sequences. On the other hand, our experimentation reveals that both, the use of dynamic programming and generative models to align video sequences, and the representations obtained from applying fusion strategies in multimodal data, entail an enhancement on the performance when recognizing some gesture categories. Furthermore, the combination of evolutionary algorithms with models based on dynamic programming and generative approaches results, when aiming at the classification of video categories on large video datasets, in a considerable improvement over standard gesture and action recognition approaches. Finally, we demonstrate the applications of these representations in several domains for human analysis: classification of images where humans may be present, action and gesture recognition for general applications, and in particular for conversational settings within the field of restorative justice  
  Address June 2016  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Sergio Escalera;Xavier Baro;Hugo Jair Escalante  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes HuPBA Approved no  
  Call Number Pon2016 Serial 2814  
Permanent link to this record
 

 
Author Angel Sappa; Cristhian A. Aguilera-Carrasco; Juan A. Carvajal Ayala; Miguel Oliveira; Dennis Romero; Boris X. Vintimilla; Ricardo Toledo edit   pdf
doi  openurl
  Title Monocular visual odometry: A cross-spectral image fusion based approach Type Journal Article
  Year 2016 Publication Robotics and Autonomous Systems Abbreviated Journal RAS  
  Volume 85 Issue Pages 26-36  
  Keywords Monocular visual odometry; LWIR-RGB cross-spectral imaging; Image fusion  
  Abstract This manuscript evaluates the usage of fused cross-spectral images in a monocular visual odometry approach. Fused images are obtained through a Discrete Wavelet Transform (DWT) scheme, where the best setup is empirically obtained by means of a mutual information based evaluation metric. The objective is to have a flexible scheme where fusion parameters are adapted according to the characteristics of the given images. Visual odometry is computed from the fused monocular images using an off the shelf approach. Experimental results using data sets obtained with two different platforms are presented. Additionally, comparison with a previous approach as well as with monocular-visible/infrared spectra are also provided showing the advantages of the proposed scheme.  
  Address  
  Corporate Author Thesis  
  Publisher Elsevier B.V. Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes ADAS;600.086; 600.076 Approved no  
  Call Number Admin @ si @SAC2016 Serial 2811  
Permanent link to this record
 

 
Author Maria Salamo; Inmaculada Rodriguez; Maite Lopez; Anna Puig; Simone Balocco; Mariona Taule edit  openurl
  Title Recurso docente para la atención de la diversidad en el aula mediante la predicción de notas Type Journal
  Year 2016 Publication ReVision Abbreviated Journal  
  Volume 9 Issue 1 Pages  
  Keywords Aprendizaje automatico; Sistema de prediccion de notas; Herramienta docente  
  Abstract Desde la implantación del Espacio Europeo de Educación Superior (EEES) en los diferentes grados, se ha puesto de manifiesto la necesidad de utilizar diversos mecanismos que permitan tratar la diversidad en el aula, evaluando automáticamente y proporcionando una retroalimentación rápida tanto al alumnado como al profesorado sobre la evolución de los alumnos en una asignatura. En este artículo se presenta la evaluación de la exactitud en las predicciones de GRADEFORESEER, un recurso docente para la predicción de notas basado en técnicas de aprendizaje automático que permite evaluar la evolución del alumnado y estimar su nota final al terminar el curso. Este recurso se ha complementado con una interfaz de usuario para el profesorado que puede ser usada en diferentes plataformas software (sistemas operativos) y en cualquier asignatura de un grado en la que se utilice evaluación continuada. Además de la descripción del recurso, este artículo presenta los resultados obtenidos al aplicar el sistema de predicción en cuatro asignaturas de disciplinas distintas: Programación I (PI), Diseño de Software (DSW) del grado de Ingeniería Informática, Tecnologías de la Información y la Comunicación (TIC) del grado de Lingüística y la asignatura Fundamentos de Tecnología (FDT) del grado de Información y Documentación, todas ellas impartidas en la Universidad de Barcelona.

La capacidad predictiva se ha evaluado de forma binaria (aprueba o no) y según un criterio de rango (suspenso, aprobado, notable o sobresaliente), obteniendo mejores predicciones en los resultados evaluados de forma binaria.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes MILAB; Approved no  
  Call Number Admin @ si @ SRL2016 Serial 2820  
Permanent link to this record
 

 
Author Simone Balocco; Maria Zuluaga; Guillaume Zahnd; Su-Lin Lee; Stefanie Demirci edit  isbn
openurl 
  Title Computing and Visualization for Intravascular Imaging and Computer Assisted Stenting Type Book Whole
  Year 2016 Publication Computing and Visualization for Intravascular Imaging and Computer-Assisted Stenting Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Elsevier Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 9780128110188 Medium  
  Area Expedition Conference (up)  
  Notes MILAB Approved no  
  Call Number Admin @ si @ BZZ2016 Serial 2821  
Permanent link to this record
 

 
Author Maria Elena Meza-de-Luna; Juan Ramon Terven Salinas; Bogdan Raducanu; Joaquin Salas edit   pdf
doi  openurl
  Title Assessing the Influence of Mirroring on the Perception of Professional Competence using Wearable Technology Type Journal Article
  Year 2016 Publication IEEE Transactions on Affective Computing Abbreviated Journal TAC  
  Volume 9 Issue 2 Pages 161-175  
  Keywords Mirroring; Nodding; Competence; Perception; Wearable Technology  
  Abstract Nonverbal communication is an intrinsic part in daily face-to-face meetings. A frequently observed behavior during social interactions is mirroring, in which one person tends to mimic the attitude of the counterpart. This paper shows that a computer vision system could be used to predict the perception of competence in dyadic interactions through the automatic detection of mirroring
events. To prove our hypothesis, we developed: (1) A social assistant for mirroring detection, using a wearable device which includes a video camera and (2) an automatic classifier for the perception of competence, using the number of nodding gestures and mirroring events as predictors. For our study, we used a mixed-method approach in an experimental design where 48 participants acting as customers interacted with a confederated psychologist. We found that the number of nods or mirroring events has a significant influence on the perception of competence. Our results suggest that: (1) Customer mirroring is a better predictor than psychologist mirroring; (2) the number of psychologist’s nods is a better predictor than the number of customer’s nods; (3) except for the psychologist mirroring, the computer vision algorithm we used worked about equally well whether it was acquiring images from wearable smartglasses or fixed cameras.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes LAMP; 600.072; Approved no  
  Call Number Admin @ si @ MTR2016 Serial 2826  
Permanent link to this record
 

 
Author Sumit K. Banchhor; Tadashi Araki; Narendra D. Londhe; Nobutaka Ikeda; Petia Radeva; Ayman El-Baz; Luca Saba; Andrew Nicolaides; Shoaib Shafique; John R. Laird; Jasjit S. Suri edit  doi
openurl 
  Title Five multiresolution-based calcium volume measurement techniques from coronary IVUS videos: A comparative approach Type Journal Article
  Year 2016 Publication Computer Methods and Programs in Biomedicine Abbreviated Journal CMPB  
  Volume 134 Issue Pages 237-258  
  Keywords  
  Abstract BACKGROUND AND OBJECTIVE:
Fast intravascular ultrasound (IVUS) video processing is required for calcium volume computation during the planning phase of percutaneous coronary interventional (PCI) procedures. Nonlinear multiresolution techniques are generally applied to improve the processing time by down-sampling the video frames.
METHODS:
This paper presents four different segmentation methods for calcium volume measurement, namely Threshold-based, Fuzzy c-Means (FCM), K-means, and Hidden Markov Random Field (HMRF) embedded with five different kinds of multiresolution techniques (bilinear, bicubic, wavelet, Lanczos, and Gaussian pyramid). This leads to 20 different kinds of combinations. IVUS image data sets consisting of 38,760 IVUS frames taken from 19 patients were collected using 40 MHz IVUS catheter (Atlantis® SR Pro, Boston Scientific®, pullback speed of 0.5 mm/sec.). The performance of these 20 systems is compared with and without multiresolution using the following metrics: (a) computational time; (b) calcium volume; (c) image quality degradation ratio; and (d) quality assessment ratio.
RESULTS:
Among the four segmentation methods embedded with five kinds of multiresolution techniques, FCM segmentation combined with wavelet-based multiresolution gave the best performance. FCM and wavelet experienced the highest percentage mean improvement in computational time of 77.15% and 74.07%, respectively. Wavelet interpolation experiences the highest mean precision-of-merit (PoM) of 94.06 ± 3.64% and 81.34 ± 16.29% as compared to other multiresolution techniques for volume level and frame level respectively. Wavelet multiresolution technique also experiences the highest Jaccard Index and Dice Similarity of 0.7 and 0.8, respectively. Multiresolution is a nonlinear operation which introduces bias and thus degrades the image. The proposed system also provides a bias correction approach to enrich the system, giving a better mean calcium volume similarity for all the multiresolution-based segmentation methods. After including the bias correction, bicubic interpolation gives the largest increase in mean calcium volume similarity of 4.13% compared to the rest of the multiresolution techniques. The system is automated and can be adapted in clinical settings.
CONCLUSIONS:
We demonstrated the time improvement in calcium volume computation without compromising the quality of IVUS image. Among the 20 different combinations of multiresolution with calcium volume segmentation methods, the FCM embedded with wavelet-based multiresolution gave the best performance.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes MILAB; Approved no  
  Call Number Admin @ si @ BAL2016 Serial 2830  
Permanent link to this record
 

 
Author Santiago Segui; Michal Drozdzal; Guillem Pascual; Petia Radeva; Carolina Malagelada; Fernando Azpiroz; Jordi Vitria edit   pdf
url  openurl
  Title Generic Feature Learning for Wireless Capsule Endoscopy Analysis Type Journal Article
  Year 2016 Publication Computers in Biology and Medicine Abbreviated Journal CBM  
  Volume 79 Issue Pages 163-172  
  Keywords Wireless capsule endoscopy; Deep learning; Feature learning; Motility analysis  
  Abstract The interpretation and analysis of wireless capsule endoscopy (WCE) recordings is a complex task which requires sophisticated computer aided decision (CAD) systems to help physicians with video screening and, finally, with the diagnosis. Most CAD systems used in capsule endoscopy share a common system design, but use very different image and video representations. As a result, each time a new clinical application of WCE appears, a new CAD system has to be designed from the scratch. This makes the design of new CAD systems very time consuming. Therefore, in this paper we introduce a system for small intestine motility characterization, based on Deep Convolutional Neural Networks, which circumvents the laborious step of designing specific features for individual motility events. Experimental results show the superiority of the learned features over alternative classifiers constructed using state-of-the-art handcrafted features. In particular, it reaches a mean classification accuracy of 96% for six intestinal motility events, outperforming the other classifiers by a large margin (a 14% relative performance increase).  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes OR; MILAB;MV; Approved no  
  Call Number Admin @ si @ SDP2016 Serial 2836  
Permanent link to this record
 

 
Author Pedro Herruzo; Marc Bolaños; Petia Radeva edit   pdf
url  doi
openurl 
  Title Can a CNN Recognize Catalan Diet? Type Book Chapter
  Year 2016 Publication AIP Conference Proceedings Abbreviated Journal  
  Volume 1773 Issue Pages  
  Keywords  
  Abstract CoRR abs/1607.08811
Nowadays, we can find several diseases related to the unhealthy diet habits of the population, such as diabetes, obesity, anemia, bulimia and anorexia. In many cases, these diseases are related to the food consumption of people. Mediterranean diet is scientifically known as a healthy diet that helps to prevent many metabolic diseases. In particular, our work focuses on the recognition of Mediterranean food and dishes. The development of this methodology would allow to analise the daily habits of users with wearable cameras, within the topic of lifelogging. By using automatic mechanisms we could build an objective tool for the analysis of the patient’s behavior, allowing specialists to discover unhealthy food patterns and understand the user’s lifestyle.
With the aim to automatically recognize a complete diet, we introduce a challenging multi-labeled dataset related to Mediter-ranean diet called FoodCAT. The first type of label provided consists of 115 food classes with an average of 400 images per dish, and the second one consists of 12 food categories with an average of 3800 pictures per class. This dataset will serve as a basis for the development of automatic diet recognition. In this context, deep learning and more specifically, Convolutional Neural Networks (CNNs), currently are state-of-the-art methods for automatic food recognition. In our work, we compare several architectures for image classification, with the purpose of diet recognition. Applying the best model for recognising food categories, we achieve a top-1 accuracy of 72.29%, and top-5 of 97.07%. In a complete diet recognition of dishes from Mediterranean diet, enlarged with the Food-101 dataset for international dishes recognition, we achieve a top-1 accuracy of 68.07%, and top-5 of 89.53%, for a total of 115+101 food classes.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes MILAB Approved no  
  Call Number Admin @ si @ HBR2016 Serial 2837  
Permanent link to this record
 

 
Author Mikkel Thogersen; Sergio Escalera; Jordi Gonzalez; Thomas B. Moeslund edit  url
openurl 
  Title Segmentation of RGB-D Indoor scenes by Stacking Random Forests and Conditional Random Fields Type Journal Article
  Year 2016 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 80 Issue Pages 208–215  
  Keywords  
  Abstract This paper proposes a technique for RGB-D scene segmentation using Multi-class
Multi-scale Stacked Sequential Learning (MMSSL) paradigm. Following recent trends in state-of-the-art, a base classifier uses an initial SLIC segmentation to obtain superpixels which provide a diminution of data while retaining object boundaries. A series of color and depth features are extracted from the superpixels, and are used in a Conditional Random Field (CRF) to predict superpixel labels. Furthermore, a Random Forest (RF) classifier using random offset features is also used as an input to the CRF, acting as an initial prediction. As a stacked classifier, another Random Forest is used acting on a spatial multi-scale decomposition of the CRF confidence map to correct the erroneous labels assigned by the previous classifier. The model is tested on the popular NYU-v2 dataset.
The approach shows that simple multi-modal features with the power of the MMSSL
paradigm can achieve better performance than state of the art results on the same dataset.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes HuPBA; ISE;MILAB; 600.098; 600.119 Approved no  
  Call Number Admin @ si @ TEG2016 Serial 2843  
Permanent link to this record
 

 
Author Sergio Escalera; Jordi Gonzalez; Xavier Baro; Jamie Shotton edit  doi
openurl 
  Title Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis Type Journal Article
  Year 2016 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 28 Issue Pages 1489 - 1491  
  Keywords  
  Abstract The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of clutter scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes HuPBA; ISE;MV; Approved no  
  Call Number Admin @ si @ Serial 2851  
Permanent link to this record
 

 
Author Marc Oliu; Ciprian Corneanu; Kamal Nasrollahi; Olegs Nikisins; Sergio Escalera; Yunlian Sun; Haiqing Li; Zhenan Sun; Thomas B. Moeslund; Modris Greitans edit  url
openurl 
  Title Improved RGB-D-T based Face Recognition Type Journal Article
  Year 2016 Publication IET Biometrics Abbreviated Journal BIO  
  Volume 5 Issue 4 Pages 297 - 303  
  Keywords  
  Abstract Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of the unimodal facial recognition systems has been observed in the recent years. At the same time a multimodal facial recognition is a promising approach. This study combines the latest successes in both directions by applying deep learning convolutional neural networks (CNN) to the multimodal RGB, depth, and thermal (RGB-D-T) based facial recognition problem outperforming previously published results. Furthermore, a late fusion of the CNN-based recognition block with various hand-crafted features (local binary patterns, histograms of oriented gradients, Haar-like rectangular features, histograms of Gabor ordinal measures) is introduced, demonstrating even better recognition performance on a benchmark RGB-D-T database. The obtained results in this study show that the classical engineered features and CNN-based features can complement each other for recognition purposes.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes HuPBA;MILAB; Approved no  
  Call Number Admin @ si @ OCN2016 Serial 2854  
Permanent link to this record
 

 
Author German Ros edit  isbn
openurl 
  Title Visual Scene Understanding for Autonomous Vehicles: Understanding Where and What Type Book Whole
  Year 2016 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Making Ground Autonomous Vehicles (GAVs) a reality as a service for the society is one of the major scientific and technological challenges of this century. The potential benefits of autonomous vehicles include reducing accidents, improving traffic congestion and better usage of road infrastructures, among others. These vehicles must operate in our cities, towns and highways, dealing with many different types of situations while respecting traffic rules and protecting human lives. GAVs are expected to deal with all types of scenarios and situations, coping with an uncertain and chaotic world.
Therefore, in order to fulfill these demanding requirements GAVs need to be endowed with the capability of understanding their surrounding at many different levels, by means of affordable sensors and artificial intelligence. This capacity to understand the surroundings and the current situation that the vehicle is involved in is called scene understanding. In this work we investigate novel techniques to bring scene understanding to autonomous vehicles by combining the use of cameras as the main source of information—due to their versatility and affordability—and algorithms based on computer vision and machine learning. We investigate different degrees of understanding of the scene, starting from basic geometric knowledge about where is the vehicle within the scene. A robust and efficient estimation of the vehicle location and pose with respect to a map is one of the most fundamental steps towards autonomous driving. We study this problem from the point of view of robustness and computational efficiency, proposing key insights to improve current solutions. Then we advance to higher levels of abstraction to discover what is in the scene, by recognizing and parsing all the elements present on a driving scene, such as roads, sidewalks, pedestrians, etc. We investigate this problem known as semantic segmentation, proposing new approaches to improve recognition accuracy and computational efficiency. We cover these points by focusing on key aspects such as: (i) how to leverage computation moving semantics to an offline process, (ii) how to train compact architectures based on deconvolutional networks to achieve their maximum potential, (iii) how to use virtual worlds in combination with domain adaptation to produce accurate models in a cost-effective fashion, and (iv) how to use transfer learning techniques to prepare models to new situations. We finally extend the previous level of knowledge enabling systems to reasoning about what has change in a scene with respect to a previous visit, which in return allows for efficient and cost-effective map updating.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa;Julio Guerrero;Antonio Lopez  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-1-8 Medium  
  Area Expedition Conference (up)  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Ros2016 Serial 2860  
Permanent link to this record
 

 
Author Francisco Cruz edit  isbn
openurl 
  Title Probabilistic Graphical Models for Document Analysis Type Book Whole
  Year 2016 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Latest advances in digitization techniques have fostered the interest in creating digital copies of collections of documents. Digitized documents permit an easy maintenance, loss-less storage, and efficient ways for transmission and to perform information retrieval processes. This situation has opened a new market niche to develop systems able to automatically extract and analyze information contained in these collections, specially in the ambit of the business activity.

Due to the great variety of types of documents this is not a trivial task. For instance, the automatic extraction of numerical data from invoices differs substantially from a task of text recognition in historical documents. However, in order to extract the information of interest, is always necessary to identify the area of the document where it is located. In the area of Document Analysis we refer to this process as layout analysis, which aims at identifying and categorizing the different entities that compose the document, such as text regions, pictures, text lines, or tables, among others. To perform this task it is usually necessary to incorporate a prior knowledge about the task into the analysis process, which can be modeled by defining a set of contextual relations between the different entities of the document. The use of context has proven to be useful to reinforce the recognition process and improve the results on many computer vision tasks. It presents two fundamental questions: What kind of contextual information is appropriate for a given task, and how to incorporate this information into the models.

In this thesis we study several ways to incorporate contextual information to the task of document layout analysis, and to the particular case of handwritten text line segmentation. We focus on the study of Probabilistic Graphical Models and other mechanisms for this purpose, and propose several solutions to these problems. First, we present a method for layout analysis based on Conditional Random Fields. With this model we encode local contextual relations between variables, such as pair-wise constraints. Besides, we encode a set of structural relations between different classes of regions at feature level. Second, we present a method based on 2D-Probabilistic Context-free Grammars to encode structural and hierarchical relations. We perform a comparative study between Probabilistic Graphical Models and this syntactic approach. Third, we propose a method for structured documents based on Bayesian Networks to represent the document structure, and an algorithm based in the Expectation-Maximization to find the best configuration of the page. We perform a thorough evaluation of the proposed methods on two particular collections of documents: a historical collection composed of ancient structured documents, and a collection of contemporary documents. In addition, we present a general method for the task of handwritten text line segmentation. We define a probabilistic framework where we combine the EM algorithm with variational approaches for computing inference and parameter learning on a Markov Random Field. We evaluate our method on several collections of documents, including a general dataset of annotated administrative documents. Results demonstrate the applicability of our method to real problems, and the contribution of the use of contextual information to this kind of problems.
 
  Address  
  Corporate Author Thesis Ph.D. thesis  
  Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-945373-2-5 Medium  
  Area Expedition Conference (up)  
  Notes DAG Approved no  
  Call Number Admin @ si @ Cru2016 Serial 2861  
Permanent link to this record
 

 
Author Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  openurl
  Title A fast hierarchical method for multi‐script and arbitrary oriented scene text extraction Type Journal Article
  Year 2016 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR  
  Volume 19 Issue 4 Pages 335-349  
  Keywords scene text; segmentation; detection; hierarchical grouping; perceptual organisation  
  Abstract Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing text detection methods. This paper addresses the problem of text
segmentation in natural scenes from a hierarchical perspective.
Contrary to existing methods, we make explicit use of text structure, aiming directly to the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such an hierarchy introducing a feature space designed to produce text group hypotheses with
high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based in perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while being trained in a single mixed dataset, outperforms state of the art
methods in unconstrained scenarios.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes DAG; 600.056; 601.197 Approved no  
  Call Number Admin @ si @ GoK2016a Serial 2862  
Permanent link to this record
 

 
Author Azadeh S. Mozafari; David Vazquez; Mansour Jamzad; Antonio Lopez edit   pdf
openurl 
  Title Node-Adapt, Path-Adapt and Tree-Adapt:Model-Transfer Domain Adaptation for Random Forest Type Miscellaneous
  Year 2016 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords Domain Adaptation; Pedestrian detection; Random Forest  
  Abstract Random Forest (RF) is a successful paradigm for learning classifiers due to its ability to learn from large feature spaces and seamlessly integrate multi-class classification, as well as the achieved accuracy and processing efficiency. However, as many other classifiers, RF requires domain adaptation (DA) provided that there is a mismatch between the training (source) and testing (target) domains which provokes classification degradation. Consequently, different RF-DA methods have been proposed, which not only require target-domain samples but revisiting the source-domain ones, too. As novelty, we propose three inherently different methods (Node-Adapt, Path-Adapt and Tree-Adapt) that only require the learned source-domain RF and a relatively few target-domain samples for DA, i.e. source-domain samples do not need to be available. To assess the performance of our proposals we focus on image-based object detection, using the pedestrian detection problem as challenging proof-of-concept. Moreover, we use the RF with expert nodes because it is a competitive patch-based pedestrian model. We test our Node-, Path- and Tree-Adapt methods in standard benchmarks, showing that DA is largely achieved.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (up)  
  Notes ADAS Approved no  
  Call Number ADAS @ adas @ MVJ2016 Serial 2868  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: