Records
Author Sergio Escalera; Jordi Gonzalez; Xavier Baro; Jamie Shotton
Title Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis Type Journal Article
Year 2016 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 38 Issue Pages 1489 - 1491
Keywords
Abstract The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human-robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of cluttered scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA; ISE;MV; Approved no
Call Number Admin @ si @ Serial 2851
 

 
Author Marc Oliu; Ciprian Corneanu; Kamal Nasrollahi; Olegs Nikisins; Sergio Escalera; Yunlian Sun; Haiqing Li; Zhenan Sun; Thomas B. Moeslund; Modris Greitans
Title Improved RGB-D-T based Face Recognition Type Journal Article
Year 2016 Publication IET Biometrics Abbreviated Journal BIO
Volume 5 Issue 4 Pages 297 - 303
Keywords
Abstract Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of unimodal facial recognition systems has been observed in recent years. At the same time, multimodal facial recognition is a promising approach. This study combines the latest successes in both directions by applying deep learning convolutional neural networks (CNNs) to the multimodal RGB, depth, and thermal (RGB-D-T) based facial recognition problem, outperforming previously published results. Furthermore, a late fusion of the CNN-based recognition block with various hand-crafted features (local binary patterns, histograms of oriented gradients, Haar-like rectangular features, histograms of Gabor ordinal measures) is introduced, demonstrating even better recognition performance on a benchmark RGB-D-T database. The results obtained in this study show that classical engineered features and CNN-based features can complement each other for recognition purposes.
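Illustrative note: the late fusion mentioned in the abstract above can be sketched as a score-level combination of several recognisers. The abstract does not state the fusion rule, so the weighted average used here, the function name `late_fusion`, and the example scores are all assumptions made for illustration and should not be read as the authors' method.

```python
def late_fusion(score_lists, weights=None):
    """Fuse per-class scores from several classifiers by weighted averaging.

    score_lists: one list of per-class scores per classifier (hypothetical
                 inputs, e.g. one from a CNN and one per hand-crafted feature).
    weights:     optional per-classifier weights; equal weights by default.
    Returns (index of the best class, fused score list).
    """
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    n_classes = len(score_lists[0])
    # Weighted sum of the scores each classifier assigns to class c.
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, score_lists))
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Made-up scores for three identities from two recognisers.
cnn_scores = [0.5, 0.4, 0.1]
lbp_scores = [0.2, 0.7, 0.1]
best_identity, fused_scores = late_fusion([cnn_scores, lbp_scores])
```

With equal weights the two recognisers disagree on their top choice and the fused scores settle the decision; other fusion rules (product, learned weights) would fit the same interface.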
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes HuPBA;MILAB; Approved no
Call Number Admin @ si @ OCN2016 Serial 2854
 

 
Author Arash Akbarinia; Karl R. Gegenfurtner
Title Metameric Mismatching in Natural and Artificial Reflectances Type Journal Article
Year 2017 Publication Journal of Vision Abbreviated Journal JV
Volume 17 Issue 10 Pages 390-390
Keywords Metamer; colour perception; spectral discrimination; photoreceptors
Abstract The human visual system and most digital cameras sample the continuous spectral power distribution through three classes of receptors. This implies that two distinct spectral reflectances can result in identical tristimulus values under one illuminant and differ under another – the problem of metamer mismatching. It is still debated how frequently this issue arises in the real world, using naturally occurring reflectance functions and common illuminants.

We gathered more than ten thousand spectral reflectance samples from various sources, covering a wide range of environments (e.g., flowers, plants, Munsell chips) and evaluated their responses under a number of natural and artificial light sources. For each pair of reflectance functions, we estimated the perceived difference using the CIE-defined distance ΔE2000 metric in Lab color space.

The degree of metamer mismatching depended on the lower threshold value l below which two samples would be considered to lead to equal sensor excitations (ΔE < l), and on the higher threshold value h above which they would be considered different. For example, for l=h=1, we found that 43,129 comparisons out of a total of 6×10⁷ pairs would be considered metameric (1 in 10⁴). For l=1 and h=5, this number reduced to 705 metameric pairs (2 in 10⁶). Extreme metamers, for instance l=1 and h=10, were rare (22 pairs or 6 in 10⁸), as were instances where the two members of a metameric pair would be assigned to different color categories. Not unexpectedly, we observed variations among different reflectance databases and illuminant spectra, with metamers more frequent under artificial illuminants than natural ones.

Overall, our numbers are not very different from those obtained earlier (Foster et al, JOSA A, 2006). However, our results also show that the degree of metamerism is typically not very strong and that category switches hardly ever occur.
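Illustrative note: the two-threshold test described in the abstract above can be sketched as a count over sample pairs. The sketch assumes the ΔE values between every pair of samples have already been computed under two illuminants; the function name `count_metameric_pairs`, the matrix layout, and the strict `> h` comparison for "different" are assumptions made here, not details taken from the study.

```python
from itertools import combinations


def count_metameric_pairs(delta_e_a, delta_e_b, l, h):
    """Count sample pairs that match under one illuminant and differ under the other.

    delta_e_a[i][j], delta_e_b[i][j]: precomputed colour differences between
    samples i and j under illuminants A and B (hypothetical inputs).
    l: lower threshold; a pair "matches" when its difference is below l.
    h: higher threshold; a pair "differs" when its difference is above h
       (the strict inequality is an assumption).
    """
    n = len(delta_e_a)
    count = 0
    for i, j in combinations(range(n), 2):
        match_a_differ_b = delta_e_a[i][j] < l and delta_e_b[i][j] > h
        match_b_differ_a = delta_e_b[i][j] < l and delta_e_a[i][j] > h
        if match_a_differ_b or match_b_differ_a:
            count += 1
    return count


# Made-up differences for three samples under two illuminants.
delta_e_a = [[0.0, 0.5, 3.0], [0.5, 0.0, 0.8], [3.0, 0.8, 0.0]]
delta_e_b = [[0.0, 6.0, 3.5], [6.0, 0.0, 0.9], [3.5, 0.9, 0.0]]
pairs_l1_h5 = count_metameric_pairs(delta_e_a, delta_e_b, l=1, h=5)
```

Raising h while keeping l fixed can only shrink the count, which mirrors the trend reported in the abstract as h goes from 1 to 5 to 10.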
Address Florida, USA; May 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes NEUROBIT; not mentioned Approved no
Call Number Admin @ si @ AkG2017 Serial 2899
 

 
Author German Ros
Title Visual Scene Understanding for Autonomous Vehicles: Understanding Where and What Type Book Whole
Year 2016 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Making Ground Autonomous Vehicles (GAVs) a reality as a service for society is one of the major scientific and technological challenges of this century. The potential benefits of autonomous vehicles include reducing accidents, easing traffic congestion, and making better use of road infrastructure, among others. These vehicles must operate in our cities, towns and highways, dealing with many different types of situations while respecting traffic rules and protecting human lives. GAVs are expected to deal with all types of scenarios and situations, coping with an uncertain and chaotic world.
Therefore, in order to fulfill these demanding requirements, GAVs need to be endowed with the capability of understanding their surroundings at many different levels, by means of affordable sensors and artificial intelligence. This capacity to understand the surroundings and the current situation that the vehicle is involved in is called scene understanding. In this work we investigate novel techniques to bring scene understanding to autonomous vehicles by combining the use of cameras as the main source of information (due to their versatility and affordability) and algorithms based on computer vision and machine learning. We investigate different degrees of understanding of the scene, starting from basic geometric knowledge about where the vehicle is within the scene. A robust and efficient estimation of the vehicle location and pose with respect to a map is one of the most fundamental steps towards autonomous driving. We study this problem from the point of view of robustness and computational efficiency, proposing key insights to improve current solutions. Then we advance to higher levels of abstraction to discover what is in the scene, by recognizing and parsing all the elements present in a driving scene, such as roads, sidewalks, pedestrians, etc. We investigate this problem, known as semantic segmentation, proposing new approaches to improve recognition accuracy and computational efficiency. We cover these points by focusing on key aspects such as: (i) how to leverage computation by moving semantics to an offline process, (ii) how to train compact architectures based on deconvolutional networks to achieve their maximum potential, (iii) how to use virtual worlds in combination with domain adaptation to produce accurate models in a cost-effective fashion, and (iv) how to use transfer learning techniques to prepare models for new situations.
We finally extend the previous level of knowledge, enabling systems to reason about what has changed in a scene with respect to a previous visit, which in turn allows for efficient and cost-effective map updating.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Angel Sappa;Julio Guerrero;Antonio Lopez
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-1-8 Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number Admin @ si @ Ros2016 Serial 2860
 

 
Author Francisco Cruz
Title Probabilistic Graphical Models for Document Analysis Type Book Whole
Year 2016 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract The latest advances in digitization techniques have fostered interest in creating digital copies of collections of documents. Digitized documents permit easy maintenance, lossless storage, and efficient transmission and information retrieval. This situation has opened a new market niche for systems able to automatically extract and analyze the information contained in these collections, especially in the business domain.

Due to the great variety of document types, this is not a trivial task. For instance, the automatic extraction of numerical data from invoices differs substantially from text recognition in historical documents. However, in order to extract the information of interest, it is always necessary to identify the area of the document where it is located. In the area of Document Analysis we refer to this process as layout analysis, which aims at identifying and categorizing the different entities that compose the document, such as text regions, pictures, text lines, or tables, among others. To perform this task it is usually necessary to incorporate prior knowledge about the task into the analysis process, which can be modeled by defining a set of contextual relations between the different entities of the document. The use of context has proven to be useful to reinforce the recognition process and improve the results on many computer vision tasks. It raises two fundamental questions: what kind of contextual information is appropriate for a given task, and how to incorporate this information into the models.

In this thesis we study several ways to incorporate contextual information into the task of document layout analysis, and into the particular case of handwritten text line segmentation. We focus on the study of Probabilistic Graphical Models and other mechanisms for this purpose, and propose several solutions to these problems. First, we present a method for layout analysis based on Conditional Random Fields. With this model we encode local contextual relations between variables, such as pair-wise constraints. In addition, we encode a set of structural relations between different classes of regions at the feature level. Second, we present a method based on 2D-Probabilistic Context-free Grammars to encode structural and hierarchical relations. We perform a comparative study between Probabilistic Graphical Models and this syntactic approach. Third, we propose a method for structured documents based on Bayesian Networks to represent the document structure, and an algorithm based on Expectation-Maximization to find the best configuration of the page. We perform a thorough evaluation of the proposed methods on two particular collections of documents: a historical collection composed of ancient structured documents, and a collection of contemporary documents. In addition, we present a general method for the task of handwritten text line segmentation. We define a probabilistic framework where we combine the EM algorithm with variational approaches for computing inference and parameter learning on a Markov Random Field. We evaluate our method on several collections of documents, including a general dataset of annotated administrative documents. Results demonstrate the applicability of our method to real problems, and the contribution of contextual information to this kind of problem.
Address
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Oriol Ramos Terrades
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-945373-2-5 Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ Cru2016 Serial 2861
 

 
Author Lluis Gomez; Dimosthenis Karatzas
Title A fast hierarchical method for multi‐script and arbitrary oriented scene text extraction Type Journal Article
Year 2016 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 19 Issue 4 Pages 335-349
Keywords scene text; segmentation; detection; hierarchical grouping; perceptual organisation
Abstract Typography and layout lead to the hierarchical organisation of text in words, text lines, paragraphs. This inherent structure is a key property of text in any script and language, which has nonetheless been minimally leveraged by existing text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. Contrary to existing methods, we make explicit use of text structure, aiming directly at the detection of region groupings corresponding to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy, introducing a feature space designed to produce text group hypotheses with high recall and a novel stopping rule combining a discriminative classifier and a probabilistic measure of group meaningfulness based on perceptual organization. Results obtained over four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while being trained on a single mixed dataset, outperforms state-of-the-art methods in unconstrained scenarios.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.056; 601.197 Approved no
Call Number Admin @ si @ GoK2016a Serial 2862
 

 
Author Marta Diez-Ferrer; Debora Gil; Elena Carreño; Susana Padrones; Samantha Aso
Title Positive Airway Pressure-Enhanced CT to Improve Virtual Bronchoscopic Navigation Type Journal Article
Year 2017 Publication Journal of Thoracic Oncology Abbreviated Journal JTO
Volume 12 Issue 1S Pages S596-S597
Keywords Thorax CT; diagnosis; Peripheral Pulmonary Nodule
Abstract A main weakness of virtual bronchoscopic navigation (VBN) is unsuccessful segmentation of distal branches approaching peripheral pulmonary nodules (PPN). The CT scan acquisition protocol is pivotal for segmentation covering the utmost periphery. We hypothesize that applying continuous positive airway pressure (CPAP) during CT acquisition could improve visualization and segmentation of peripheral bronchi. The purpose of the present pilot study is to compare the quality of segmentations under four CT acquisition modes: inspiration (INSP), expiration (EXP), and both with CPAP (INSP-CPAP and EXP-CPAP).
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; 600.096; 600.075; 600.145 Approved no
Call Number Admin @ si @ DGC2017a Serial 2883
 

 
Author Azadeh S. Mozafari; David Vazquez; Mansour Jamzad; Antonio Lopez
Title Node-Adapt, Path-Adapt and Tree-Adapt: Model-Transfer Domain Adaptation for Random Forest Type Miscellaneous
Year 2016 Publication arXiv Abbreviated Journal
Volume Issue Pages
Keywords Domain Adaptation; Pedestrian detection; Random Forest
Abstract Random Forest (RF) is a successful paradigm for learning classifiers due to its ability to learn from large feature spaces and seamlessly integrate multi-class classification, as well as the achieved accuracy and processing efficiency. However, like many other classifiers, RF requires domain adaptation (DA) whenever there is a mismatch between the training (source) and testing (target) domains that degrades classification. Consequently, different RF-DA methods have been proposed, which require not only target-domain samples but also revisiting the source-domain ones. As a novelty, we propose three inherently different methods (Node-Adapt, Path-Adapt and Tree-Adapt) that only require the learned source-domain RF and relatively few target-domain samples for DA, i.e. source-domain samples do not need to be available. To assess the performance of our proposals we focus on image-based object detection, using the pedestrian detection problem as a challenging proof of concept. Moreover, we use the RF with expert nodes because it is a competitive patch-based pedestrian model. We test our Node-, Path- and Tree-Adapt methods in standard benchmarks, showing that DA is largely achieved.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS Approved no
Call Number ADAS @ adas @ MVJ2016 Serial 2868
 

 
Author Ariel Amato
Title Moving cast shadow detection Type Journal Article
Year 2014 Publication Electronic Letters on Computer Vision and Image Analysis Abbreviated Journal ELCVIA
Volume 13 Issue 2 Pages 70-71
Keywords
Abstract Motion perception is an amazing innate ability of the creatures on the planet. This adroitness entails a functional advantage that enables species to compete better in the wild. The motion perception ability is usually employed at different levels, ranging from the simplest interaction with the 'physis' up to the most transcendental survival tasks. Among the five classical perception systems, vision is the most widely used in the motion perception field. Millions of years of evolution have led to a highly specialized visual system in humans, which is characterized by tremendous accuracy as well as extraordinary robustness. Although humans and an immense diversity of species can distinguish moving objects with seeming simplicity, it has proven to be a difficult and non-trivial problem from a computational perspective. In the field of Computer Vision, the detection of moving objects is a challenging and fundamental research area. This can be referred to as the 'origin' of vast and numerous vision-based research sub-areas. Nevertheless, from the bottom to the top of this hierarchical analysis, the foundations still rely on when and where motion has occurred in an image. Pixels corresponding to moving objects in image sequences can be identified by measuring changes in their values. However, a pixel's value (representing a combination of color and brightness) could also vary due to other factors such as variation in scene illumination, camera noise and nonlinear sensor responses, among others. The challenge lies in detecting whether the changes in pixel values are caused by genuine object movement or not. An additional challenging aspect in motion detection is represented by moving cast shadows. The paradox arises because a moving object and its cast shadow share similar motion patterns. However, a moving cast shadow is not a moving object.
In fact, a shadow represents a photometric illumination effect caused by the relative position of the object with respect to the light sources. Shadow detection methods are mainly divided into two domains depending on the application field. The first normally consists of static images where shadows are cast by static objects, whereas the second refers to image sequences where shadows are cast by moving objects. In the first case, shadows can provide additional geometric and semantic cues about the shape and position of their casting objects as well as the localization of the light source. Although this information can be extracted from static images as well as video sequences, the main focus in the second area is usually change detection, scene matching or surveillance. In this context, a shadow can severely interfere with the analysis and interpretation of the scene. The work done in the thesis is focused on the second case; it therefore addresses the problem of detection and removal of moving cast shadows in video sequences in order to enhance the detection of moving objects.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ISE Approved no
Call Number Admin @ si @ Ama2014 Serial 2870
 

 
Author Antonio Lopez; Jiaolong Xu; Jose Luis Gomez; David Vazquez; German Ros
Title From Virtual to Real World Visual Perception using Domain Adaptation -- The DPM as Example Type Book Chapter
Year 2017 Publication Domain Adaptation in Computer Vision Applications Abbreviated Journal
Volume Issue 13 Pages 243-258
Keywords Domain Adaptation
Abstract Supervised learning tends to produce more accurate classifiers than unsupervised learning in general. This implies that annotated training data is preferred. When addressing visual perception challenges, such as localizing certain object classes within an image, the learning of the involved classifiers turns out to be a practical bottleneck. The reason is that, at the very least, we have to frame object examples with bounding boxes in thousands of images. A priori, the more complex the model is regarding its number of parameters, the more annotated examples are required. This annotation task is performed by human oracles, which results in inaccuracies and errors in the annotations (aka ground truth) since the task is inherently very cumbersome and sometimes ambiguous. As an alternative we have pioneered the use of virtual worlds for collecting such annotations automatically and with high precision. However, since the models learned with virtual data must operate in the real world, we still need to perform domain adaptation (DA). In this chapter we revisit the DA of a deformable part-based model (DPM) as an exemplifying case of virtual-to-real-world DA. As a use case, we address the challenge of vehicle detection for driver assistance, using different publicly available virtual-world data. While doing so, we investigate questions such as how the domain gap between virtual and real data behaves with respect to the dominant object appearance per domain, as well as the role of photo-realism in the virtual world.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor Gabriela Csurka
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.085; 601.223; 600.076; 600.118 Approved no
Call Number ADAS @ adas @ LXG2017 Serial 2872
 

 
Author Pau Riba; Josep Llados; Alicia Fornes; Anjan Dutta
Title Large-scale graph indexing using binary embeddings of node contexts for information spotting in document image databases Type Journal Article
Year 2017 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 87 Issue Pages 203-211
Keywords
Abstract Graph-based representations are seeing growing use in visual recognition and retrieval due to their representational power compared with classical appearance-based representations. However, retrieving a query graph from a large dataset of graphs implies a high computational complexity. The most important property for large-scale retrieval is that the search time complexity be sub-linear in the number of database examples. With this aim, in this paper we propose a graph indexation formalism applied to visual retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Then, each attribute vector is converted to a binary code by applying a binary-valued hash function. Graph retrieval is thus formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in different real scenarios such as handwritten word spotting in images of historical documents or symbol spotting in architectural floor plans.
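Illustrative note: the Hamming-distance comparison of binary node codes mentioned in the abstract above can be computed with an XOR followed by a bit count. The sketch below stores each code as a Python integer; the names `hamming_distance` and `nodes_within_radius`, the dictionary layout, and the example codes are assumptions for illustration. The linear scan shown here is not sub-linear and does not reproduce the indexing scheme proposed in the paper.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    # XOR leaves a 1 exactly where the two codes disagree.
    return bin(code_a ^ code_b).count("1")


def nodes_within_radius(query_code, database_codes, radius):
    """Return ids of database nodes whose code is within `radius` bits of the query.

    database_codes: dict mapping a node id to its binary code (hypothetical layout).
    """
    return [
        node_id
        for node_id, code in database_codes.items()
        if hamming_distance(query_code, code) <= radius
    ]


# Made-up 4-bit codes for three database nodes and one query node.
database = {"n1": 0b1011, "n2": 0b0010, "n3": 0b1111}
close_nodes = nodes_within_radius(0b1010, database, radius=1)
```

A practical index would bucket codes so that only a fraction of the database is visited per query; the function above only shows the bitwise comparison itself.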
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.097; 602.006; 603.053; 600.121 Approved no
Call Number RLF2017b Serial 2873
 

 
Author David Geronimo; David Vazquez; Arturo de la Escalera
Title Vision-Based Advanced Driver Assistance Systems Type Book Chapter
Year 2017 Publication Computer Vision in Vehicle Technology: Land, Sea, and Air Abbreviated Journal
Volume Issue Pages
Keywords ADAS; Autonomous Driving
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.118 Approved no
Call Number ADAS @ adas @ GVE2017 Serial 2881
 

 
Author German Ros; Laura Sellart; Gabriel Villalonga; Elias Maidanik; Francisco Molero; Marc Garcia; Adriana Cedeño; Francisco Perez; Didier Ramirez; Eduardo Escobar; Jose Luis Gomez; David Vazquez; Antonio Lopez
Title Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA Type Book Chapter
Year 2017 Publication Domain Adaptation in Computer Vision Applications Abbreviated Journal
Volume 12 Issue Pages 227-241
Keywords SYNTHIA; Virtual worlds; Autonomous Driving
Abstract Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning many parameters from raw images; thus, a sufficient number of diverse images with class annotations is needed. These annotations are obtained via cumbersome human labour, which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the learnt models so that they operate correctly in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation, in particular when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor Gabriela Csurka
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.085; 600.082; 600.076; 600.118 Approved no
Call Number ADAS @ adas @ RSV2017 Serial 2882
 

 
Author H. Martin Kjer; Jens Fagertun; Sergio Vera; Debora Gil; Miguel Angel Gonzalez Ballester; Rasmus R. Paulsen
Title Free-form image registration of human cochlear μCT data using skeleton similarity as anatomical prior Type Journal Article
Year 2016 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 76 Issue 1 Pages 76-82
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes IAM; 600.060 Approved no
Call Number Admin @ si @ MFV2017b Serial 2941
 

 
Author Lluis Gomez; Dimosthenis Karatzas
Title TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild Type Journal Article
Year 2017 Publication Pattern Recognition Abbreviated Journal PR
Volume 70 Issue Pages 60-74
Keywords
Abstract Motivated by the success of powerful but expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposal techniques have emerged as an alternative to traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity-based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.

Our experiments demonstrate that the presented method is superior in its ability to produce good-quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting and, in some benchmarks, outperforms previously published results. Concretely, on the challenging ICDAR2015 Incidental Text dataset, we outperform by more than 10% F-score the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015). Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.084; 601.197; 600.121; 600.129 Approved no
Call Number Admin @ si @ GoK2017 Serial 2886