|   | 
Details
   web
Records
Author Thanh Ha Do; Salvatore Tabbone; Oriol Ramos Terrades
Title Sparse representation over learned dictionary for symbol recognition Type Journal Article
Year 2016 Publication Signal Processing Abbreviated Journal SP
Volume 125 Issue Pages 36-47
Keywords (up) Symbol Recognition; Sparse Representation; Learned Dictionary; Shape Context; Interest Points
Abstract In this paper we propose an original sparse vector model for symbol retrieval task. More speci cally, we apply the K-SVD algorithm for learning a visual dictionary based on symbol descriptors locally computed around interest points. Results on benchmark datasets show that the obtained sparse representation is competitive related to state-of-the-art methods. Moreover, our sparse representation is invariant to rotation and scale transforms and also robust to degraded images and distorted symbols. Thereby, the learned visual dictionary is able to represent instances of unseen classes of symbols.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG; 600.061; 600.077 Approved no
Call Number Admin @ si @ DTR2016 Serial 2946
Permanent link to this record
 

 
Author Anjan Dutta; Josep Llados; Umapada Pal
Title A symbol spotting approach in graphical documents by hashing serialized graphs Type Journal Article
Year 2013 Publication Pattern Recognition Abbreviated Journal PR
Volume 46 Issue 3 Pages 752-768
Keywords (up) Symbol spotting; Graphics recognition; Graph matching; Graph serialization; Graph factorization; Graph paths; Hashing
Abstract In this paper we propose a symbol spotting technique in graphical documents. Graphs are used to represent the documents and a (sub)graph matching technique is used to detect the symbols in them. We propose a graph serialization to reduce the usual computational complexity of graph matching. Serialization of graphs is performed by computing acyclic graph paths between each pair of connected nodes. Graph paths are one-dimensional structures of graphs which are less expensive in terms of computation. At the same time they enable robust localization even in the presence of noise and distortion. Indexing in large graph databases involves a computational burden as well. We propose a graph factorization approach to tackle this problem. Factorization is intended to create a unified indexed structure over the database of graphical documents. Once graph paths are extracted, the entire database of graphical documents is indexed in hash tables by locality sensitive hashing (LSH) of shape descriptors of the paths. The hashing data structure aims to execute an approximate k-NN search in a sub-linear time. We have performed detailed experiments with various datasets of line drawings and compared our method with the state-of-the-art works. The results demonstrate the effectiveness and efficiency of our technique.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.042; 600.045; 605.203; 601.152 Approved no
Call Number Admin @ si @ DLP2012 Serial 2127
Permanent link to this record
 

 
Author Joan Mas; Josep Llados; Gemma Sanchez; J.A. Jorge
Title A syntactic approach based on distortion-tolerant Adjacency Grammars and a spatial-directed parser to interpret sketched diagrams Type Journal Article
Year 2010 Publication Pattern Recognition Abbreviated Journal PR
Volume 43 Issue 12 Pages 4148–4164
Keywords (up) Syntactic Pattern Recognition; Symbol recognition; Diagram understanding; Sketched diagrams; Adjacency Grammars; Incremental parsing; Spatial directed parsing
Abstract This paper presents a syntactic approach based on Adjacency Grammars (AG) for sketch diagram modeling and understanding. Diagrams are a combination of graphical symbols arranged according to a set of spatial rules defined by a visual language. AG describe visual shapes by productions defined in terms of terminal and non-terminal symbols (graphical primitives and subshapes), and a set functions describing the spatial arrangements between symbols. Our approach to sketch diagram understanding provides three main contributions. First, since AG are linear grammars, there is a need to define shapes and relations inherently bidimensional using a sequential formalism. Second, our parsing approach uses an indexing structure based on a spatial tessellation. This serves to reduce the search space when finding candidates to produce a valid reduction. This allows order-free parsing of 2D visual sentences while keeping combinatorial explosion in check. Third, working with sketches requires a distortion model to cope with the natural variations of hand drawn strokes. To this end we extended the basic grammar with a distortion measure modeled on the allowable variation on spatial constraints associated with grammar productions. Finally, the paper reports on an experimental framework an interactive system for sketch analysis. User tests performed on two real scenarios show that our approach is usable in interactive settings.
Address
Corporate Author Thesis
Publisher Elsevier Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number DAG @ dag @ MLS2010 Serial 1336
Permanent link to this record
 

 
Author Cesar Isaza; Joaquin Salas; Bogdan Raducanu
Title Rendering ground truth data sets to detect shadows cast by static objects in outdoors Type Journal Article
Year 2014 Publication Multimedia Tools and Applications Abbreviated Journal MTAP
Volume 70 Issue 1 Pages 557-571
Keywords (up) Synthetic ground truth data set; Sun position; Shadow detection; Static objects shadow detection
Abstract In our work, we are particularly interested in studying the shadows cast by static objects in outdoor environments, during daytime. To assess the accuracy of a shadow detection algorithm, we need ground truth information. The collection of such information is a very tedious task because it is a process that requires manual annotation. To overcome this severe limitation, we propose in this paper a methodology to automatically render ground truth using a virtual environment. To increase the degree of realism and usefulness of the simulated environment, we incorporate in the scenario the precise longitude, latitude and elevation of the actual location of the object, as well as the sun’s position for a given time and day. To evaluate our method, we consider a qualitative and a quantitative comparison. In the quantitative one, we analyze the shadow cast by a real object in a particular geographical location and its corresponding rendered model. To evaluate qualitatively the methodology, we use some ground truth images obtained both manually and automatically.
Address
Corporate Author Thesis
Publisher Springer US Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1380-7501 ISBN Medium
Area Expedition Conference
Notes LAMP; Approved no
Call Number Admin @ si @ ISR2014 Serial 2229
Permanent link to this record
 

 
Author German Ros; Laura Sellart; Gabriel Villalonga; Elias Maidanik; Francisco Molero; Marc Garcia; Adriana Cedeño; Francisco Perez; Didier Ramirez; Eduardo Escobar; Jose Luis Gomez; David Vazquez; Antonio Lopez
Title Semantic Segmentation of Urban Scenes via Domain Adaptation of SYNTHIA Type Book Chapter
Year 2017 Publication Domain Adaptation in Computer Vision Applications Abbreviated Journal
Volume 12 Issue Pages 227-241
Keywords (up) SYNTHIA; Virtual worlds; Autonomous Driving
Abstract Vision-based semantic segmentation in urban scenarios is a key functionality for autonomous driving. Recent revolutionary results of deep convolutional neural networks (DCNNs) foreshadow the advent of reliable classifiers to perform such visual tasks. However, DCNNs require learning of many parameters from raw images; thus, having a sufficient amount of diverse images with class annotations is needed. These annotations are obtained via cumbersome, human labour which is particularly challenging for semantic segmentation since pixel-level annotations are required. In this chapter, we propose to use a combination of a virtual world to automatically generate realistic synthetic images with pixel-level annotations, and domain adaptation to transfer the models learnt to correctly operate in real scenarios. We address the question of how useful synthetic data can be for semantic segmentation – in particular, when using a DCNN paradigm. In order to answer this question we have generated a synthetic collection of diverse urban images, named SYNTHIA, with automatically generated class annotations and object identifiers. We use SYNTHIA in combination with publicly available real-world urban images with manually provided annotations. Then, we conduct experiments with DCNNs that show that combining SYNTHIA with simple domain adaptation techniques in the training stage significantly improves performance on semantic segmentation.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor Gabriela Csurka
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes ADAS; 600.085; 600.082; 600.076; 600.118 Approved no
Call Number ADAS @ adas @ RSV2017 Serial 2882
Permanent link to this record
 

 
Author Xialei Liu; Joost Van de Weijer; Andrew Bagdanov
Title Leveraging Unlabeled Data for Crowd Counting by Learning to Rank Type Conference Article
Year 2018 Publication 31st IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal
Volume Issue Pages 7661 - 7669
Keywords (up) Task analysis; Training; Computer vision; Visualization; Estimation; Head; Context modeling
Abstract We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of
cropped images , we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of limited size of existing
datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and queryby-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-ofthe-art results.
Address Salt Lake City; USA; June 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPR
Notes LAMP; 600.109; 600.106; 600.120 Approved no
Call Number Admin @ si @ LWB2018 Serial 3159
Permanent link to this record
 

 
Author Xialei Liu; Joost Van de Weijer; Andrew Bagdanov
Title Exploiting Unlabeled Data in CNNs by Self-Supervised Learning to Rank Type Journal Article
Year 2019 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI
Volume 41 Issue 8 Pages 1862-1878
Keywords (up) Task analysis;Training;Image quality;Visualization;Uncertainty;Labeling;Neural networks;Learning from rankings;image quality assessment;crowd counting;active learning
Abstract For many applications the collection of labeled data is expensive laborious. Exploitation of unlabeled data during training is thus a long pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50 percent.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes LAMP; 600.109; 600.106; 600.120 Approved no
Call Number LWB2019 Serial 3267
Permanent link to this record
 

 
Author E. Tavalera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva
Title Regularized Clustering for Egocentric Video Segmentation Type Book Chapter
Year 2015 Publication Pattern Recognition and Image Analysis Abbreviated Journal
Volume Issue Pages 327-336
Keywords (up) Temporal video segmentation ; Egocentric videos ; Clustering
Abstract In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering(AC) within an energyminimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous guarantee of performances. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate techniques in an energy-minimization framework that serves disambiguate the decision of both techniques and to complete the segmentation taking into account the temporal continuity of video frames We present experiments over egocentric sets of more than 13.000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Address
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-319-19390-8 Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number Admin @ si @TDB2015a Serial 2781
Permanent link to this record
 

 
Author Estefania Talavera; Mariella Dimiccoli; Marc Bolaños; Maedeh Aghaei; Petia Radeva
Title R-clustering for egocentric video segmentation Type Conference Article
Year 2015 Publication Pattern Recognition and Image Analysis, Proceedings of 7th Iberian Conference , ibPRIA 2015 Abbreviated Journal
Volume 9117 Issue Pages 327-336
Keywords (up) Temporal video segmentation; Egocentric videos; Clustering
Abstract In this paper, we present a new method for egocentric video temporal segmentation based on integrating a statistical mean change detector and agglomerative clustering(AC) within an energy-minimization framework. Given the tendency of most AC methods to oversegment video sequences when clustering their frames, we combine the clustering with a concept drift detection technique (ADWIN) that has rigorous guarantee of performances. ADWIN serves as a statistical upper bound for the clustering-based video segmentation. We integrate both techniques in an energy-minimization framework that serves to disambiguate the decision of both techniques and to complete the segmentation taking into account the temporal continuity of video frames descriptors. We present experiments over egocentric sets of more than 13.000 images acquired with different wearable cameras, showing that our method outperforms state-of-the-art clustering methods.
Address Santiago de Compostela; España; June 2015
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-319-19389-2 Medium
Area Expedition Conference IbPRIA
Notes MILAB Approved no
Call Number Admin @ si @ TDB2015 Serial 2597
Permanent link to this record
 

 
Author Oscar Argudo; Marc Comino; Antonio Chica; Carlos Andujar; Felipe Lumbreras
Title Segmentation of aerial images for plausible detail synthesis Type Journal Article
Year 2018 Publication Computers & Graphics Abbreviated Journal CG
Volume 71 Issue Pages 23-34
Keywords (up) Terrain editing; Detail synthesis; Vegetation synthesis; Terrain rendering; Image segmentation
Abstract The visual enrichment of digital terrain models with plausible synthetic detail requires the segmentation of aerial images into a suitable collection of categories. In this paper we present a complete pipeline for segmenting high-resolution aerial images into a user-defined set of categories distinguishing e.g. terrain, sand, snow, water, and different types of vegetation. This segmentation-for-synthesis problem implies that per-pixel categories must be established according to the algorithms chosen for rendering the synthetic detail. This precludes the definition of a universal set of labels and hinders the construction of large training sets. Since artists might choose to add new categories on the fly, the whole pipeline must be robust against unbalanced datasets, and fast on both training and inference. Under these constraints, we analyze the contribution of common per-pixel descriptors, and compare the performance of state-of-the-art supervised learning algorithms. We report the findings of two user studies. The first one was conducted to analyze human accuracy when manually labeling aerial images. The second user study compares detailed terrains built using different segmentation strategies, including official land cover maps. These studies demonstrate that our approach can be used to turn digital elevation models into fully-featured, detailed terrains with minimal authoring efforts.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 0097-8493 ISBN Medium
Area Expedition Conference
Notes MSIAU; 600.086; 600.118 Approved no
Call Number Admin @ si @ ACC2018 Serial 3147
Permanent link to this record
 

 
Author David Aldavert; Marçal Rusiñol
Title Manuscript text line detection and segmentation using second-order derivatives analysis Type Conference Article
Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 293 - 298
Keywords (up) text line detection; text line segmentation; text region detection; second-order derivatives
Abstract In this paper, we explore the use of second-order derivatives to detect text lines on handwritten document images. Taking advantage that the second derivative gives a minimum response when a dark linear element over a
bright background has the same orientation as the filter, we use this operator to create a map with the local orientation and strength of putative text lines in the document. Then, we detect line segments by selecting and merging the filter responses that have a similar orientation and scale. Finally, text lines are found by merging the segments that are within the same text region. The proposed segmentation algorithm, is learning-free while showing a performance similar to the state of the art methods in publicly available datasets.
Address Viena; Austria; April 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes DAG; 600.084; 600.129; 302.065; 600.121 Approved no
Call Number Admin @ si @ AlR2018a Serial 3104
Permanent link to this record
 

 
Author David Fernandez; Josep Llados; Alicia Fornes
Title A graph-based approach for segmenting touching lines in historical handwritten documents Type Journal Article
Year 2014 Publication International Journal on Document Analysis and Recognition Abbreviated Journal IJDAR
Volume 17 Issue 3 Pages 293-312
Keywords (up) Text line segmentation; Handwritten documents; Document image processing; Historical document analysis
Abstract Text line segmentation in handwritten documents is an important task in the recognition of historical documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines and different document structures, making line segmentation a difficult task. In this paper, we present a new approach for handwritten text line segmentation solving the problems of touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines. This is solved as a graph traversal problem. A graph is constructed using the skeleton of the image. Then, a path-finding algorithm is used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, the George Washington and the Barcelona Marriages Database. The proposed method outperforms the state-of-the-art considering the different types and difficulties of the benchmarking data.
Address
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1433-2833 ISBN Medium
Area Expedition Conference
Notes DAG; 600.056; 600.061; 602.006; 600.077 Approved no
Call Number Admin @ si @ FLF2014 Serial 2459
Permanent link to this record
 

 
Author Christophe Rigaud; Dimosthenis Karatzas; Joost Van de Weijer; Jean-Christophe Burie; Jean-Marc Ogier
Title Automatic text localisation in scanned comic books Type Conference Article
Year 2013 Publication Proceedings of the International Conference on Computer Vision Theory and Applications Abbreviated Journal
Volume Issue Pages 814-819
Keywords (up) Text localization; comics; text/graphic separation; complex background; unstructured document
Abstract Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enable direct content-based search as opposed to metadata only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for the automatic text localization in scanned comics book pages, an essential step towards a fully automatic comics book understanding. We focus on speech text as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing methods of text localization found in the literature and results are presented.
Address Barcelona; February 2013
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISAPP
Notes DAG; CIC; 600.056 Approved no
Call Number Admin @ si @ RKW2013b Serial 2261
Permanent link to this record
 

 
Author Antonio Clavelli; Dimosthenis Karatzas; Josep Llados; Mario Ferraro; Giuseppe Boccignone
Title Towards Modelling an Attention-Based Text Localization Process Type Conference Article
Year 2013 Publication 6th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal
Volume 7887 Issue Pages 296-303
Keywords (up) text localization; visual attention; eye guidance
Abstract This note introduces a visual attention model of text localization in real-world scenes. The core of the model built upon the proto-object concept is discussed. It is shown how such dynamic mid-level representation of the scene can be derived in the framework of an action-perception loop engaging salience, text information value computation, and eye guidance mechanisms.
Preliminary results that compare model generated scanpaths with those eye-tracked from human subjects are presented.
Address Madeira; Portugal; June 2013
Corporate Author Thesis
Publisher Springer Berlin Heidelberg Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN 0302-9743 ISBN 978-3-642-38627-5 Medium
Area Expedition Conference IbPRIA
Notes DAG Approved no
Call Number Admin @ si @ CKL2013 Serial 2291
Permanent link to this record
 

 
Author Bojana Gajic; Eduard Vazquez; Ramon Baldrich
Title Evaluation of Deep Image Descriptors for Texture Retrieval Type Conference Article
Year 2017 Publication Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017) Abbreviated Journal
Volume Issue Pages 251-257
Keywords (up) Texture Representation; Texture Retrieval; Convolutional Neural Networks; Psychophysical Evaluation
Abstract The increasing complexity learnt in the layers of a Convolutional Neural Network has proven to be of great help for the task of classification. The topic has received great attention in recently published literature.
Nonetheless, just a handful of works study low-level representations, commonly associated with lower layers. In this paper, we explore recent findings which conclude, counterintuitively, the last layer of the VGG convolutional network is the best to describe a low-level property such as texture. To shed some light on this issue, we are proposing a psychophysical experiment to evaluate the adequacy of different layers of the VGG network for texture retrieval. Results obtained suggest that, whereas the last convolutional layer is a good choice for a specific task of classification, it might not be the best choice as a texture descriptor, showing a very poor performance on texture retrieval. Intermediate layers show the best performance, showing a good combination of basic filters, as in the primary visual cortex, and also a degree of higher level information to describe more complex textures.
Address Porto, Portugal; 27 February – 1 March 2017
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference VISIGRAPP
Notes CIC; 600.087 Approved no
Call Number Admin @ si @ Serial 3710
Permanent link to this record