|   | 
Details
   web
Records
Author Thanh Nam Le; Muhammad Muzzamil Luqman; Anjan Dutta; Pierre Heroux; Christophe Rigaud; Clement Guerin; Pasquale Foggia; Jean Christophe Burie; Jean Marc Ogier; Josep Llados; Sebastien Adam
Title Subgraph spotting in graph representations of comic book images Type Journal Article
Year 2018 Publication Pattern Recognition Letters Abbreviated Journal PRL
Volume 112 Issue Pages 118-124
Keywords Attributed graph; Region adjacency graph; Graph matching; Graph isomorphism; Subgraph isomorphism; Subgraph spotting; Graph indexing; Graph retrieval; Query by example; Dataset and comic book images
Abstract Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ LLD2018 Serial 3150
Permanent link to this record
 

 
Author Francisco Cruz; Oriol Ramos Terrades
Title A probabilistic framework for handwritten text line segmentation Type Miscellaneous
Year 2018 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords Document Analysis; Text Line Segmentation; EM algorithm; Probabilistic Graphical Models; Parameter Learning
Abstract We successfully combine Expectation-Maximization algorithm and variational
approaches for parameter learning and computing inference on Markov random fields. This is a general method that can be applied to many computer
vision tasks. In this paper, we apply it to handwritten text line segmentation.
We conduct several experiments that demonstrate that our method deal with
common issues of this task, such as complex document layout or non-latin
scripts. The obtained results prove that our method achieve state-of-theart performance on different benchmark datasets without any particular fine
tuning step.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) DAG; 600.097; 600.121 Approved no
Call Number Admin @ si @ CrR2018 Serial 3253
Permanent link to this record
 

 
Author Sounak Dey; Anjan Dutta; Suman Ghosh; Ernest Valveny; Josep Llados
Title Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework Type Conference Article
Year 2018 Publication 14th Asian Conference on Computer Vision Abbreviated Journal
Volume Issue Pages
Keywords
Abstract In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities into a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on a salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and its alignment with the query irrespective of the availability of the co-occurrence of different objects in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the art performance in every dataset.
Address Perth; Australia; December 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ACCV
Notes (up) DAG; 600.097; 600.121; 600.129 Approved no
Call Number Admin @ si @ DDG2018a Serial 3151
Permanent link to this record
 

 
Author Arnau Baro; Pau Riba; Alicia Fornes
Title A Starting Point for Handwritten Music Recognition Type Conference Article
Year 2018 Publication 1st International Workshop on Reading Music Systems Abbreviated Journal
Volume Issue Pages 5-6
Keywords Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVCMUSCIMA
Abstract In the last years, the interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community.
Address Paris; France; September 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference WORMS
Notes (up) DAG; 600.097; 601.302; 601.330; 600.121 Approved no
Call Number Admin @ si @ BRF2018 Serial 3223
Permanent link to this record
 

 
Author Arnau Baro; Pau Riba; Jorge Calvo-Zaragoza; Alicia Fornes
Title Optical Music Recognition by Long Short-Term Memory Networks Type Book Chapter
Year 2018 Publication Graphics Recognition. Current Trends and Evolutions Abbreviated Journal
Volume 11009 Issue Pages 81-95
Keywords Optical Music Recognition; Recurrent Neural Network; Long ShortTerm Memory
Abstract Optical Music Recognition refers to the task of transcribing the image of a music score into a machine-readable format. Many music scores are written in a single staff, and therefore, they could be treated as a sequence. Therefore, this work explores the use of Long Short-Term Memory (LSTM) Recurrent Neural Networks for reading the music score sequentially, where the LSTM helps in keeping the context. For training, we have used a synthetic dataset of more than 40000 images, labeled at primitive level. The experimental results are promising, showing the benefits of our approach.
Address
Corporate Author Thesis
Publisher Springer Place of Publication Editor A. Fornes, B. Lamiroy
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-02283-9 Medium
Area Expedition Conference GREC
Notes (up) DAG; 600.097; 601.302; 601.330; 600.121 Approved no
Call Number Admin @ si @ BRC2018 Serial 3227
Permanent link to this record
 

 
Author Lei Kang; Juan Ignacio Toledo; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol
Title Convolve, Attend and Spell: An Attention-based Sequence-to-Sequence Model for Handwritten Word Recognition Type Conference Article
Year 2018 Publication 40th German Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 459-472
Keywords
Abstract This paper proposes Convolve, Attend and Spell, an attention based sequence-to-sequence model for handwritten word recognition. The proposed architecture has three main parts: an encoder, consisting of a CNN and a bi-directional GRU, an attention mechanism devoted to focus on the pertinent features and a decoder formed by a one-directional GRU, able to spell the corresponding word, character by character. Compared with the recent state-of-the-art, our model achieves competitive results on the IAM dataset without needing any pre-processing step, predefined lexicon nor language model. Code and additional results are available in https://github.com/omni-us/research-seq2seq-HTR.
Address Stuttgart; Germany; October 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference GCPR
Notes (up) DAG; 600.097; 603.057; 302.065; 601.302; 600.084; 600.121; 600.129 Approved no
Call Number Admin @ si @ KTR2018 Serial 3167
Permanent link to this record
 

 
Author Jialuo Chen; Pau Riba; Alicia Fornes; Juan Mas; Josep Llados; Joana Maria Pujadas-Mora
Title Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts Type Conference Article
Year 2018 Publication 16th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal
Volume Issue Pages 528-533
Keywords Crowdsourcing; Gamification; Handwritten documents; Performance evaluation
Abstract Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, the automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so a manual validation is mandatory. Crowdsourcing is a good strategy for manual validation, however it is a tedious task. In this paper we analyze experiences based in gamification
in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance.
Address Niagara Falls, USA; August 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICFHR
Notes (up) DAG; 600.097; 603.057; 600.121 Approved no
Call Number Admin @ si @ CRF2018 Serial 3169
Permanent link to this record
 

 
Author Pau Riba; Andreas Fischer; Josep Llados; Alicia Fornes
Title Learning Graph Distances with Message Passing Neural Networks Type Conference Article
Year 2018 Publication 24th International Conference on Pattern Recognition Abbreviated Journal
Volume Issue Pages 2239-2244
Keywords ★Best Paper Award★
Abstract Graph representations have been widely used in pattern recognition thanks to their powerful representation formalism and rich theoretical background. A number of error-tolerant graph matching algorithms such as graph edit distance have been proposed for computing a distance between two labelled graphs. However, they typically suffer from a high
computational complexity, which makes it difficult to apply
these matching algorithms in a real scenario. In this paper, we propose an efficient graph distance based on the emerging field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure and learns a metric with a siamese network approach. The performance of the proposed graph distance is validated in two application cases, graph classification and graph retrieval of handwritten words, and shows a promising performance when compared with
(approximate) graph edit distance benchmarks.
Address Beijing; China; August 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICPR
Notes (up) DAG; 600.097; 603.057; 601.302; 600.121 Approved no
Call Number Admin @ si @ RFL2018 Serial 3168
Permanent link to this record
 

 
Author Manuel Carbonell; Mauricio Villegas; Alicia Fornes; Josep Llados
Title Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model Type Conference Article
Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal
Volume Issue Pages 399-404
Keywords Named entity recognition; Handwritten Text Recognition; neural networks
Abstract When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the
performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different
configurations: different ways of encoding the information, doing or not transfer learning and processing at text line or multi-line region level. The results are comparable to state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.
Address Vienna; Austria; April 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference DAS
Notes (up) DAG; 600.097; 603.057; 601.311; 600.121 Approved no
Call Number Admin @ si @ CVF2018 Serial 3170
Permanent link to this record
 

 
Author Dena Bazazian
Title Fully Convolutional Networks for Text Understanding in Scene Images Type Book Whole
Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text understanding in scene images has gained plenty of attention in the computer vision community and it is an important task in many applications as text carries semantically rich information about scene content and context. For instance, reading text in a scene can be applied to autonomous driving, scene understanding or assisting visually impaired people. The general aim of scene text understanding is to localize and recognize text in scene images. Text regions are first localized in the original image by a trained detector model and afterwards fed into a recognition module. The tasks of localization and recognition are highly correlated since an inaccurate localization can affect the recognition task.
The main purpose of this thesis is to devise efficient methods for scene text understanding. We investigate how the latest results on deep learning can advance text understanding pipelines. Recently, Fully Convolutional Networks (FCNs) and derived methods have achieved a significant performance on semantic segmentation and pixel level classification tasks. Therefore, we took benefit of the strengths of FCN approaches in order to detect text in natural scenes. In this thesis we have focused on two challenging tasks of scene text understanding which are Text Detection and Word Spotting. For the task of text detection, we have proposed an efficient text proposal technique in scene images. We have considered the Text Proposals method as the baseline which is an approach to reduce the search space of possible text regions in an image. In order to improve the Text Proposals method we combined it with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same level of accuracy and thus gaining a significant speed up. Our experiments demonstrate that this text proposal approach yields significantly higher recall rates than the line based text localization techniques, while also producing better-quality localization. We have also applied this technique on compressed images such as videos from wearable egocentric cameras. For the task of word spotting, we have introduced a novel mid-level word representation method. We have proposed a technique to create and exploit an intermediate representation of images based on text attributes which roughly correspond to character probability maps. Our representation extends the concept of Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks through an efficient text line proposal algorithm. To evaluate the detected text, we propose a novel line based evaluation along with the classic bounding box based approach. We test our method on incidental scene text images which comprises real-life scenarios such as urban scenes. The importance of incidental scene text images is due to the complexity of backgrounds, perspective, variety of script and language, short text and little linguistic context. All of these factors together makes the incidental scene text images challenging.
Address November 2018
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Dimosthenis Karatzas;Andrew Bagdanov
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-1-1 Medium
Area Expedition Conference
Notes (up) DAG; 600.121 Approved no
Call Number Admin @ si @ Baz2018 Serial 3220
Permanent link to this record
 

 
Author Alicia Fornes; Bart Lamiroy
Title Graphics Recognition, Current Trends and Evolutions Type Book Whole
Year 2018 Publication Graphics Recognition, Current Trends and Evolutions Abbreviated Journal
Volume 11009 Issue Pages
Keywords
Abstract This book constitutes the thoroughly refereed post-conference proceedings of the 12th International Workshop on Graphics Recognition, GREC 2017, held in Kyoto, Japan, in November 2017.
The 10 revised full papers presented were carefully reviewed and selected from 14 initial submissions. They contain both classical and emerging topics of graphics rcognition, namely analysis and detection of diagrams, search and classification, optical music recognition, interpretation of engineering drawings and maps.
Address
Corporate Author Thesis
Publisher Springer International Publishing Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN 978-3-030-02283-9 Medium
Area Expedition Conference
Notes (up) DAG; 600.121 Approved no
Call Number Admin @ si @ FoL2018 Serial 3171
Permanent link to this record
 

 
Author Suman Ghosh
Title Word Spotting and Recognition in Images from Heterogeneous Sources A Type Book Whole
Year 2018 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Text is the most common way of information sharing from ages. With recent development of personal images databases and handwritten historic manuscripts the demand for algorithms to make these databases accessible for browsing and indexing are in rise. Enabling search or understanding large collection of manuscripts or image databases needs fast and robust methods. Researchers have found different ways to represent cropped words for understanding and matching, which works well when words are already segmented. However there is no trivial way to extend these for non-segmented documents. In this thesis we explore different methods for text retrieval and recognition from unsegmented document and scene images. Two different ways of representation exist in literature, one uses a fixed length representation learned from cropped words and another a sequence of features of variable length. Throughout this thesis, we have studied both these representation for their suitability in segmentation free understanding of text. In the first part we are focused on segmentation free word spotting using a fixed length representation. We extended the use of the successful PHOC (Pyramidal Histogram of Character) representation to segmentation free retrieval. In the second part of the thesis, we explore sequence based features and finally, we propose a unified solution where the same framework can generate both kind of representations.
Address November 2018
Corporate Author Thesis Ph.D. thesis
Publisher Ediciones Graficas Rey Place of Publication Editor Ernest Valveny
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-84-948531-0-4 Medium
Area Expedition Conference
Notes (up) DAG; 600.121 Approved no
Call Number Admin @ si @ Gho2018 Serial 3217
Permanent link to this record
 

 
Author Arka Ujjal Dey; Suman Ghosh; Ernest Valveny
Title Don't only Feel Read: Using Scene text to understand advertisements Type Conference Article
Year 2018 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal
Volume Issue Pages
Keywords
Abstract We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text. Our approach takes inspiration from the assumption that Ad images contain meaningful textual content, that can provide discriminative semantic interpretetion, and can thus aid in classifcation tasks. To this end, we develop a framework using off-the-shelf components, and demonstrate the effectiveness of Textual cues in semantic Classfication tasks.
Address Salt Lake City; Utah; USA; June 2018
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference CVPRW
Notes (up) DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ DGV2018 Serial 3551
Permanent link to this record
 

 
Author Mohammed Al Rawi; Dimosthenis Karatzas
Title On the Labeling Correctness in Computer Vision Datasets Type Conference Article
Year 2018 Publication Proceedings of the Workshop on Interactive Adaptive Learning, co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Image datasets have heavily been used to build computer vision systems.
These datasets are either manually or automatically labeled, which is a
problem as both labeling methods are prone to errors. To investigate this problem, we use a majority voting ensemble that combines the results from several Convolutional Neural Networks (CNNs). Majority voting ensembles not only enhance the overall performance, but can also be used to estimate the confidence level of each sample. We also examined Softmax as another form to estimate posterior probability. We have designed various experiments with a range of different ensembles built from one or different, or temporal/snapshot CNNs, which have been trained multiple times stochastically. We analyzed CIFAR10, CIFAR100, EMNIST, and SVHN datasets and we found quite a few incorrect
labels, both in the training and testing sets. We also present detailed confidence analysis on these datasets and we found that the ensemble is better than the Softmax when used estimate the per-sample confidence. This work thus proposes an approach that can be used to scrutinize and verify the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We propose a measure, based on the Odds-Ratio, to quantify how many of these incorrectly classified labels are actually incorrectly labeled and how many of these are confusing. The proposed methods are easily scalable to larger datasets, like ImageNet, LSUN and SUN, as each CNN instance is trained for 60 epochs; or even faster, by implementing a temporal (snapshot) ensemble.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ECML-PKDDW
Notes (up) DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ RaK2018 Serial 3144
Permanent link to this record
 

 
Author Anguelos Nicolaou; Sounak Dey; V.Christlein; A.Maier; Dimosthenis Karatzas
Title Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings Type Conference Article
Year 2018 Publication International Workshop on Reproducible Research in Pattern Recognition Abbreviated Journal
Volume 11455 Issue Pages 71-82
Keywords
Abstract Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity quantized distances introduce and provide bounds on the effect. We demonstrate that it can have a measurable effect in empirical data in state-of-the-art systems. We also approach the phenomenon from a computer security perspective and demonstrate how someone being evaluated by a third party can exploit this ambiguity and greatly outperform a random predictor without even access to the input data. We also suggest a simple solution making the performance metrics, which rely on ranking, totally deterministic and impervious to such exploits.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title LNCS
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes (up) DAG; 600.121; 600.129 Approved no
Call Number Admin @ si @ NDC2018 Serial 3178
Permanent link to this record