Author Juan Ignacio Toledo; Manuel Carbonell; Alicia Fornes; Josep Llados
  Title Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Model Type Journal Article
  Year 2019 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 86 Issue Pages 27-36  
  Keywords Document image analysis; Handwritten documents; Named entity recognition; Deep neural networks  
  Abstract Many historical manuscripts that hold trustworthy memories of past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used or accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need of an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants: the first one is based on bigrams, whereas the second one is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirectional Long Short-Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic, it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches.  
  Notes DAG; 600.097; 601.311; 603.057; 600.084; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ TCF2019 Serial 3166  

 
Author Lei Kang; Juan Ignacio Toledo; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol
  Title Convolve, Attend and Spell: An Attention-based Sequence-to-Sequence Model for Handwritten Word Recognition Type Conference Article
  Year 2018 Publication 40th German Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 459-472  
  Keywords  
  Abstract This paper proposes Convolve, Attend and Spell, an attention based sequence-to-sequence model for handwritten word recognition. The proposed architecture has three main parts: an encoder, consisting of a CNN and a bi-directional GRU, an attention mechanism devoted to focus on the pertinent features and a decoder formed by a one-directional GRU, able to spell the corresponding word, character by character. Compared with the recent state-of-the-art, our model achieves competitive results on the IAM dataset without needing any pre-processing step, predefined lexicon nor language model. Code and additional results are available in https://github.com/omni-us/research-seq2seq-HTR.  
  Address Stuttgart; Germany; October 2018  
  Area Expedition Conference GCPR  
  Notes DAG; 600.097; 603.057; 302.065; 601.302; 600.084; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ KTR2018 Serial 3167  

 
Author Pau Riba; Andreas Fischer; Josep Llados; Alicia Fornes
  Title Learning Graph Distances with Message Passing Neural Networks Type Conference Article
  Year 2018 Publication 24th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 2239-2244  
  Keywords ★Best Paper Award★  
  Abstract Graph representations have been widely used in pattern recognition thanks to their powerful representation formalism and rich theoretical background. A number of error-tolerant graph matching algorithms such as graph edit distance have been proposed for computing a distance between two labelled graphs. However, they typically suffer from a high computational complexity, which makes it difficult to apply these matching algorithms in a real scenario. In this paper, we propose an efficient graph distance based on the emerging field of geometric deep learning. Our method employs a message passing neural network to capture the graph structure and learns a metric with a siamese network approach. The performance of the proposed graph distance is validated in two application cases, graph classification and graph retrieval of handwritten words, and shows a promising performance when compared with (approximate) graph edit distance benchmarks.  
  Address Beijing; China; August 2018  
  Area Expedition Conference ICPR  
  Notes DAG; 600.097; 603.057; 601.302; 600.121 Approved no  
  Call Number Admin @ si @ RFL2018 Serial 3168  

 
Author Jialuo Chen; Pau Riba; Alicia Fornes; Juan Mas; Josep Llados; Joana Maria Pujadas-Mora
  Title Word-Hunter: A Gamesourcing Experience to Validate the Transcription of Historical Manuscripts Type Conference Article
  Year 2018 Publication 16th International Conference on Frontiers in Handwriting Recognition Abbreviated Journal  
  Volume Issue Pages 528-533  
  Keywords Crowdsourcing; Gamification; Handwritten documents; Performance evaluation  
  Abstract Nowadays, there are still many handwritten historical documents in archives waiting to be transcribed and indexed. Since manual transcription is tedious and time consuming, automatic transcription seems the path to follow. However, the performance of current handwriting recognition techniques is not perfect, so a manual validation is mandatory. Crowdsourcing is a good strategy for manual validation; however, it is a tedious task. In this paper we analyze experiences based on gamification in order to propose and design a gamesourcing framework that increases the interest of users. Then, we describe and analyze our experience when validating the automatic transcription using the gamesourcing application. Moreover, thanks to the combination of clustering and handwriting recognition techniques, we can speed up the validation while maintaining the performance.  
  Address Niagara Falls, USA; August 2018  
  Area Expedition Conference ICFHR  
  Notes DAG; 600.097; 603.057; 600.121 Approved no  
  Call Number Admin @ si @ CRF2018 Serial 3169  

 
Author Manuel Carbonell; Mauricio Villegas; Alicia Fornes; Josep Llados
  Title Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 399-404  
  Keywords Named entity recognition; Handwritten Text Recognition; neural networks  
  Abstract When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate, consecutive tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different configurations: different ways of encoding the information, using or not using transfer learning, and processing at text line or multi-line region level. The results are comparable to the state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.  
  Address Vienna; Austria; April 2018  
  Area Expedition Conference DAS  
  Notes DAG; 600.097; 603.057; 601.311; 600.121 Approved no  
  Call Number Admin @ si @ CVF2018 Serial 3170  

 
Author Katerine Diaz; Jesus Martinez del Rincon; Marçal Rusiñol; Aura Hernandez-Sabate
  Title Feature Extraction by Using Dual-Generalized Discriminative Common Vectors Type Journal Article
  Year 2019 Publication Journal of Mathematical Imaging and Vision Abbreviated Journal JMIV  
  Volume 61 Issue 3 Pages 331-351  
  Keywords Online feature extraction; Generalized discriminative common vectors; Dual learning; Incremental learning; Decremental learning  
  Abstract In this paper, a dual online subspace-based learning method called dual-generalized discriminative common vectors (Dual-GDCV) is presented. The method extends incremental GDCV by exploiting simultaneously both the concepts of incremental and decremental learning for supervised feature extraction and classification. Our methodology is able to update the feature representation space without recalculating the full projection or accessing the previously processed training data. It allows both adding information and removing unnecessary data from a knowledge base in an efficient way, while retaining the previously acquired knowledge. The proposed method has been theoretically proved and empirically validated in six standard face recognition and classification datasets, under two scenarios: (1) removing and adding samples of existent classes, and (2) removing and adding new classes to a classification problem. Results show a considerable computational gain without compromising the accuracy of the model in comparison with both batch methodologies and other state-of-art adaptive methods.  
  Notes DAG; ADAS; 600.084; 600.118; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ DRR2019 Serial 3172  

 
Author Y. Patel; Lluis Gomez; Raul Gomez; Marçal Rusiñol; Dimosthenis Karatzas; C.V. Jawahar
  Title TextTopicNet-Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The immense success of deep learning based methods in computer vision heavily relies on large scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such datasets requires a tremendous amount of human effort, and annotations are limited to a popular set of classes. As an alternative, learning visual features by designing auxiliary tasks which make use of freely available self-supervision has become increasingly popular in the computer vision community. In this paper, we put forward an idea to take advantage of multi-modal context to provide self-supervision for the training of computer vision algorithms. We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is more probable to appear as an illustration. More specifically, we use popular text embedding techniques to provide the self-supervision for the training of a deep CNN.  
  Notes DAG; 600.084; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ PGG2018 Serial 3177  

 
Author Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov
  Title Word Spotting in Scene Images based on Character Recognition Type Conference Article
  Year 2018 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 1872-1874  
  Keywords  
  Abstract In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.  
  Address Salt Lake City; USA; June 2018  
  Area Expedition Conference CVPRW  
  Notes DAG; 600.129; 600.121 Approved no  
  Call Number BKB2018a Serial 3179  

 
Author Adrien Gaidon; Antonio Lopez; Florent Perronnin
  Title The Reasonable Effectiveness of Synthetic Visual Data Type Journal Article
  Year 2018 Publication International Journal of Computer Vision Abbreviated Journal IJCV  
  Volume 126 Issue 9 Pages 899–901  
  Keywords  
  Abstract  
  Notes ADAS; 600.118 Approved no  
  Call Number Admin @ si @ GLP2018 Serial 3180  

 
Author Zhijie Fang; Antonio Lopez
  Title Is the Pedestrian going to Cross? Answering by 2D Pose Estimation Type Conference Article
  Year 2018 Publication IEEE Intelligent Vehicles Symposium Abbreviated Journal  
  Volume Issue Pages 1271 - 1276  
  Keywords  
  Abstract Our recent work suggests that, thanks to today's powerful CNNs, image-based 2D pose estimation is a promising cue for determining pedestrian intentions such as crossing the road in the path of the ego-vehicle, stopping before entering the road, and starting to walk or bending towards the road. This statement is based on the results obtained on non-naturalistic sequences (Daimler dataset), i.e. in sequences choreographed specifically for performing the study. Fortunately, a new publicly available dataset (JAAD) has appeared recently to allow developing methods for detecting pedestrian intentions in naturalistic driving conditions; more specifically, for addressing the relevant question "is the pedestrian going to cross?" Accordingly, in this paper we use JAAD to assess the usefulness of 2D pose estimation for answering such a question. We combine CNN-based pedestrian detection, tracking and pose estimation to predict the crossing action from monocular images. Overall, the proposed pipeline provides new state-of-the-art results.  
  Area Expedition Conference IV  
  Notes ADAS; 600.124; 600.116; 600.118 Approved no  
  Call Number Admin @ si @ FaL2018 Serial 3181  

 
Author Jiaolong Xu; Peng Wang; Heng Yang; Antonio Lopez
  Title Training a Binary Weight Object Detector by Knowledge Transfer for Autonomous Driving Type Conference Article
  Year 2019 Publication IEEE International Conference on Robotics and Automation Abbreviated Journal  
  Volume Issue Pages 2379-2384  
  Keywords  
  Abstract Autonomous driving has harsh requirements of small model size and energy efficiency, in order to enable the embedded system to achieve real-time on-board object detection. Recent deep convolutional neural network based object detectors have achieved state-of-the-art accuracy. However, such models are trained with numerous parameters, and their high computational costs and large storage prohibit the deployment to memory and computation resource limited systems. Low-precision neural networks are popular techniques for reducing the computation requirements and memory footprint. Among them, the binary weight neural network (BWN) is the extreme case, which quantizes floating-point weights to a single bit. BWNs are difficult to train and suffer from accuracy degradation due to the extreme low-bit representation. To address this problem, we propose a knowledge transfer (KT) method to aid the training of BWN using a full-precision teacher network. We built DarkNet- and MobileNet-based binary weight YOLO-v2 detectors and conducted experiments on the KITTI benchmark for car, pedestrian and cyclist detection. The experimental results show that the proposed method maintains high detection accuracy while reducing the model size of DarkNet-YOLO from 257 MB to 8.8 MB and MobileNet-YOLO from 193 MB to 7.9 MB.  
  Address Montreal; Canada; May 2019  
  Area Expedition Conference ICRA  
  Notes ADAS; 600.124; 600.116; 600.118 Approved no  
  Call Number Admin @ si @ XWY2018 Serial 3182  

 
Author Akhil Gurram; Onay Urfalioglu; Ibrahim Halfaoui; Fahd Bouzaraa; Antonio Lopez
  Title Monocular Depth Estimation by Learning from Heterogeneous Datasets Type Conference Article
  Year 2018 Publication IEEE Intelligent Vehicles Symposium Abbreviated Journal  
  Volume Issue Pages 2176 - 2181  
  Keywords  
  Abstract Depth estimation provides essential information to perform autonomous driving and driver assistance. In particular, Monocular Depth Estimation is interesting from a practical point of view, since using a single camera is cheaper than many other options and avoids the need for continuous calibration strategies as required by stereo-vision approaches. State-of-the-art methods for Monocular Depth Estimation are based on Convolutional Neural Networks (CNNs). A promising line of work consists of introducing additional semantic information about the traffic scene when training CNNs for depth estimation. In practice, this means that the depth data used for CNN training is complemented with images having pixel-wise semantic labels, which usually are difficult to annotate (e.g. crowded urban images). Moreover, so far it is common practice to assume that the same raw training data is associated with both types of ground truth, i.e., depth and semantic labels. The main contribution of this paper is to show that this hard constraint can be circumvented, i.e., that we can train CNNs for depth estimation by leveraging the depth and semantic information coming from heterogeneous datasets. In order to illustrate the benefits of our approach, we combine KITTI depth and Cityscapes semantic segmentation datasets, outperforming state-of-the-art results on Monocular Depth Estimation.  
  Area Expedition Conference IV  
  Notes ADAS; 600.124; 600.116; 600.118 Approved no  
  Call Number Admin @ si @ GUH2018 Serial 3183  

 
Author Alejandro Cartas; Estefania Talavera; Petia Radeva; Mariella Dimiccoli
  Title On the Role of Event Boundaries in Egocentric Activity Recognition from Photostreams Type Miscellaneous
  Year 2018 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Event boundaries play a crucial role as a pre-processing step for detection, localization, and recognition tasks of human activities in videos. Typically, despite their intrinsic subjectiveness, temporal bounds are provided manually as input for training action recognition algorithms. However, their role for activity recognition in the domain of egocentric photostreams has been so far neglected. In this paper, we provide insights into how automatically computed boundaries can impact activity recognition results in the emerging domain of egocentric photostreams. Furthermore, we collected a new annotated dataset acquired by 15 people with a wearable photo-camera and we used it to show the generalization capabilities of several deep learning based architectures to unseen users.  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ CTR2018 Serial 3184  

 
Author Alejandro Cartas; Juan Marin; Petia Radeva; Mariella Dimiccoli
  Title Batch-based activity recognition from egocentric photo-streams revisited Type Journal Article
  Year 2018 Publication Pattern Analysis and Applications Abbreviated Journal PAA  
  Volume 21 Issue 4 Pages 953–965  
  Keywords Egocentric vision; Lifelogging; Activity recognition; Deep learning; Recurrent neural networks  
  Abstract Wearable cameras can gather large amounts of image data that provide rich visual information about the daily activities of the wearer. Motivated by the large number of health applications that could be enabled by the automatic recognition of daily activities, such as lifestyle characterization for habit improvement, context-aware personal assistance and tele-rehabilitation services, we propose a system to classify 21 daily activities from photo-streams acquired by a wearable photo-camera. Our approach combines the advantages of a late fusion ensemble strategy relying on convolutional neural networks at image level with the ability of recurrent neural networks to account for the temporal evolution of high-level features in photo-streams without relying on event boundaries. The proposed batch-based approach achieved an overall accuracy of 89.85%, outperforming state-of-the-art end-to-end methodologies. These results were achieved on a dataset consisting of 44,902 egocentric pictures from three persons captured during 26 days on average.  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ CMR2018 Serial 3186  

 
Author Mariella Dimiccoli; Cathal Gurrin; David J. Crandall; Xavier Giro; Petia Radeva
  Title Introduction to the special issue: Egocentric Vision and Lifelogging Type Journal Article
  Year 2018 Publication Journal of Visual Communication and Image Representation Abbreviated Journal JVCIR  
  Volume 55 Issue Pages 352-353  
  Keywords  
  Abstract  
  Notes MILAB; no proj Approved no  
  Call Number Admin @ si @ DGC2018 Serial 3187  