toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author V. Poulain d'Andecy; Emmanuel Hartmann; Marçal Rusiñol edit   pdf
doi  openurl
  Title Field Extraction by hybrid incremental and a-priori structural templates Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 251 - 256  
  Keywords Layout Analysis; information extraction; incremental learning  
  Abstract In this paper, we present an incremental framework for extracting information fields from administrative documents. First, we demonstrate some limits of the existing state-of-the-art methods such as the delay of the system efficiency. This is a concern in industrial context when we have only few samples of each document class. Based on this analysis, we propose a hybrid system combining incremental learning by means of itf-df statistics and a-priori generic
models. We report in the experimental section our results obtained with a dataset of real invoices.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) DAS  
  Notes DAG; 600.084; 600.129; 600.121 Approved no  
  Call Number Admin @ si @ PHR2018 Serial 3106  
Permanent link to this record
 

 
Author Manuel Carbonell; Mauricio Villegas; Alicia Fornes; Josep Llados edit   pdf
openurl 
  Title Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 399-404  
  Keywords Named entity recognition; Handwritten Text Recognition; neural networks  
  Abstract When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the
performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different
configurations: different ways of encoding the information, doing or not transfer learning and processing at text line or multi-line region level. The results are comparable to state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.
 
  Address Vienna; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) DAS  
  Notes DAG; 600.097; 603.057; 601.311; 600.121 Approved no  
  Call Number Admin @ si @ CVF2018 Serial 3170  
Permanent link to this record
 

 
Author Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados edit   pdf
doi  openurl
  Title A Generic Image Retrieval Method for Date Estimation of Historical Document Collections Type Conference Article
  Year 2022 Publication Document Analysis Systems.15th IAPR International Workshop, (DAS2022) Abbreviated Journal  
  Volume 13237 Issue Pages 583–597  
  Keywords Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG  
  Abstract Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack of the ability to generalize from one dataset to others. This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem. One of the main usages of the presented approach is as a tool for historical contextual retrieval. It means that scholars could perform comparative analysis of historical images from big datasets in terms of the period where they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.  
  Address La Rochelle, France; May 22–25, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) DAS  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ MGR2022 Serial 3694  
Permanent link to this record
 

 
Author Josep Brugues Pujolras; Lluis Gomez; Dimosthenis Karatzas edit   pdf
doi  openurl
  Title A Multilingual Approach to Scene Text Visual Question Answering Type Conference Article
  Year 2022 Publication Document Analysis Systems.15th IAPR International Workshop, (DAS2022) Abbreviated Journal  
  Volume Issue Pages 65-79  
  Keywords Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning  
  Abstract Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have a big potential for many types of applications but lack the ability to perform well on more than one language at a time due to the lack of multilingual data, as well as the use of monolingual word embeddings for training. In this work, we explore the possibility to obtain bilingual and multilingual VQA models. In that regard, we use an already established VQA model that uses monolingual word embeddings as part of its pipeline and substitute them by FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that match the performance of the respective monolingual baselines.  
  Address La Rochelle, France; May 22–25, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) DAS  
  Notes DAG; 611.004; 600.155; 601.002 Approved no  
  Call Number Admin @ si @ BGK2022b Serial 3695  
Permanent link to this record
 

 
Author Sergi Garcia Bordils; George Tom; Sangeeth Reddy; Minesh Mathew; Marçal Rusiñol; C.V. Jawahar; Dimosthenis Karatzas edit   pdf
url  doi
isbn  openurl
  Title Read While You Drive-Multilingual Text Tracking on the Road Type Conference Article
  Year 2022 Publication 15th IAPR International workshop on document analysis systems Abbreviated Journal  
  Volume 13237 Issue Pages 756–770  
  Keywords  
  Abstract Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.  
  Address La Rochelle; France; May 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-031-06554-5 Medium  
  Area Expedition Conference (down) DAS  
  Notes DAG; 600.155; 611.022; 611.004 Approved no  
  Call Number Admin @ si @ GTR2022 Serial 3783  
Permanent link to this record
 

 
Author Arka Ujjal Dey; Suman Ghosh; Ernest Valveny edit   pdf
openurl 
  Title Don't only Feel Read: Using Scene text to understand advertisements Type Conference Article
  Year 2018 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text. Our approach takes inspiration from the assumption that Ad images contain meaningful textual content, that can provide discriminative semantic interpretetion, and can thus aid in classifcation tasks. To this end, we develop a framework using off-the-shelf components, and demonstrate the effectiveness of Textual cues in semantic Classfication tasks.  
  Address Salt Lake City; Utah; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CVPRW  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ DGV2018 Serial 3551  
Permanent link to this record
 

 
Author Dena Bazazian; Dimosthenis Karatzas; Andrew Bagdanov edit   pdf
doi  openurl
  Title Word Spotting in Scene Images based on Character Recognition Type Conference Article
  Year 2018 Publication IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 1872-1874  
  Keywords  
  Abstract In this paper we address the problem of unconstrained Word Spotting in scene images. We train a Fully Convolutional Network to produce heatmaps of all the character classes. Then, we employ the Text Proposals approach and, via a rectangle classifier, detect the most likely rectangle for each query word based on the character attribute maps. We evaluate the proposed method on ICDAR2015 and show that it is capable of identifying and recognizing query words in natural scene images.  
  Address Salt Lake City; USA; June 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CVPRW  
  Notes DAG; 600.129; 600.121 Approved no  
  Call Number BKB2018a Serial 3179  
Permanent link to this record
 

 
Author Anjan Dutta; Zeynep Akata edit   pdf
url  doi
openurl 
  Title Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval Type Conference Article
  Year 2019 Publication 32nd IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 5089-5098  
  Keywords  
  Abstract Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, allowing to retrieve natural images relevant to sketch queries that might not been seen in the training phase. Existing works either require aligned sketch-image pairs or inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via an adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at category levels, and avoids the need of highly-priced aligned sketch-image pairs. A classification criteria on the generators' outputs ensures the visual to semantic space mapping to be discriminating. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within a same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state-of-the-art on the challenging Sketchy and TU-Berlin datasets.  
  Address Long beach; California; USA; June 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference (down) CVPR  
  Notes DAG; 600.141; 600.121 Approved no  
  Call Number Admin @ si @ DuA2019 Serial 3268  
Permanent link to this record
 

 
Author Albert Gordo; Florent Perronnin edit  doi
isbn  openurl
  Title Asymmetric Distances for Binary Embeddings Type Conference Article
  Year 2011 Publication IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 729 - 736  
  Keywords  
  Abstract In large-scale query-by-example retrieval, embedding image signatures in a binary space offers two benefits: data compression and search efficiency. While most embedding algorithms binarize both query and database signatures, it has been noted that this is not strictly a requirement. Indeed, asymmetric schemes which binarize the database signatures but not the query still enjoy the same two benefits but may provide superior accuracy. In this work, we propose two general asymmetric distances which are applicable to a wide variety of embedding techniques including Locality Sensitive Hashing (LSH), Locality Sensitive Binary Codes (LSBC), Spectral Hashing (SH) and Semi-Supervised Hashing (SSH). We experiment on four public benchmarks containing up to 1M images and show that the proposed asymmetric distances consistently lead to large improvements over the symmetric Hamming distance for all binary embedding techniques. We also propose a novel simple binary embedding technique – PCA Embedding (PCAE) – which is shown to yield competitive results with respect to more complex algorithms such as SH and SSH.  
  Address Providence, RI  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-4577-0394-2 Medium  
  Area Expedition Conference (down) CVPR  
  Notes DAG Approved no  
  Call Number Admin @ si @ GoP2011; IAM @ iam @ GoP2011 Serial 1817  
Permanent link to this record
 

 
Author Albert Gordo; Jose Antonio Rodriguez; Florent Perronnin; Ernest Valveny edit   pdf
doi  isbn
openurl 
  Title Leveraging category-level labels for instance-level image retrieval Type Conference Article
  Year 2012 Publication 25th IEEE Conference on Computer Vision and Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 3045-3052  
  Keywords  
  Abstract In this article, we focus on the problem of large-scale instance-level image retrieval. For efficiency reasons, it is common to represent an image by a fixed-length descriptor which is subsequently encoded into a small number of bits. We note that most encoding techniques include an unsupervised dimensionality reduction step. Our goal in this work is to learn a better subspace in a supervised manner. We especially raise the following question: “can category-level labels be used to learn such a subspace?” To answer this question, we experiment with four learning techniques: the first one is based on a metric learning framework, the second one on attribute representations, the third one on Canonical Correlation Analysis (CCA) and the fourth one on Joint Subspace and Classifier Learning (JSCL). While the first three approaches have been applied in the past to the image retrieval problem, we believe we are the first to show the usefulness of JSCL in this context. In our experiments, we use ImageNet as a source of category-level labels and report retrieval results on two standard dataseis: INRIA Holidays and the University of Kentucky benchmark. Our experimental study shows that metric learning and attributes do not lead to any significant improvement in retrieval accuracy, as opposed to CCA and JSCL. As an example, we report on Holidays an increase in accuracy from 39.3% to 48.6% with 32-dimensional representations. Overall JSCL is shown to yield the best results.  
  Address Providence, Rhode Island  
  Corporate Author Thesis  
  Publisher IEEE Xplore Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1063-6919 ISBN 978-1-4673-1226-4 Medium  
  Area Expedition Conference (down) CVPR  
  Notes DAG Approved no  
  Call Number Admin @ si @ GRP2012 Serial 2050  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: