toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links (down)
Author Raul Gomez; Ali Furkan Biten; Lluis Gomez; Jaume Gibert; Marçal Rusiñol; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Selective Style Transfer for Text Type Conference Article
  Year 2019 Publication 15th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 805-812  
  Keywords transfer; text style transfer; data augmentation; scene text detection  
  Abstract This paper explores the possibilities of image style transfer applied to text maintaining the original transcriptions. Results on different text domains (scene text, machine printed text and handwritten text) and cross-modal results demonstrate that this is feasible, and open different research lines. Furthermore, two architectures for selective style transfer, which means
transferring style to only desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost of text detectors performance. Our implementation of the described models is publicly available.
 
  Address Sydney; Australia; September 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.129; 600.135; 601.338; 601.310; 600.121 Approved no  
  Call Number GBG2019 Serial 3265  
Permanent link to this record
 

 
Author Helena Muñoz; Fernando Vilariño; Dimosthenis Karatzas edit  url
doi  openurl
  Title Eye-Movements During Information Extraction from Administrative Documents Type Conference Article
  Year 2019 Publication International Conference on Document Analysis and Recognition Workshops Abbreviated Journal  
  Volume Issue Pages 6-9  
  Keywords  
  Abstract A key aspect of digital mailroom processes is the extraction of relevant information from administrative documents. More often than not, the extraction process cannot be fully automated, and there is instead an important amount of manual intervention. In this work we study the human process of information extraction from invoice document images. We explore whether the gaze of human annotators during an manual information extraction process could be exploited towards reducing the manual effort and automating the process. To this end, we perform an eye-tracking experiment replicating real-life interfaces for information extraction. Through this pilot study we demonstrate that relevant areas in the document can be identified reliably through automatic fixation classification, and the obtained models generalize well to new subjects. Our findings indicate that it is in principle possible to integrate the human in the document image analysis loop, making use of the scanpath to automate the extraction process or verify extracted information.  
  Address Sydney; Australia; September 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDARW  
  Notes DAG; 600.140; 600.121; 600.129;SIAI Approved no  
  Call Number Admin @ si @ MVK2019 Serial 3336  
Permanent link to this record
 

 
Author Lluis Gomez; Marçal Rusiñol; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 97-102  
  Keywords Robust Reading; End-to-end Systems; CNN; Utility Meters  
  Abstract In this paper we present a segmentation-free system for reading text in natural scenes. A CNN architecture is trained in an end-to-end manner, and is able to directly output readings without any explicit text localization step. In order to validate our proposal, we focus on the specific case of reading utility meters. We present our results in a large dataset of images acquired by different users and devices, so text appears in any location, with different sizes, fonts and lengths, and the images present several distortions such as
dirt, illumination highlights or blur.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ GRK2018 Serial 3102  
Permanent link to this record
 

 
Author Dimosthenis Karatzas; Lluis Gomez; Marçal Rusiñol; Anguelos Nicolaou edit   pdf
url  openurl
  Title The Robust Reading Competition Annotation and Evaluation Platform Type Conference Article
  Year 2018 Publication 13th IAPR International Workshop on Document Analysis Systems Abbreviated Journal  
  Volume Issue Pages 61-66  
  Keywords  
  Abstract The ICDAR Robust Reading Competition (RRC), initiated in 2003 and reestablished in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous
effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the
Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation of data, and to provide online and offline performance evaluation and analysis services.
 
  Address Viena; Austria; April 2018  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference DAS  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number KGR2018 Serial 3103  
Permanent link to this record
 

 
Author Lasse Martensson; Ekta Vats; Anders Hast; Alicia Fornes edit  url
openurl 
  Title In Search of the Scribe: Letter Spotting as a Tool for Identifying Scribes in Large Handwritten Text Corpora Type Journal
  Year 2019 Publication Journal for Information Technology Studies as a Human Science Abbreviated Journal HUMAN IT  
  Volume 14 Issue 2 Pages 95-120  
  Keywords Scribal attribution/ writer identification; digital palaeography; word spotting; mediaeval charters; mediaeval manuscripts  
  Abstract In this article, a form of the so-called word spotting-method is used on a large set of handwritten documents in order to identify those that contain script of similar execution. The point of departure for the investigation is the mediaeval Swedish manuscript Cod. Holm. D 3. The main scribe of this manuscript has yet not been identified in other documents. The current attempt aims at localising other documents that display a large degree of similarity in the characteristics of the script, these being possible candidates for being executed by the same hand. For this purpose, the method of word spotting has been employed, focusing on individual letters, and therefore the process is referred to as letter spotting in the article. In this process, a set of ‘g’:s, ‘h’:s and ‘k’:s have been selected as templates, and then a search has been made for close matches among the mediaeval Swedish charters. The search resulted in a number of charters that displayed great similarities with the manuscript D 3. The used letter spotting method thus proofed to be a very efficient sorting tool localising similar script samples.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.097; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ MVH2019 Serial 3234  
Permanent link to this record
 

 
Author Youssef El Rhabi; Simon Loic; Brun Luc edit   pdf
url  openurl
  Title Estimation de la pose d’une caméra à partir d’un flux vidéo en s’approchant du temps réel Type Conference Article
  Year 2015 Publication 15ème édition d'ORASIS, journées francophones des jeunes chercheurs en vision par ordinateur ORASIS2015 Abbreviated Journal  
  Volume Issue Pages  
  Keywords Augmented Reality; SFM; SLAM; real time pose computation; 2D/3D registration  
  Abstract Finding a way to estimate quickly and robustly the pose of an image is essential in augmented reality. Here we will discuss the approach we chose in order to get closer to real time by using SIFT points [4]. We propose a method based on filtering both SIFT points and images on which to focus on. Hence we will focus on relevant data.  
  Address Amiens; France; June 2015  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ORASIS  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ RLL2015 Serial 2626  
Permanent link to this record
 

 
Author Olivier Lefebvre; Pau Riba; Charles Fournier; Alicia Fornes; Josep Llados; Rejean Plamondon; Jules Gagnon-Marchand edit   pdf
url  openurl
  Title Monitoring neuromotricity on-line: a cloud computing approach Type Conference Article
  Year 2015 Publication 17th Conference of the International Graphonomics Society IGS2015 Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract The goal of our experiment is to develop a useful and accessible tool that can be used to evaluate a patient's health by analyzing handwritten strokes. We use a cloud computing approach to analyze stroke data sampled on a commercial tablet working on the Android platform and a distant server to perform complex calculations using the Delta and Sigma lognormal algorithms. A Google Drive account is used to store the data and to ease the development of the project. The communication between the tablet, the cloud and the server is encrypted to ensure biomedical information confidentiality. Highly parameterized biomedical tests are implemented on the tablet as well as a free drawing test to evaluate the validity of the data acquired by the first test compared to the second one. A blurred shape model descriptor pattern recognition algorithm is used to classify the data obtained by the free drawing test. The functions presented in this paper are still currently under development and other improvements are needed before launching the application in the public domain.  
  Address Pointe-à-Pitre; Guadeloupe; June 2015  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IGS  
  Notes DAG; 600.077 Approved no  
  Call Number Admin @ si @ LRF2015 Serial 2617  
Permanent link to this record
 

 
Author Giacomo Magnifico; Beata Megyesi; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes edit   pdf
url  openurl
  Title Lost in Transcription of Graphic Signs in Ciphers Type Conference Article
  Year 2022 Publication International Conference on Historical Cryptology (HistoCrypt 2022) Abbreviated Journal  
  Volume Issue Pages 153-158  
  Keywords transcription of ciphers; hand-written text recognition of symbols; graphic signs  
  Abstract Hand-written Text Recognition techniques with the aim to automatically identify and transcribe hand-written text have been applied to historical sources including ciphers. In this paper, we compare the performance of two machine learning architectures, an unsupervised method based on clustering and a deep learning method with few-shot learning. Both models are tested on seen and unseen data from historical ciphers with different symbol sets consisting of various types of graphic signs. We compare the models and highlight their differences in performance, with their advantages and shortcomings.  
  Address Amsterdam, Netherlands, June 20-22, 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference HystoCrypt  
  Notes DAG; 600.121; 600.162; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ MBS2022 Serial 3731  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages  
  Keywords Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning  
  Abstract In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ SBM2023 Serial 3848  
Permanent link to this record
 

 
Author Khanh Nguyen; Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit  url
openurl 
  Title Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia Type Conference Article
  Year 2023 Publication Proceedings of the 37th AAAI Conference on Artificial Intelligence Abbreviated Journal  
  Volume 37 Issue 2 Pages 1940-1948  
  Keywords  
  Abstract Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted to the specific context allowing us to explore the limits of the model to adjust captions to different contextual information. Dealing with out-of-dictionary words and Named Entities is a challenging task in this domain. To address this, we propose a pre-training objective, Masked Named Entity Modeling (MNEM), and show that this pretext task results to significantly improved models. Furthermore, we verify that a model pre-trained in Wikipedia generalizes well to News Captioning datasets. We further define two different test splits according to the difficulty of the captioning task. We offer insights on the role and the importance of each modality and highlight the limitations of our model.  
  Address Washington; USA; February 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference AAAI  
  Notes DAG Approved no  
  Call Number Admin @ si @ NBM2023 Serial 3860  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: