toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author Klara Janousckova; Jiri Matas; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Text Recognition – Real World Data and Where to Find Them Type Conference Article
  Year 2020 Publication 25th International Conference on Pattern Recognition Abbreviated Journal  
  Volume Issue Pages 4489-4496  
  Keywords  
  Abstract We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The method includes matching of imprecise transcriptions to weak annotations and an edit distance guided neighbourhood search. It produces nearly error-free, localised instances of scene text, which we treat as “pseudo ground truth” (PGT). The method is applied to two weakly-annotated datasets. Training with the extracted PGT consistently improves the accuracy of a state of the art recognition model, by 3.7% on average, across different benchmark datasets (image domains) and 24.5% on one of the weakly annotated datasets 1 1 Acknowledgements. The authors were supported by Czech Technical University student grant SGS20/171/0HK3/3TJ13, the MEYS VVV project CZ.02.1.01/0.010.0J16 019/0000765 Research Center for Informatics, the Spanish Research project TIN2017-89779-P and the CERCA Programme / Generalitat de Catalunya.  
  Address (up) Virtual; January 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICPR  
  Notes DAG; 600.121; 600.129 Approved no  
  Call Number Admin @ si @ JMG2020 Serial 3557  
Permanent link to this record
 

 
Author Mohamed Ali Souibgui; Ali Furkan Biten; Sounak Dey; Alicia Fornes; Yousri Kessentini; Lluis Gomez; Dimosthenis Karatzas; Josep Llados edit   pdf
url  doi
openurl 
  Title One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages  
  Keywords Document Analysis  
  Abstract Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This appears, for example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the content. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet. After generating symbols, we create synthetic lines to train state-of-the-art HTR architectures in a segmentation free fashion. Quantitative and qualitative analyses were carried out and confirm the effectiveness of the proposed method, achieving competitive results compared to the usage of real annotated data.  
  Address (up) Virtual; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 602.230; 600.140 Approved no  
  Call Number Admin @ si @ SBD2022 Serial 3615  
Permanent link to this record
 

 
Author Jialuo Chen; M.A.Souibgui; Alicia Fornes; Beata Megyesi edit   pdf
openurl 
  Title A Web-based Interactive Transcription Tool for Encrypted Manuscripts Type Conference Article
  Year 2020 Publication 3rd International Conference on Historical Cryptology Abbreviated Journal  
  Volume Issue Pages 52-59  
  Keywords  
  Abstract Manual transcription of handwritten text is a time consuming task. In the case of encrypted manuscripts, the recognition is even more complex due to the huge variety of alphabets and symbol sets. To speed up and ease this process, we present a web-based tool aimed to (semi)-automatically transcribe the encrypted sources. The user uploads one or several images of the desired encrypted document(s) as input, and the system returns the transcription(s). This process is carried out in an interactive fashion with
the user to obtain more accurate results. For discovering and testing, the developed web tool is freely available.
 
  Address (up) Virtual; June 2020  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference HistoCrypt  
  Notes DAG; 600.140; 602.230; 600.121 Approved no  
  Call Number Admin @ si @ CSF2020 Serial 3447  
Permanent link to this record
 

 
Author Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  openurl
  Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 635-649  
  Keywords  
  Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced on Infographics VQA. Infographics VQA is based on a new dataset of more than 5, 000 infographics images and 30, 000 question-answer pairs. The winner methods have scored 0.6120 ANLS in Infographics VQA task, 0.7743 ANLSL in Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, description of each of the submitted methods and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.  
  Address (up) VIRTUAL; Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ TMJ2021 Serial 3624  
Permanent link to this record
 

 
Author Pau Torras; Arnau Baro; Lei Kang; Alicia Fornes edit  openurl
  Title On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition Type Conference Article
  Year 2021 Publication International Society for Music Information Retrieval Conference Abbreviated Journal  
  Volume Issue Pages 690-696  
  Keywords  
  Abstract Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence(Seq2Seq) architectures have demonstrated its capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.  
  Address (up) Virtual; November 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ISMIR  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ TBK2021 Serial 3616  
Permanent link to this record
 

 
Author Jialuo Chen; Mohamed Ali Souibgui; Alicia Fornes; Beata Megyesi edit   pdf
openurl 
  Title Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images Type Conference Article
  Year 2021 Publication 4th International Conference on Historical Cryptology Abbreviated Journal  
  Volume Issue Pages 34-37  
  Keywords  
  Abstract Historical ciphers contain a wide range ofsymbols from various symbol sets. Iden-tifying the cipher alphabet is a prerequi-site before decryption can take place andis a time-consuming process. In this workwe explore the use of image processing foridentifying the underlying alphabet in ci-pher images, and to compare alphabets be-tween ciphers. The experiments show thatciphers with similar alphabets can be suc-cessfully discovered through clustering.  
  Address (up) Virtual; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference HistoCrypt  
  Notes DAG; 602.230; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ CSF2021 Serial 3617  
Permanent link to this record
 

 
Author Pau Torras; Mohamed Ali Souibgui; Jialuo Chen; Alicia Fornes edit  url
openurl 
  Title A Transcription Is All You Need: Learning to Align through Attention Type Conference Article
  Year 2021 Publication 14th IAPR International Workshop on Graphics Recognition Abbreviated Journal  
  Volume 12916 Issue Pages 141–146  
  Keywords  
  Abstract Historical ciphered manuscripts are a type of document where graphical symbols are used to encrypt their content instead of regular text. Nowadays, expert transcriptions can be found in libraries alongside the corresponding manuscript images. However, those transcriptions are not aligned, so these are barely usable for training deep learning-based recognition methods. To solve this issue, we propose a method to align each symbol in the transcript of an image with its visual representation by using an attention-based Sequence to Sequence (Seq2Seq) model. The core idea is that, by learning to recognise symbols sequence within a cipher line image, the model also identifies their position implicitly through an attention mechanism. Thus, the resulting symbol segmentation can be later used for training algorithms. The experimental evaluation shows that this method is promising, especially taking into account the small size of the cipher dataset.  
  Address (up) Virtual; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference GREC  
  Notes DAG; 602.230; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ TSC2021 Serial 3619  
Permanent link to this record
 

 
Author Minesh Mathew; Viraj Bagal; Ruben Tito; Dimosthenis Karatzas; Ernest Valveny; C.V. Jawahar edit   pdf
url  doi
openurl 
  Title InfographicVQA Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 1697-1706  
  Keywords Document Analysis Datasets; Evaluation and Comparison of Vision Algorithms; Vision and Languages  
  Abstract Infographics communicate information using a combination of textual, graphical and visual elements. This work explores the automatic understanding of infographic images by using a Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset comprising a diverse collection of infographics and question-answer annotations. The questions require methods that jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with an emphasis on questions that require elementary reasoning and basic arithmetic skills. For VQA on the dataset, we evaluate two Transformer-based strong baselines. Both the baselines yield unsatisfactory results compared to near perfect human performance on the dataset. The results suggest that VQA on infographics--images that are designed to communicate information quickly and clearly to human brain--is ideal for benchmarking machine understanding of complex document images. The dataset is available for download at docvqa. org  
  Address (up) Virtual; Waikoloa; Hawai; USA; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.155 Approved no  
  Call Number MBT2022 Serial 3625  
Permanent link to this record
 

 
Author Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 1381-1390  
  Keywords Measurement; Training; Visualization; Analytical models; Computer vision; Computational modeling; Training data  
  Abstract Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in the state-of-the-art captioning models which is not desirable by humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation method for sentences which requires no new training data or increase
in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights are available online.
 
  Address (up) Virtual; Waikoloa; Hawai; USA; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.155; 302.105 Approved no  
  Call Number Admin @ si @ BGK2022 Serial 3662  
Permanent link to this record
 

 
Author Ali Furkan Biten; Andres Mafla; Lluis Gomez; Dimosthenis Karatzas edit   pdf
url  doi
openurl 
  Title Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching Type Conference Article
  Year 2022 Publication Winter Conference on Applications of Computer Vision Abbreviated Journal  
  Volume Issue Pages 1391-1400  
  Keywords Measurement; Training; Integrated circuits; Annotations; Semantics; Training data; Semisupervised learning  
  Abstract The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other relevant images or captions may be present in the dataset. In this work, we propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance. Additionally, we incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss. By incorporating our formulation to existing models, a large improvement is obtained in scenarios where available training data is limited. We also demonstrate that the performance on the annotated image-caption pairs is maintained while improving on other non-annotated relevant items when employing the full training set. The code for our new metric can be found at github. com/furkanbiten/ncsmetric and the model implementation at github. com/andrespmd/semanticadaptive_margin.  
  Address (up) Virtual; Waikoloa; Hawai; USA; January 2022  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference WACV  
  Notes DAG; 600.155; 302.105; Approved no  
  Call Number Admin @ si @ BMG2022 Serial 3663  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: