José Antonio Rodríguez, Gemma Sánchez and Josep Lladós. 2007. Categorization of Digital Ink Elements using Spectral Features. Seventh IAPR International Workshop on Graphics Recognition. 63–64.
José Antonio Rodríguez, Gemma Sánchez and Josep Lladós. 2008. Categorization of Digital Ink Elements using Spectral Features. In W. Liu, J. Lladós and J.M. Ogier, eds. Graphics Recognition: Recent Advances and New Opportunities. Springer-Verlag, 188–198. (LNCS)
Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornés and Marçal Rusiñol. 2021. Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture. Pattern Recognition, 112, 107790.
Abstract: Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main difficulty when training a language model is that its corpus usually differs from the one used to train the handwritten word recognition system; the bias between the two word corpora thus leads to incorrect transcriptions, yielding similar or even worse performance on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model into a sequence-to-sequence architecture: the suggestions drawn from external language knowledge are provided as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose how much importance to give to the information the language model provides. On the other hand, the external language model can adapt itself to the training corpus and even learn the most common errors produced by the recognizer. Finally, comprehensive experiments show that Candidate Fusion outperforms state-of-the-art language models for handwritten word recognition tasks.
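The abstract above describes feeding the language model's candidate transcription to the recognizer as an additional input, rather than rescoring the recognizer's output afterwards. A loose, hypothetical sketch of that input-fusion idea follows; the names `one_hot` and `fused_decoder_input` and the character-level one-hot encoding are our assumptions for illustration, not the paper's architecture, which uses learned embeddings and attention:

```python
# Toy illustration of the Candidate Fusion input idea: at each decoding
# step, the decoder input is the previously emitted character together
# with the character the external language model suggests for that
# position. A learned gate inside the real sequence-to-sequence decoder
# would then decide how much to trust each half; here we only build the
# fused input vector itself.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def one_hot(char, alphabet=ALPHABET):
    """One-hot encoding of a single character over `alphabet`."""
    return [1.0 if a == char else 0.0 for a in alphabet]

def fused_decoder_input(prev_char, lm_suggestion, alphabet=ALPHABET):
    """Concatenate recognizer feedback and the LM candidate character."""
    return one_hot(prev_char, alphabet) + one_hot(lm_suggestion, alphabet)
```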
Adarsh Tiwari, Sanket Biswas and Josep Lladós. 2023. Can Pre-trained Language Models Help in Understanding Handwritten Symbols? 17th International Conference on Document Analysis and Recognition. 199–211.
Abstract: The emergence of transformer models like BERT, GPT-2, GPT-3, RoBERTa, T5 for natural language understanding tasks has opened the floodgates towards solving a wide array of machine learning tasks in other modalities like images, audio, music, sketches and so on. These language models are domain-agnostic and as a result could be applied to 1-D sequences of any kind. However, the key challenge lies in bridging the modality gap so that they could generate strong features beneficial for out-of-domain tasks. This work focuses on leveraging the power of such pre-trained language models and discusses the challenges in predicting challenging handwritten symbols and alphabets.
Mohammed Al Rawi, Ernest Valveny and Dimosthenis Karatzas. 2019. Can One Deep Learning Model Learn Script-Independent Multilingual Word-Spotting? 15th International Conference on Document Analysis and Recognition. 260–267.
Abstract: Word spotting has gained increased attention lately as it can be used to extract textual information from handwritten documents and scene-text images. Current word spotting approaches are designed to work on a single language and/or script. Building intelligent models that learn script-independent multilingual word-spotting is challenging due to the large variability of multilingual alphabets and symbols. We used ResNet-152 and the Pyramidal Histogram of Characters (PHOC) embedding to build a one-model script-independent multilingual word-spotting and we tested it on Latin, Arabic, and Bangla (Indian) languages. The one-model we propose performs on par with the multi-model language-specific word-spotting system, and thus, reduces the number of models needed for each script and/or language.
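The PHOC embedding mentioned in the abstract maps a word to a fixed-length binary vector: at pyramid level k the word is split into k equal regions, and each region records which characters occur in it. A minimal sketch, assuming the standard unigram PHOC definition (a character is assigned to a region when at least half of its interval overlaps that region); the function name, alphabet, and level choice are ours, not taken from the paper:

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed unigram set

def phoc(word, levels=(1, 2, 3)):
    """Binary Pyramidal Histogram Of Characters for a non-empty word.

    Character i of an n-character word occupies the interval
    [i/n, (i+1)/n). At pyramid level k the word is split into k equal
    regions; a character is assigned to a region when at least half of
    its interval overlaps that region.
    """
    word = word.lower()
    n = len(word)
    vec = []
    for k in levels:
        for r in range(k):                      # region r at level k
            lo, hi = r / k, (r + 1) / k
            for ch in ALPHABET:
                hit = 0
                for i, c in enumerate(word):
                    if c != ch:
                        continue
                    c_lo, c_hi = i / n, (i + 1) / n
                    overlap = min(hi, c_hi) - max(lo, c_lo)
                    if overlap >= (c_hi - c_lo) / 2:
                        hit = 1
                vec.append(hit)
    return vec
```

With levels 1+2+3 over a 36-character alphabet the vector has (1+2+3)*36 = 216 dimensions; a word-spotting model such as the ResNet-152 described above is then trained to regress this vector from a word image.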
Adrià Rico and Alicia Fornés. 2017. Camera-based Optical Music Recognition using a Convolutional Neural Network. 12th IAPR International Workshop on Graphics Recognition. 27–28.
Abstract: Optical Music Recognition (OMR) consists in recognizing images of music scores. Contrary to expectation, current OMR systems usually fail when recognizing images of scores captured by digital cameras and smartphones. In this work, we propose a camera-based OMR system based on Convolutional Neural Networks, showing promising preliminary results.
Keywords: optical music recognition; document analysis; convolutional neural network; deep learning
Marçal Rusiñol, Josep Lladós and Philippe Dosch. 2007. Camera-Based Graphical Symbol Detection. 9th IEEE International Conference on Document Analysis and Recognition. 884–888.
Mathieu Nicolas Delalandre, Tony Pridmore, Ernest Valveny, Eric Trupin and Hervé Locteau. 2007. Building Synthetic Graphical Documents for Performance Evaluation. Seventh IAPR International Workshop on Graphics Recognition. 84–87.
Mathieu Nicolas Delalandre, Tony Pridmore, Ernest Valveny, Hervé Locteau and Eric Trupin. 2008. Building Synthetic Graphical Documents for Performance Evaluation. In W. Liu, J. Lladós and J.M. Ogier, eds. Graphics Recognition: Recent Advances and New Opportunities. Springer-Verlag, 288–298. (LNCS)
Alicia Fornés, Josep Lladós and Joana Maria Pujadas-Mora. 2020. Browsing of the Social Network of the Past: Information Extraction from Population Manuscript Images. Handwritten Historical Document Analysis, Recognition, and Retrieval – State of the Art and Future Trends. World Scientific.