|
Josep Llados. 2021. The 5G of Document Intelligence. 3rd Workshop on Future of Document Analysis and Recognition.
|
|
|
Josep Brugues Pujolras, Lluis Gomez and Dimosthenis Karatzas. 2022. A Multilingual Approach to Scene Text Visual Question Answering. Document Analysis Systems. 15th IAPR International Workshop (DAS 2022). 65–79.
Abstract: Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have great potential for many types of applications but cannot perform well on more than one language at a time, owing to the lack of multilingual data and the use of monolingual word embeddings for training. In this work, we explore the possibility of obtaining bilingual and multilingual VQA models. To that end, we take an established VQA model that uses monolingual word embeddings as part of its pipeline and replace them with FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that matches the performance of the respective monolingual baselines.
Keywords: Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2006. Automatic Interpretation of Proofreading Sketches.
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. Rejection strategies involving classifier combination for handwriting recognition. 3rd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA 2007), J. Marti et al. (Eds.) LNCS 4478:97–104.
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. A Pen-based Interface for Real-time Document Edition. 9th International Conference on Document Analysis and Recognition. 939–944.
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. Categorization of Digital Ink Elements using Spectral Features. Seventh IAPR International Workshop on Graphics Recognition. 63–64.
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2008. Categorization of Digital Ink Elements using Spectral Features. In W. Liu, J.L., J.M. Ogier, eds. Graphics Recognition: Recent Advances and New Opportunities. Springer-Verlag, 188–198. (LNCS.)
|
|
|
Jose Antonio Rodriguez, Florent Perronnin, Gemma Sanchez and Josep Llados. 2008. Unsupervised writer style adaptation for handwritten word spotting. 19th International Conference on Pattern Recognition. IBM Best Student Paper Award.
|
|
|
Jose Antonio Rodriguez, Florent Perronnin, Gemma Sanchez and Josep Llados. 2010. Unsupervised writer adaptation of whole-word HMMs with application to word-spotting. PRL, 31(8), 742–749.
Abstract: In this paper we propose a novel approach for writer adaptation in a handwritten word-spotting task. The method exploits the fact that the semi-continuous hidden Markov model separates the word model parameters into (i) a codebook of shapes and (ii) a set of word-specific parameters.
Our main contribution is to employ this property to derive writer-specific word models by statistically adapting an initial universal codebook to each document. This process is unsupervised and does not even require the appearance of the keyword(s) in the searched document. Experimental results show an increase in performance when this adaptation technique is applied. To the best of our knowledge, this is the first work dealing with adaptation for word-spotting. The preliminary version of this paper obtained an IBM Best Student Paper Award at the 19th International Conference on Pattern Recognition.
Keywords: Word-spotting; Handwriting recognition; Writer adaptation; Hidden Markov model; Document analysis
|
|
|
Jordy Van Landeghem and 12 others. 2023. Document Understanding Dataset and Evaluation (DUDE). 20th IEEE International Conference on Computer Vision. 19528–19540.
Abstract: We call on the Document AI (DocAI) community to re-evaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remediate the halted research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we are pushing the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope that it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI.
|
|