Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. Rejection strategies involving classifier combination for handwriting recognition. 3rd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA 2007), J. Marti et al. (Eds.) LNCS 4478:97–104.
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. A Pen-based Interface for Real-time Document Edition. 9th International Conference on Document Analysis and Recognition. 939–944.
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. Categorization of Digital Ink Elements using Spectral Features. Seventh IAPR International Workshop on Graphics Recognition. 63–64.
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2008. Categorization of Digital Ink Elements using Spectral Features. In W. Liu, J. Llados, J.M. Ogier, eds. Graphics Recognition: Recent Advances and New Opportunities. Springer-Verlag, 188–198. (LNCS.)
Josep Brugues Pujolras, Lluis Gomez and Dimosthenis Karatzas. 2022. A Multilingual Approach to Scene Text Visual Question Answering. Document Analysis Systems. 15th IAPR International Workshop (DAS 2022). 65–79.
Abstract: Scene Text Visual Question Answering (ST-VQA) has recently emerged as a hot research topic in Computer Vision. Current ST-VQA models have great potential for many types of applications, but they cannot perform well on more than one language at a time, both because multilingual data is scarce and because they are trained with monolingual word embeddings. In this work, we explore the possibility of obtaining bilingual and multilingual VQA models. To that end, we take an already established VQA model that uses monolingual word embeddings as part of its pipeline and replace them with FastText and BPEmb multilingual word embeddings that have been aligned to English. Our experiments demonstrate that it is possible to obtain bilingual and multilingual VQA models with a minimal loss in performance in languages not used during training, as well as a multilingual model trained in multiple languages that matches the performance of the respective monolingual baselines.
Keywords: Scene text; Visual question answering; Multilingual word embeddings; Vision and language; Deep learning
Josep Llados. 1996. Interpretacio de dibuixos linials fets a ma alçada mitjançant isomorfisme entre subgrafs i transformacio de Hough.
Josep Llados. 2006. Perspectives on the Analysis of Graphical Documents.
Josep Llados. 2006. Computer Vision: Progress of Research and Development.
Josep Llados. 2007. Advances in Graphics Recognition. Digital Document Processing, Major Directions and Recent Advances, Advances in Pattern Recognition, B.B. Chaudhuri, ed., 281–304.
Josep Llados. 2021. The 5G of Document Intelligence. 3rd Workshop on Future of Document Analysis and Recognition.