Lluis Gomez _and 6 others_. 2021. Multimodal grid features and cell pointers for scene text visual question answering. _Pattern Recognition Letters_, **150**, 242–249.