Author Ali Furkan Biten; R. Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
  Title Scene Text Visual Question Answering Type Conference Article
  Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 4291-4301  
  Abstract Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research.  
  Address Seoul; Korea; October 2019  
  Area Expedition Conference ICCV  
  Notes DAG; 600.129; 600.135; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ BTM2019b Serial 3285  