toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author (up) Ruben Tito; Dimosthenis Karatzas; Ernest Valveny edit   pdf
doi  openurl
  Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue Pages 109834  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.155; 600.121 Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3825  
Permanent link to this record
 

 
Author (up) Ruben Tito; Dimosthenis Karatzas; Ernest Valveny edit   pdf
url  openurl
  Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue 109834 Pages  
  Keywords  
  Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification. Our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then, the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3836  
Permanent link to this record
 

 
Author (up) Ruben Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Privacy-Aware Document Visual Question Answering Type Miscellaneous
  Year 2023 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ PNT2023 Serial 4012  
Permanent link to this record
 

 
Author (up) Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  openurl
  Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 635-649  
  Keywords  
  Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced on Infographics VQA. Infographics VQA is based on a new dataset of more than 5, 000 infographics images and 30, 000 question-answer pairs. The winner methods have scored 0.6120 ANLS in Infographics VQA task, 0.7743 ANLSL in Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, description of each of the submitted methods and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.  
  Address VIRTUAL; Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.121 Approved no  
  Call Number Admin @ si @ TMJ2021 Serial 3624  
Permanent link to this record
 

 
Author (up) Rui Hua; Oriol Pujol; Francesco Ciompi; Marina Alberti; Simone Balocco; J. Mauri; Petia Radeva edit   pdf
openurl 
  Title Stent Strut Detection by Classifying a Wide Set of IVUS Features Type Conference Article
  Year 2012 Publication Computed Assisted Stenting Workshop Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Nice, France  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference STENT  
  Notes MILAB;HuPBA Approved no  
  Call Number Admin @ si @ HPC2012 Serial 2169  
Permanent link to this record
 

 
Author (up) Rui Zhang; Yongsheng Zhou; Qianyi Jiang; Qi Song; Nan Li; Kai Zhou; Lei Wang; Dong Wang; Minghui Liao; Mingkun Yang; Xiang Bai; Baoguang Shi; Dimosthenis Karatzas; Shijian Lu; CV Jawahar edit   pdf
url  doi
openurl 
  Title ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard Type Conference Article
  Year 2019 Publication 15th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 1577-1581  
  Keywords  
  Abstract Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters and Chinesecharacters can be arranged in various layouts with numerous fonts. The Chinese signboards in street view are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboard. This report presents the final results of the competition. A large-scale dataset of 25,000 annotated signboard images, in which all the text lines and characters are annotated with locations and transcriptions, were released. Four tasks, namely character recognition, text line recognition, text line detection and end-to-end recognition were set up. Besides, considering the Chinese text ambiguity issue, we proposed a multi ground truth (multi-GT) evaluation method to make evaluation fairer. The competition started on March 1, 2019 and ended on April 30, 2019. 262 submissions from 46 teams are received. Most of the participants come from universities, research institutes, and tech companies in China. There are also some participants from the United States, Australia, Singapore, and Korea. 21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.  
  Address Sydney; Australia; September 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.129; 600.121 Approved no  
  Call Number Admin @ si @ LZZ2019 Serial 3335  
Permanent link to this record
 

 
Author (up) Ruth Aylett; Ginevra Castellano; Bogdan Raducanu; Ana Paiva; Marc Hanheide edit  url
doi  isbn
openurl 
  Title Long-term socially perceptive and interactive robot companions: challenges and future perspectives Type Conference Article
  Year 2011 Publication 13th International Conference on Multimodal Interaction Abbreviated Journal  
  Volume Issue Pages 323-326  
  Keywords human-robot interaction, multimodal interaction, social robotics  
  Abstract This paper gives a brief overview of the challenges for multi-model perception and generation applied to robot companions located in human social environments. It reviews the current position in both perception and generation and the immediate technical challenges and goes on to consider the extra issues raised by embodiment and social context. Finally, it briefly discusses the impact of systems that must function continually over months rather than just for a few hours.  
  Address Alicante  
  Corporate Author Thesis  
  Publisher ACM Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-1-4503-0641-6 Medium  
  Area Expedition Conference ICMI  
  Notes OR;MV Approved no  
  Call Number Admin @ si @ ACR2011 Serial 1888  
Permanent link to this record
 

 
Author (up) S. Casanovas edit  openurl
  Title Seguiment de moviment articulat mitjançant flux òptic i metodes estocastics Type Report
  Year 2000 Publication CVC Technical Report #43 Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address CVC (UAB)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ Cas2000 Serial 344  
Permanent link to this record
 

 
Author (up) S. Chanda; Oriol Ramos Terrades; Umapada Pal edit  openurl
  Title SVM Based Scheme for Thai and English Script Identification Type Conference Article
  Year 2007 Publication 9th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 1 Issue Pages 551–555  
  Keywords  
  Abstract  
  Address Curitiba (Brazil)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number DAG @ dag @ CRP2007a Serial 885  
Permanent link to this record
 

 
Author (up) S. Chanda; Umapada Pal; Oriol Ramos Terrades edit  doi
openurl 
  Title Word-Wise Thai and Roman Script Identification Type Journal
  Year 2009 Publication ACM Transactions on Asian Language Information Processing Abbreviated Journal TALIP  
  Volume 8 Issue 3 Pages 1-21  
  Keywords  
  Abstract In some Thai documents, a single text line of a printed document page may contain words of both Thai and Roman scripts. For the Optical Character Recognition (OCR) of such a document page it is better to identify, at first, Thai and Roman script portions and then to use individual OCR systems of the respective scripts on these identified portions. In this article, an SVM-based method is proposed for identification of word-wise printed Roman and Thai scripts from a single line of a document page. Here, at first, the document is segmented into lines and then lines are segmented into character groups (words). In the proposed scheme, we identify the script of a character group combining different character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and water reservoir concept, etc. Based on the experiment on 10,000 data (words) we obtained 99.62% script identification accuracy from the proposed scheme.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1530-0226 ISBN Medium  
  Area Expedition Conference  
  Notes DAG Approved no  
  Call Number Admin @ si @ CPR2009f Serial 1869  
Permanent link to this record
 

 
Author (up) S. Garcia; Dani Rowe; Jordi Gonzalez; Juan J. Villanueva edit  openurl
  Title Articulated Object Modelling Using Neural Gas Networks Type Miscellaneous
  Year 2005 Publication 5th IASTED International Conference on Visualization, Imaging and Image Processing (VIIP’2005) Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Benidorm (Spain)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number ISE @ ise @ GRG2005 Serial 606  
Permanent link to this record
 

 
Author (up) S. Gonzalez; A. Martinez edit  openurl
  Title Fundamentos de la Vision aplicada a la Robotica Autonoma. Type Miscellaneous
  Year 1997 Publication Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes Approved no  
  Call Number Admin @ si @ GoM1997 Serial 204  
Permanent link to this record
 

 
Author (up) S. Tanimoto; N. Bruining; David Rotger; Petia Radeva; J. Ligthart; R.T. van Domburg; P. W. Serryus edit  openurl
  Title Late Stent Recoil of the Bioabsorbable Everolimus Eluting Coronary Stent and its Relationship with Stent Struts Distribution and Plaque Morphology Type Journal
  Year 2008 Publication Journal of the American College of Cardiology, vol. 52(20):1616–1620 Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Bridgewater, NJ 08807(USA)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MILAB Approved no  
  Call Number BCNPCL @ bcnpcl @ TBR2008 Serial 953  
Permanent link to this record
 

 
Author (up) S.Grau; Ana Puig; Sergio Escalera; Maria Salamo edit   pdf
url  doi
isbn  openurl
  Title Intelligent Interactive Volume Classification Type Conference Article
  Year 2013 Publication Pacific Graphics Abbreviated Journal  
  Volume 32 Issue 7 Pages 23-28  
  Keywords  
  Abstract This paper defines an intelligent and interactive framework to classify multiple regions of interest from the original data on demand, without requiring any preprocessing or previous segmentation. The proposed intelligent and interactive approach is divided in three stages: visualize, training and testing. First, users visualize and label some samples directly on slices of the volume. Training and testing are based on a framework of Error Correcting Output Codes and Adaboost classifiers that learn to classify each region the user has painted. Later, at the testing stage, each classifier is directly applied on the rest of samples and combined to perform multi-class labeling, being used in the final rendering. We also parallelized the training stage using a GPU-based implementation for
obtaining a rapid interaction and classification.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-905674-50-7 Medium  
  Area Expedition Conference PG  
  Notes HuPBA; 600.046;MILAB Approved no  
  Call Number Admin @ si @ GPE2013b Serial 2355  
Permanent link to this record
 

 
Author (up) S.Grau; Anna Puig; Sergio Escalera; Maria Salamo; Oscar Amoros edit  isbn
openurl 
  Title Efficient complementary viewpoint selection in volume rendering Type Conference Article
  Year 2013 Publication 21st WSCG Conference on Computer Graphics, Abbreviated Journal  
  Volume Issue Pages  
  Keywords Dual camera; Visualization; Interactive Interfaces; Dynamic Time Warping.  
  Abstract A major goal of visualization is to appropriately express knowledge of scientific data. Generally, gathering visual information contained in the volume data often requires a lot of expertise from the final user to setup the parameters of the visualization. One way of alleviating this problem is to provide the position of inner structures with different viewpoint locations to enhance the perception and construction of the mental image. To this end, traditional illustrations use two or three different views of the regions of interest. Similarly, with the aim of assisting the users to easily place a good viewpoint location, this paper proposes an automatic and interactive method that locates different complementary viewpoints from a reference camera in volume datasets. Specifically, the proposed method combines the quantity of information each camera provides for each structure and the shape similarity of the projections of the remaining viewpoints based on Dynamic Time Warping. The selected complementary viewpoints allow a better understanding of the focused structure in several applications. Thus, the user interactively receives feedback based on several viewpoints that helps him to understand the visual information. A live-user evaluation on different data sets show a good convergence to useful complementary viewpoints.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-808694374-9 Medium  
  Area Expedition Conference WSCG  
  Notes HuPBA; 600.046;MILAB Approved no  
  Call Number Admin @ si @ GPE2013a Serial 2255  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: