|   | 
Details
   web
Records
Author (up) Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 144 Issue Pages 109834
Keywords
Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISSN 0031-3203 ISBN Medium
Area Expedition Conference
Notes DAG; 600.155; 600.121 Approved no
Call Number Admin @ si @ TKV2023 Serial 3825
Permanent link to this record
 

 
Author (up) Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
Title Hierarchical multimodal transformers for Multipage DocVQA Type Journal Article
Year 2023 Publication Pattern Recognition Abbreviated Journal PR
Volume 144 Issue 109834 Pages
Keywords
Abstract Existing work on DocVQA only considers single-page documents. However, in real applications documents are mostly composed of multiple pages that should be processed altogether. In this work, we propose a new multimodal hierarchical method Hi-VT5, that overcomes the limitations of current methods to process long multipage documents. In contrast to previous hierarchical methods that focus on different semantic granularity (He et al., 2021) or different subtasks (Zhou et al., 2022) used in image classification. Our method is a hierarchical transformer architecture where the encoder learns to summarize the most relevant information of every page and then, the decoder uses this summarized representation to generate the final answer, following a bottom-up approach. Moreover, due to the lack of multipage DocVQA datasets, we also introduce MP-DocVQA, an extension of SP-DocVQA where questions are posed over multipage documents instead of single pages. Through extensive experimentation, we demonstrate that Hi-VT5 is able, in a single stage, to answer the questions and provide the page that contains the answer, which can be used as a kind of explainability measure.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ TKV2023 Serial 3836
Permanent link to this record
 

 
Author (up) Ruben Tito; Khanh Nguyen; Marlon Tobaben; Raouf Kerkouche; Mohamed Ali Souibgui; Kangsoo Jung; Lei Kang; Ernest Valveny; Antti Honkela; Mario Fritz; Dimosthenis Karatzas
Title Privacy-Aware Document Visual Question Answering Type Miscellaneous
Year 2023 Publication Arxiv Abbreviated Journal
Volume Issue Pages
Keywords
Abstract Document Visual Question Answering (DocVQA) is a fast growing branch of document understanding. Despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees.
In this work, we explore privacy in the domain of DocVQA for the first time. We highlight privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions.
Specifically, we focus on the invoice processing use case as a realistic, widely used scenario for document understanding, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the ID of the invoice issuer is the sensitive information to be protected.
We demonstrate that non-private models tend to memorise, behaviour that can lead to exposing private information. We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through any of the two input modalities: vision (document image) or language (OCR tokens).
Finally, we design an attack exploiting the memorisation effect of the model, and demonstrate its effectiveness in probing different DocVQA models.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ PNT2023 Serial 4012
Permanent link to this record
 

 
Author (up) Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 635-649
Keywords
Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced on Infographics VQA. Infographics VQA is based on a new dataset of more than 5, 000 infographics images and 30, 000 question-answer pairs. The winner methods have scored 0.6120 ANLS in Infographics VQA task, 0.7743 ANLSL in Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, description of each of the submitted methods and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.
Address VIRTUAL; Lausanne; Suissa; September 2021
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.121 Approved no
Call Number Admin @ si @ TMJ2021 Serial 3624
Permanent link to this record
 

 
Author (up) Rui Hua; Oriol Pujol; Francesco Ciompi; Marina Alberti; Simone Balocco; J. Mauri; Petia Radeva
Title Stent Strut Detection by Classifying a Wide Set of IVUS Features Type Conference Article
Year 2012 Publication Computed Assisted Stenting Workshop Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Nice, France
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference STENT
Notes MILAB;HuPBA Approved no
Call Number Admin @ si @ HPC2012 Serial 2169
Permanent link to this record
 

 
Author (up) Rui Zhang; Yongsheng Zhou; Qianyi Jiang; Qi Song; Nan Li; Kai Zhou; Lei Wang; Dong Wang; Minghui Liao; Mingkun Yang; Xiang Bai; Baoguang Shi; Dimosthenis Karatzas; Shijian Lu; CV Jawahar
Title ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard Type Conference Article
Year 2019 Publication 15th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume Issue Pages 1577-1581
Keywords
Abstract Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Different from English text, Chinese has more than 6000 commonly used characters and Chinesecharacters can be arranged in various layouts with numerous fonts. The Chinese signboards in street view are a good choice for Chinese scene text images since they have different backgrounds, fonts and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboard. This report presents the final results of the competition. A large-scale dataset of 25,000 annotated signboard images, in which all the text lines and characters are annotated with locations and transcriptions, were released. Four tasks, namely character recognition, text line recognition, text line detection and end-to-end recognition were set up. Besides, considering the Chinese text ambiguity issue, we proposed a multi ground truth (multi-GT) evaluation method to make evaluation fairer. The competition started on March 1, 2019 and ended on April 30, 2019. 262 submissions from 46 teams are received. Most of the participants come from universities, research institutes, and tech companies in China. There are also some participants from the United States, Australia, Singapore, and Korea. 21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.
Address Sydney; Australia; September 2019
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG; 600.129; 600.121 Approved no
Call Number Admin @ si @ LZZ2019 Serial 3335
Permanent link to this record
 

 
Author (up) Ruth Aylett; Ginevra Castellano; Bogdan Raducanu; Ana Paiva; Marc Hanheide
Title Long-term socially perceptive and interactive robot companions: challenges and future perspectives Type Conference Article
Year 2011 Publication 13th International Conference on Multimodal Interaction Abbreviated Journal
Volume Issue Pages 323-326
Keywords human-robot interaction, multimodal interaction, social robotics
Abstract This paper gives a brief overview of the challenges for multi-model perception and generation applied to robot companions located in human social environments. It reviews the current position in both perception and generation and the immediate technical challenges and goes on to consider the extra issues raised by embodiment and social context. Finally, it briefly discusses the impact of systems that must function continually over months rather than just for a few hours.
Address Alicante
Corporate Author Thesis
Publisher ACM Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-1-4503-0641-6 Medium
Area Expedition Conference ICMI
Notes OR;MV Approved no
Call Number Admin @ si @ ACR2011 Serial 1888
Permanent link to this record
 

 
Author (up) S. Casanovas
Title Seguiment de moviment articulat mitjançant flux òptic i metodes estocastics Type Report
Year 2000 Publication CVC Technical Report #43 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address CVC (UAB)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ Cas2000 Serial 344
Permanent link to this record
 

 
Author (up) S. Chanda; Oriol Ramos Terrades; Umapada Pal
Title SVM Based Scheme for Thai and English Script Identification Type Conference Article
Year 2007 Publication 9th International Conference on Document Analysis and Recognition Abbreviated Journal
Volume 1 Issue Pages 551–555
Keywords
Abstract
Address Curitiba (Brazil)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference ICDAR
Notes DAG Approved no
Call Number DAG @ dag @ CRP2007a Serial 885
Permanent link to this record
 

 
Author (up) S. Chanda; Umapada Pal; Oriol Ramos Terrades
Title Word-Wise Thai and Roman Script Identification Type Journal
Year 2009 Publication ACM Transactions on Asian Language Information Processing Abbreviated Journal TALIP
Volume 8 Issue 3 Pages 1-21
Keywords
Abstract In some Thai documents, a single text line of a printed document page may contain words of both Thai and Roman scripts. For the Optical Character Recognition (OCR) of such a document page it is better to identify, at first, Thai and Roman script portions and then to use individual OCR systems of the respective scripts on these identified portions. In this article, an SVM-based method is proposed for identification of word-wise printed Roman and Thai scripts from a single line of a document page. Here, at first, the document is segmented into lines and then lines are segmented into character groups (words). In the proposed scheme, we identify the script of a character group combining different character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and water reservoir concept, etc. Based on the experiment on 10,000 data (words) we obtained 99.62% script identification accuracy from the proposed scheme.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1530-0226 ISBN Medium
Area Expedition Conference
Notes DAG Approved no
Call Number Admin @ si @ CPR2009f Serial 1869
Permanent link to this record
 

 
Author (up) S. Garcia; Dani Rowe; Jordi Gonzalez; Juan J. Villanueva
Title Articulated Object Modelling Using Neural Gas Networks Type Miscellaneous
Year 2005 Publication 5th IASTED International Conference on Visualization, Imaging and Image Processing (VIIP’2005) Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Benidorm (Spain)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number ISE @ ise @ GRG2005 Serial 606
Permanent link to this record
 

 
Author (up) S. Gonzalez; A. Martinez
Title Fundamentos de la Vision aplicada a la Robotica Autonoma. Type Miscellaneous
Year 1997 Publication Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes Approved no
Call Number Admin @ si @ GoM1997 Serial 204
Permanent link to this record
 

 
Author (up) S. Tanimoto; N. Bruining; David Rotger; Petia Radeva; J. Ligthart; R.T. van Domburg; P. W. Serryus
Title Late Stent Recoil of the Bioabsorbable Everolimus Eluting Coronary Stent and its Relationship with Stent Struts Distribution and Plaque Morphology Type Journal
Year 2008 Publication Journal of the American College of Cardiology, vol. 52(20):1616–1620 Abbreviated Journal
Volume Issue Pages
Keywords
Abstract
Address Bridgewater, NJ 08807(USA)
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN Medium
Area Expedition Conference
Notes MILAB Approved no
Call Number BCNPCL @ bcnpcl @ TBR2008 Serial 953
Permanent link to this record
 

 
Author (up) S.Grau; Ana Puig; Sergio Escalera; Maria Salamo
Title Intelligent Interactive Volume Classification Type Conference Article
Year 2013 Publication Pacific Graphics Abbreviated Journal
Volume 32 Issue 7 Pages 23-28
Keywords
Abstract This paper defines an intelligent and interactive framework to classify multiple regions of interest from the original data on demand, without requiring any preprocessing or previous segmentation. The proposed intelligent and interactive approach is divided in three stages: visualize, training and testing. First, users visualize and label some samples directly on slices of the volume. Training and testing are based on a framework of Error Correcting Output Codes and Adaboost classifiers that learn to classify each region the user has painted. Later, at the testing stage, each classifier is directly applied on the rest of samples and combined to perform multi-class labeling, being used in the final rendering. We also parallelized the training stage using a GPU-based implementation for
obtaining a rapid interaction and classification.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-3-905674-50-7 Medium
Area Expedition Conference PG
Notes HuPBA; 600.046;MILAB Approved no
Call Number Admin @ si @ GPE2013b Serial 2355
Permanent link to this record
 

 
Author (up) S.Grau; Anna Puig; Sergio Escalera; Maria Salamo; Oscar Amoros
Title Efficient complementary viewpoint selection in volume rendering Type Conference Article
Year 2013 Publication 21st WSCG Conference on Computer Graphics, Abbreviated Journal
Volume Issue Pages
Keywords Dual camera; Visualization; Interactive Interfaces; Dynamic Time Warping.
Abstract A major goal of visualization is to appropriately express knowledge of scientific data. Generally, gathering visual information contained in the volume data often requires a lot of expertise from the final user to setup the parameters of the visualization. One way of alleviating this problem is to provide the position of inner structures with different viewpoint locations to enhance the perception and construction of the mental image. To this end, traditional illustrations use two or three different views of the regions of interest. Similarly, with the aim of assisting the users to easily place a good viewpoint location, this paper proposes an automatic and interactive method that locates different complementary viewpoints from a reference camera in volume datasets. Specifically, the proposed method combines the quantity of information each camera provides for each structure and the shape similarity of the projections of the remaining viewpoints based on Dynamic Time Warping. The selected complementary viewpoints allow a better understanding of the focused structure in several applications. Thus, the user interactively receives feedback based on several viewpoints that helps him to understand the visual information. A live-user evaluation on different data sets show a good convergence to useful complementary viewpoints.
Address
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN ISBN 978-808694374-9 Medium
Area Expedition Conference WSCG
Notes HuPBA; 600.046;MILAB Approved no
Call Number Admin @ si @ GPE2013a Serial 2255
Permanent link to this record