toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
   print
  Records Links
Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny edit   pdf
url  openurl
  Title Document Collection Visual Question Answering Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 12822 Issue Pages 778-792  
  Keywords Document collection; Visual Question Answering  
  Abstract Current tasks and methods in Document Understanding aims to process documents as single elements. However, documents are usually organized in collections (historical records, purchase invoices), that provide context useful for their interpretation. To address this problem, we introduce Document Collection Visual Question Answering (DocCVQA) a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer. Along with the dataset we propose a new evaluation metric and baselines which provide further insights to the new dataset and task.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG; 600.121 Approved no  
  Call Number Admin @ si @ TKV2021 Serial 3622  
Permanent link to this record
 

 
Author Ruben Tito; Minesh Mathew; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  openurl
  Title ICDAR 2021 Competition on Document Visual Question Answering Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 635-649  
  Keywords  
  Abstract In this report we present results of the ICDAR 2021 edition of the Document Visual Question Challenges. This edition complements the previous tasks on Single Document VQA and Document Collection VQA with a newly introduced on Infographics VQA. Infographics VQA is based on a new dataset of more than 5, 000 infographics images and 30, 000 question-answer pairs. The winner methods have scored 0.6120 ANLS in Infographics VQA task, 0.7743 ANLSL in Document Collection VQA task and 0.8705 ANLS in Single Document VQA. We present a summary of the datasets used for each task, description of each of the submitted methods and the results and analysis of their performance. A summary of the progress made on Single Document VQA since the first edition of the DocVQA 2020 challenge is also presented.  
  Address VIRTUAL; Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG; 600.121 Approved no  
  Call Number Admin @ si @ TMJ2021 Serial 3624  
Permanent link to this record
 

 
Author Pau Riba; Sounak Dey; Ali Furkan Biten; Josep Llados edit   pdf
openurl 
  Title Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild Type Miscellaneous
  Year 2021 Publication Arxiv Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a tough-to-beat baseline that without any specific SGOL training is able to outperform the previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior arts by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows to move towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future researchhttps://github.com/priba/sgol_wild.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes (down) DAG; 600.121 Approved no  
  Call Number Admin @ si @ RDB2021 Serial 3674  
Permanent link to this record
 

 
Author Albert Suso; Pau Riba; Oriol Ramos Terrades; Josep Llados edit  url
openurl 
  Title A Self-supervised Inverse Graphics Approach for Sketch Parametrization Type Conference Article
  Year 2021 Publication 16th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 12916 Issue Pages 28-42  
  Keywords  
  Abstract The study of neural generative models of handwritten text and human sketches is a hot topic in the computer vision field. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by coding the strokes as Bézier curves. However, the previous attempts with this approach need them all a ground truth consisting in the sequence of points that make up each stroke, which seriously limits the datasets the model is able to train in. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to embed each image to its best fit of Bézier curves. The self-supervised nature of the training process allows us to train the model in a wider range of datasets, but also to perform better after-training predictions by applying an overfitting process on the input binary image. We report qualitative an quantitative evaluations on the MNIST and the Quick, Draw! datasets.  
  Address Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG; 600.121 Approved no  
  Call Number Admin @ si @ SRR2021 Serial 3675  
Permanent link to this record
 

 
Author Lluis Gomez; Ali Furkan Biten; Ruben Tito; Andres Mafla; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas edit   pdf
url  openurl
  Title Multimodal grid features and cell pointers for scene text visual question answering Type Journal Article
  Year 2021 Publication Pattern Recognition Letters Abbreviated Journal PRL  
  Volume 150 Issue Pages 242-249  
  Keywords  
  Abstract This paper presents a new model for the task of scene text visual question answering. In this task questions about a given image can only be answered by reading and understanding scene text. Current state of the art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on an single attention mechanism that attends to multi-modal features conditioned to the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance in two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data is made available through this link.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes (down) DAG; 600.084; 600.121 Approved no  
  Call Number Admin @ si @ GBT2021 Serial 3620  
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) edit  doi
isbn  openurl
  Title 16th International Conference, 2021, Proceedings, Part III Type Book Whole
  Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal  
  Volume 12823 Issue Pages  
  Keywords  
  Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
 
  Address Lausanne, Switzerland, September 5-10, 2021  
  Corporate Author Thesis  
  Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-030-86333-3 Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG Approved no  
  Call Number Admin @ si @ Serial 3727  
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) edit  doi
isbn  openurl
  Title 16th International Conference, 2021, Proceedings, Part IV Type Book Whole
  Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal  
  Volume 12824 Issue Pages  
  Keywords  
  Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
 
  Address Lausanne, Switzerland, September 5-10, 2021  
  Corporate Author Thesis  
  Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-030-86336-4 Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG Approved no  
  Call Number Admin @ si @ Serial 3728  
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) edit  doi
isbn  openurl
  Title 16th International Conference, 2021, Proceedings, Part I Type Book Whole
  Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal  
  Volume 12821 Issue Pages  
  Keywords  
  Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: historical document analysis, document analysis systems, handwriting recognition, scene text detection and recognition, document image processing, natural language processing (NLP) for document understanding, and graphics, diagram and math recognition.
 
  Address Lausanne, Switzerland, September 5-10, 2021  
  Corporate Author Thesis  
  Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-030-86548-1 Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG Approved no  
  Call Number Admin @ si @ Serial 3725  
Permanent link to this record
 

 
Author Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) edit  doi
isbn  openurl
  Title 16th International Conference, 2021, Proceedings, Part II Type Book Whole
  Year 2021 Publication Document Analysis and Recognition – ICDAR 2021 Abbreviated Journal  
  Volume 12822 Issue Pages  
  Keywords  
  Abstract This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.

The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding.
 
  Address Lausanne, Switzerland, September 5-10, 2021  
  Corporate Author Thesis  
  Publisher Springer Cham Place of Publication Editor Josep Llados; Daniel Lopresti; Seiichi Uchida  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-3-030-86330-2 Medium  
  Area Expedition Conference ICDAR  
  Notes (down) DAG Approved no  
  Call Number Admin @ si @ Serial 3726  
Permanent link to this record
 

 
Author Josep Llados edit  openurl
  Title The 5G of Document Intelligence Type Conference Article
  Year 2021 Publication 3rd Workshop on Future of Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Lausanne; Suissa; September 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference FDAR  
  Notes (down) DAG Approved no  
  Call Number Admin @ si @ Serial 3677  
Permanent link to this record
 

 
Author Domicele Jonauskaite; Lucia Camenzind; C. Alejandro Parraga; Cecile N Diouf; Mathieu Mercapide Ducommun; Lauriane Müller; Melanie Norberg; Christine Mohr edit  url
doi  openurl
  Title Colour-emotion associations in individuals with red-green colour blindness Type Journal Article
  Year 2021 Publication PeerJ Abbreviated Journal  
  Volume 9 Issue Pages e11180  
  Keywords Affect; Chromotherapy; Colour cognition; Colour vision deficiency; Cross-modal correspondences; Daltonism; Deuteranopia; Dichromatic; Emotion; Protanopia.  
  Abstract Colours and emotions are associated in languages and traditions. Some of us may convey sadness by saying feeling blue or by wearing black clothes at funerals. The first example is a conceptual experience of colour and the second example is an immediate perceptual experience of colour. To investigate whether one or the other type of experience more strongly drives colour-emotion associations, we tested 64 congenitally red-green colour-blind men and 66 non-colour-blind men. All participants associated 12 colours, presented as terms or patches, with 20 emotion concepts, and rated intensities of the associated emotions. We found that colour-blind and non-colour-blind men associated similar emotions with colours, irrespective of whether colours were conveyed via terms (r = .82) or patches (r = .80). The colour-emotion associations and the emotion intensities were not modulated by participants' severity of colour blindness. Hinting at some additional, although minor, role of actual colour perception, the consistencies in associations for colour terms and patches were higher in non-colour-blind than colour-blind men. Together, these results suggest that colour-emotion associations in adults do not require immediate perceptual colour experiences, as conceptual experiences are sufficient.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes (down) CIC; LAMP; 600.120; 600.128 Approved no  
  Call Number Admin @ si @ JCP2021 Serial 3564  
Permanent link to this record
 

 
Author Hassan Ahmed Sial edit  isbn
openurl 
  Title Estimating Light Effects from a Single Image: Deep Architectures and Ground-Truth Generation Type Book Whole
  Year 2021 Publication PhD Thesis, Universitat Autonoma de Barcelona-CVC Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract In this thesis, we explore how to estimate the effects of the light interacting with the scene objects from a single image. To achieve this goal, we focus on recovering intrinsic components like reflectance, shading, or light properties such as color and position using deep architectures. The success of these approaches relies on training on large and diversified image datasets. Therefore, we present several contributions on this such as: (a) a data-augmentation technique; (b) a ground-truth for an existing multi-illuminant dataset; (c) a family of synthetic datasets, SID for Surreal Intrinsic Datasets, with diversified backgrounds and coherent light conditions; and (d) a practical pipeline to create hybrid ground-truths to overcome the complexity of acquiring realistic light conditions in a massive way. In parallel with the creation of datasets, we trained different flexible encoder-decoder deep architectures incorporating physical constraints from the image formation models.

In the last part of the thesis, we apply all the previous experience to two different problems. Firstly, we create a large hybrid Doc3DShade dataset with real shading and synthetic reflectance under complex illumination conditions, that is used to train a two-stage architecture that improves the character recognition task in complex lighting conditions of unwrapped documents. Secondly, we tackle the problem of single image scene relighting by extending both, the SID dataset to present stronger shading and shadows effects, and the deep architectures to use intrinsic components to estimate new relit images.
 
  Address September 2021  
  Corporate Author Thesis Ph.D. thesis  
  Publisher IMPRIMA Place of Publication Editor Maria Vanrell;Ramon Baldrich  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN 978-84-122714-8-5 Medium  
  Area Expedition Conference  
  Notes (down) CIC; Approved no  
  Call Number Admin @ si @ Sia2021 Serial 3607  
Permanent link to this record
 

 
Author Trevor Canham; Javier Vazquez; Elise Mathieu; Marcelo Bertalmío edit   pdf
url  doi
openurl 
  Title Matching visual induction effects on screens of different size Type Journal Article
  Year 2021 Publication Journal of Vision Abbreviated Journal JOV  
  Volume 21 Issue 6(10) Pages 1-22  
  Keywords  
  Abstract In the film industry, the same movie is expected to be watched on displays of vastly different sizes, from cinema screens to mobile phones. But visual induction, the perceptual phenomenon by which the appearance of a scene region is affected by its surroundings, will be different for the same image shown on two displays of different dimensions. This phenomenon presents a practical challenge for the preservation of the artistic intentions of filmmakers, because it can lead to shifts in image appearance between viewing destinations. In this work, we show that a neural field model based on the efficient representation principle is able to predict induction effects and how, by regularizing its associated energy functional, the model is still able to represent induction but is now invertible. From this finding, we propose a method to preprocess an image in a screen–size dependent way so that its perception, in terms of visual induction, may remain constant across displays of different size. The potential of the method is demonstrated through psychophysical experiments on synthetic images and qualitative examples on natural images.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes (down) CIC Approved no  
  Call Number Admin @ si @ CVM2021 Serial 3595  
Permanent link to this record
 

 
Author Graham D. Finlayson; Javier Vazquez; Fufu Fang edit   pdf
doi  openurl
  Title The Discrete Cosine Maximum Ignorance Assumption Type Conference Article
  Year 2021 Publication 29th Color and Imaging Conference Abbreviated Journal  
  Volume Issue Pages 13-18  
  Keywords  
  Abstract the performance of colour correction algorithms are dependent on the reflectance sets used. Sometimes, when the testing reflectance set is changed the ranking of colour correction algorithms also changes. To remove dependence on dataset we can
make assumptions about the set of all possible reflectances. In the Maximum Ignorance with Positivity (MIP) assumption we assume that all reflectances with per wavelength values between 0 and 1 are equally likely. A weakness in the MIP is that it fails to take into account the correlation of reflectance functions between
wavelengths (many of the assumed reflectances are, in reality, not possible).
In this paper, we take the view that the maximum ignorance assumption has merit but, hitherto it has been calculated with respect to the wrong coordinate basis. Here, we propose the Discrete Cosine Maximum Ignorance assumption (DCMI), where
all reflectances that have coordinates between max and min bounds in the Discrete Cosine Basis coordinate system are equally likely.
Here, the correlation between wavelengths is encoded and this results in the set of all plausible reflectances ’looking like’ typical reflectances that occur in nature. This said the DCMI model is also a superset of all measured reflectance sets.
Experiments show that, in colour correction, adopting the DCMI results in similar colour correction performance as using a particular reflectance set.
 
  Address Virtual; November 2021  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CIC  
  Notes (down) CIC Approved no  
  Call Number FVF2021 Serial 3596  
Permanent link to this record
 

 
Author Yasuko Sugito; Trevor Canham; Javier Vazquez; Marcelo Bertalmio edit  url
doi  openurl
  Title A Study of Objective Quality Metrics for HLG-Based HDR/WCG Image Coding Type Journal
  Year 2021 Publication SMPTE Motion Imaging Journal Abbreviated Journal SMPTE  
  Volume 130 Issue 4 Pages 53 - 65  
  Keywords  
  Abstract In this work, we study the suitability of high dynamic range, wide color gamut (HDR/WCG) objective quality metrics to assess the perceived deterioration of compressed images encoded using the hybrid log-gamma (HLG) method, which is the standard for HDR television. Several image quality metrics have been developed to deal specifically with HDR content, although in previous work we showed that the best results (i.e., better matches to the opinion of human expert observers) are obtained by an HDR metric that consists simply in applying a given standard dynamic range metric, called visual information fidelity (VIF), directly to HLG-encoded images. However, all these HDR metrics ignore the chroma components for their calculations, that is, they consider only the luminance channel. For this reason, in the current work, we conduct subjective evaluation experiments in a professional setting using compressed HDR/WCG images encoded with HLG and analyze the ability of the best HDR metric to detect perceivable distortions in the chroma components, as well as the suitability of popular color metrics (including ΔITPR , which supports parameters for HLG) to correlate with the opinion scores. Our first contribution is to show that there is a need to consider the chroma components in HDR metrics, as there are color distortions that subjects perceive but that the best HDR metric fails to detect. Our second contribution is the surprising result that VIF, which utilizes only the luminance channel, correlates much better with the subjective evaluation scores than the metrics investigated that do consider the color components.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes (down) CIC Approved no  
  Call Number SCV2021 Serial 3671  
Permanent link to this record
Select All    Deselect All
 |   | 
Details
   print

Save Citations:
Export Records: