Author Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar
  Title Understanding Video Scenes Through Text: Insights from Text-Based Video Question Answering Type Conference Article
  Year 2023 Publication Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively. Particularly, comprehending text in videos holds great significance, requiring both scene text understanding and temporal reasoning. This paper focuses on exploring two recently introduced datasets, NewsVideoQA and M4-ViteVQA, which aim to address video question answering based on textual content. The NewsVideoQA dataset contains question-answer pairs related to the text in news videos, while M4-ViteVQA comprises question-answer pairs from diverse categories like vlogging, traveling, and shopping. We provide an analysis of the formulation of these datasets on various levels, exploring the degree of visual understanding and multi-frame comprehension required for answering the questions. Additionally, the study includes experimentation with BERT-QA, a text-only model, which demonstrates comparable performance to the original methods on both datasets, indicating the shortcomings in the formulation of these datasets. Furthermore, we also look into the domain adaptation aspect by examining the effectiveness of training on M4-ViteVQA and evaluating on NewsVideoQA and vice-versa, thereby shedding light on the challenges and potential benefits of out-of-domain training.  
  Address Paris; France; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCVW  
  Notes DAG Approved no  
  Call Number Admin @ si @ JMK2023 Serial 3946  
 

 
Author Leonardo Galteri; Dena Bazazian; Lorenzo Seidenari; Marco Bertini; Andrew Bagdanov; Anguelos Nicolaou; Dimosthenis Karatzas; Alberto del Bimbo
  Title Reading Text in the Wild from Compressed Images Type Conference Article
  Year 2017 Publication 1st International workshop on Egocentric Perception, Interaction and Computing Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on application context, and this compression introduces artifacts that distort image content into the captured images. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that can eliminate text-specific compression artifacts and which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant impact on text localization and recognition and that our approach yields an improvement in both – especially at high compression rates.
 
  Address Venice; Italy; October 2017  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCV - EPIC  
  Notes DAG; 600.084; 600.121 Approved no  
  Call Number Admin @ si @ GBS2017 Serial 3006  
 

 
Author Jon Almazan; Albert Gordo; Alicia Fornes; Ernest Valveny
  Title Handwritten Word Spotting with Corrected Attributes Type Conference Article
  Year 2013 Publication 15th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 1017-1024  
  Keywords  
  Abstract We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.  
  Address Sydney; Australia; December 2013  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 1550-5499 ISBN Medium  
  Area Expedition Conference ICCV  
  Notes DAG Approved no  
  Call Number Admin @ si @ AGF2013 Serial 2327  
 

 
Author Ali Furkan Biten; R. Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas
  Title Scene Text Visual Question Answering Type Conference Article
  Year 2019 Publication 18th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 4291-4301  
  Keywords  
  Abstract Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research.  
  Address Seoul; Korea; October 2019  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCV  
  Notes DAG; 600.129; 600.135; 601.338; 600.121 Approved no  
  Call Number Admin @ si @ BTM2019b Serial 3285  
 

 
Author Jordy Van Landeghem; Ruben Tito; Lukasz Borchmann; Michal Pietruszka; Pawel Joziak; Rafal Powalski; Dawid Jurkiewicz; Mickael Coustaty; Bertrand Anckaert; Ernest Valveny; Matthew Blaschko; Sien Moens; Tomasz Stanislawek
  Title Document Understanding Dataset and Evaluation (DUDE) Type Conference Article
  Year 2023 Publication 20th IEEE International Conference on Computer Vision Abbreviated Journal  
  Volume Issue Pages 19528-19540  
  Keywords  
  Abstract We call on the Document AI (DocAI) community to re-evaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remediate the halted research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we are pushing the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope that it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI.  
  Address Paris; France; October 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCV  
  Notes DAG Approved no  
  Call Number Admin @ si @ LTB2023 Serial 3948  
 

 
Author Partha Pratim Roy; Josep Llados; Umapada Pal
  Title Text/Graphics Separation in Color Maps Type Conference Article
  Year 2007 Publication International Conference on Computing: Theory and Applications Abbreviated Journal  
  Volume Issue Pages 545–551  
  Keywords  
  Abstract  
  Address Kolkata (India)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCTA  
  Notes DAG Approved no  
  Call Number DAG @ dag @ RLP2007a Serial 806  
 

 
Author Partha Pratim Roy; Josep Llados
  Title Multi-Oriented Character Recognition from Graphical Documents Type Conference Article
  Year 2008 Publication 2nd International Conference on Cognition and Recognition Abbreviated Journal  
  Volume Issue Pages 30–35  
  Keywords  
  Abstract  
  Address Mandya (India)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICCR  
  Notes DAG Approved no  
  Call Number DAG @ dag @ RLP2008 Serial 965  
 

 
Author Oriol Ramos Terrades; Salvatore Tabbone; Ernest Valveny
  Title Optimal Linear Combination for Two-class Classifiers Type Conference Article
  Year 2007 Publication Proceedings of the International Conference on Advances in Pattern Recognition Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address Kolkata (India)  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICAPR  
  Notes DAG Approved no  
  Call Number DAG @ dag @ RTV2007a Serial 894  
 

 
Author Miquel Ferrer; Ernest Valveny; F. Serratosa
  Title Median Graph Computation by means of a Genetic Approach Based on Minimum Common Supergraph and Maximum Common Subgraph Type Conference Article
  Year 2009 Publication 4th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal  
  Volume 5524 Issue Pages 346–353  
  Keywords  
  Abstract Given a set of graphs, the median graph has been theoretically presented as a useful concept to infer a representative of the set. However, the computation of the median graph is a highly complex task and its practical application has been very limited up to now. In this work we present a new genetic algorithm for the median graph computation. A set of experiments on real data, where none of the existing algorithms for the median graph computation could be applied up to now due to their computational complexity, show that we obtain good approximations of the median graph. Finally, we use the median graph in a real nearest neighbour classification showing that it leaves the box of the only-theoretical concepts and demonstrating, from a practical point of view, that it can be a useful tool to represent a set of graphs.  
  Address Póvoa de Varzim, Portugal  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-642-02171-8 Medium  
  Area Expedition Conference IbPRIA  
  Notes DAG Approved no  
  Call Number DAG @ dag @ FVS2009c Serial 1174  
 

 
Author Albert Gordo; Ernest Valveny
  Title The diagonal split: A pre-segmentation step for page layout analysis & classification Type Conference Article
  Year 2009 Publication 4th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal  
  Volume 5524 Issue Pages 290–297  
  Keywords  
  Abstract Document classification is an important task in all the processes related to document storage and retrieval. In the case of complex documents, structural features are needed to achieve a correct classification. Unfortunately, physical layout analysis is error prone. In this paper we present a pre-segmentation step based on a divide & conquer strategy that can be used to improve the page segmentation results, independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval using the selective CRLA algorithm for layout segmentation together with a clustering based on the Voronoi area diagram, and tested on two different databases, MARG and Girona Archives.  
  Address Póvoa de Varzim, Portugal  
  Corporate Author Thesis  
  Publisher Springer Berlin Heidelberg Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN 0302-9743 ISBN 978-3-642-02171-8 Medium  
  Area Expedition Conference IbPRIA  
  Notes DAG Approved no  
  Call Number DAG @ dag @ Gov2009b Serial 1176  