|
Records |
Links |
|
Author |
Jon Almazan; Albert Gordo; Alicia Fornes; Ernest Valveny |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Handwritten Word Spotting with Corrected Attributes |
Type |
Conference Article |
|
Year |
2013 |
Publication |
15th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1017-1024 |
|
|
Keywords |
|
|
|
Abstract |
We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to an unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results. |
|
|
Address |
Sydney; Australia; December 2013 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1550-5499 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCV |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ AGF2013 |
Serial |
2327 |
|
Permanent link to this record |
|
|
|
|
Author |
Ali Furkan Biten; Ruben Tito; Andres Mafla; Lluis Gomez; Marçal Rusiñol; C.V. Jawahar; Ernest Valveny; Dimosthenis Karatzas |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![goto web page (via DOI) doi](http://refbase.cvc.uab.es/img/doi.gif)
|
|
Title |
Scene Text Visual Question Answering |
Type |
Conference Article |
|
Year |
2019 |
Publication |
18th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
4291-4301 |
|
|
Keywords |
|
|
|
Abstract |
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting highlevel semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning errors as well as shortcomings of the text recognition module. In addition we put forward a series of baseline methods, which provide further insight to the newly released dataset, and set the scene for further research. |
|
|
Address |
Seul; Corea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCV |
|
|
Notes |
DAG; 600.129; 600.135; 601.338; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BTM2019b |
Serial |
3285 |
|
Permanent link to this record |
|
|
|
|
Author |
Jordy Van Landeghem; Ruben Tito; Lukasz Borchmann; Michal Pietruszka; Pawel Joziak; Rafal Powalski; Dawid Jurkiewicz; Mickael Coustaty; Bertrand Anckaert; Ernest Valveny; Matthew Blaschko; Sien Moens; Tomasz Stanislawek |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Document Understanding Dataset and Evaluation (DUDE) |
Type |
Conference Article |
|
Year |
2023 |
Publication |
20th IEEE International Conference on Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
19528-19540 |
|
|
Keywords |
|
|
|
Abstract |
We call on the Document AI (DocAI) community to re-evaluate current methodologies and embrace the challenge of creating more practically-oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remediate the halted research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we are pushing the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope that it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI. |
|
|
Address |
Paris; France; October 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCV |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ LTB2023 |
Serial |
3948 |
|
Permanent link to this record |
|
|
|
|
Author |
Leonardo Galteri; Dena Bazazian; Lorenzo Seidenari; Marco Bertini; Andrew Bagdanov; Anguelos Nicolaou; Dimosthenis Karatzas; Alberto del Bimbo |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Reading Text in the Wild from Compressed Images |
Type |
Conference Article |
|
Year |
2017 |
Publication |
1st International workshop on Egocentric Perception, Interaction and Computing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on application context, and this compression introduces artifacts
that distort image content into the captured images. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that can eliminate text-specific compression artifacts and which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant
impact on text localization and recognition and that our approach yields an improvement in both – especially at high compression rates. |
|
|
Address |
Venice; Italy; October 2017 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCV - EPIC |
|
|
Notes |
DAG; 600.084; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ GBS2017 |
Serial |
3006 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohammed Al Rawi; Ernest Valveny |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![goto web page (via DOI) doi](http://refbase.cvc.uab.es/img/doi.gif)
|
|
Title |
Compact and Efficient Multitask Learning in Vision, Language and Speech |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2933-2942 |
|
|
Keywords |
|
|
|
Abstract |
Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains. |
|
|
Address |
Seul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCVW |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RaV2019 |
Serial |
3365 |
|
Permanent link to this record |
|
|
|
|
Author |
Soumya Jahagirdar; Minesh Mathew; Dimosthenis Karatzas; CV Jawahar |
![download PDF file pdf](http://refbase.cvc.uab.es/img/file_PDF.gif)
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Understanding Video Scenes Through Text: Insights from Text-Based Video Question Answering |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Researchers have extensively studied the field of vision and language, discovering that both visual and textual content is crucial for understanding scenes effectively. Particularly, comprehending text in videos holds great significance, requiring both scene text understanding and temporal reasoning. This paper focuses on exploring two recently introduced datasets, NewsVideoQA and M4-ViteVQA, which aim to address video question answering based on textual content. The NewsVideoQA dataset contains question-answer pairs related to the text in news videos, while M4- ViteVQA comprises question-answer pairs from diverse categories like vlogging, traveling, and shopping. We provide an analysis of the formulation of these datasets on various levels, exploring the degree of visual understanding and multi-frame comprehension required for answering the questions. Additionally, the study includes experimentation with BERT-QA, a text-only model, which demonstrates comparable performance to the original methods on both datasets, indicating the shortcomings in the formulation of these datasets. Furthermore, we also look into the domain adaptation aspect by examining the effectiveness of training on M4-ViteVQA and evaluating on NewsVideoQA and vice-versa, thereby shedding light on the challenges and potential benefits of out-of-domain training. |
|
|
Address |
Paris; France; October 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICCVW |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ JMK2023 |
Serial |
3946 |
|
Permanent link to this record |
|
|
|
|
Author |
Miquel Ferrer; Ernest Valveny |
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Combination of OCR Engines for Page Segmentation based on Performance Evaluation |
Type |
Conference Article |
|
Year |
2007 |
Publication |
9th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
2 |
Issue |
|
Pages |
784–788 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Curitiba (Brazil) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ FeV2007 |
Serial |
838 |
|
Permanent link to this record |
|
|
|
|
Author |
Joan Mas; Gemma Sanchez; Josep Llados; B. Lamiroy |
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
An Incremental On-line Parsing Algorithm for Recognizing Sketching Diagrams |
Type |
Conference Article |
|
Year |
2007 |
Publication |
9th IEEE International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
1 |
Issue |
|
Pages |
452–456 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Curitiba (Brazil) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ MSL2007a |
Serial |
847 |
|
Permanent link to this record |
|
|
|
|
Author |
Marçal Rusiñol; Josep Llados; Philippe Dosch |
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Camera-Based Graphical Symbol Detection |
Type |
Conference Article |
|
Year |
2007 |
Publication |
9th IEEE International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
2 |
Issue |
|
Pages |
884–888 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Curitiba (Brazil) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RLD2007 |
Serial |
848 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Gemma Sanchez |
![find record details (via OpenURL) openurl](http://refbase.cvc.uab.es/img/xref.gif)
|
|
Title |
Indexing Historical Documents by Word Shape Signatures |
Type |
Conference Article |
|
Year |
2007 |
Publication |
9th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
1 |
Issue |
|
Pages |
362–366 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Curitiba (Brasil) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference ![sorted by Conference field, ascending order (up)](http://refbase.cvc.uab.es/img/sort_asc.gif) |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LlS2007 |
Serial |
882 |
|
Permanent link to this record |