|
Records |
Links |
|
Author |
Alloy Das; Sanket Biswas; Umapada Pal; Josep Llados |
|
|
Title |
Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes |
Type |
Conference Article |
|
Year |
2024 |
Publication |
IEEE International Conference on Robotics and Automation in PACIFICO |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of domain-agnostic scene text spotting, i.e., training a model on multi-domain source data such that it can directly generalize to target domains rather than being specialized for a specific domain or scenario. In this regard, we present to the community a text spotting validation benchmark called Under-Water Text (UWT) for noisy underwater scenes to establish an important case study. Moreover, we also design an efficient super-resolution based end-to-end transformer baseline called DA-TextSpotter which achieves comparable or superior performance over existing text spotting architectures for both regular and arbitrary-shaped scene text spotting benchmarks in terms of both accuracy and model efficiency. The dataset, code and pre-trained models will be released upon acceptance. |
|
|
Address |
Yokohama; Japan; May 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICRA |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ DBP2024 |
Serial |
3979 |
|
Permanent link to this record |
|
|
|
|
Author |
Alloy Das; Sanket Biswas; Ayan Banerjee; Josep Llados; Umapada Pal; Saumik Bhattacharya |
|
|
Title |
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance |
Type |
Conference Article |
|
Year |
2024 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
718-728 |
|
|
Keywords |
|
|
|
Abstract |
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents), both in terms of accuracy and efficiency. |
|
|
Address |
Waikoloa; Hawaii; USA; January 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ DBB2024 |
Serial |
3986 |
|
Permanent link to this record |
|
|
|
|
Author |
Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Llados; Saumik Bhattacharya; Umapada Pal |
|
|
Title |
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation |
Type |
Conference Article |
|
Year |
2023 |
Publication |
17th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
14187 |
Issue |
|
Pages |
342–360 |
|
|
Keywords |
|
|
|
Abstract |
Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc. However, most existing works have ignored the scarcity of labeled data. With growing internet connectivity in personal life, an enormous number of documents has become available in the public domain, making data annotation a tedious task. We address this challenge using self-supervision and, unlike the few existing self-supervised document segmentation approaches which use text mining and textual labels, we use a completely vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs on par with, if not better than, the existing methods and their supervised counterparts. The code is made publicly available at: this https URL |
|
|
Address |
Document Layout Analysis; Document |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ MBM2023 |
Serial |
3990 |
|
Permanent link to this record |
|
|
|
|
Author |
Sergi Garcia Bordils; Dimosthenis Karatzas; Marçal Rusiñol |
|
|
Title |
STEP – Towards Structured Scene-Text Spotting |
Type |
Conference Article |
|
Year |
2024 |
Publication |
Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
883-892 |
|
|
Keywords |
|
|
|
Abstract |
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression. Contrary to generic scene text OCR, structured scene-text spotting seeks to dynamically condition both scene text detection and recognition on user-provided regular expressions. To tackle this task, we propose the Structured TExt sPotter (STEP), a model that exploits the provided text structure to guide the OCR process. STEP is able to deal with regular expressions that contain spaces and it is not bound to detection at the word-level granularity. Our approach enables accurate zero-shot structured text spotting in a wide variety of real-world reading scenarios and is solely trained on publicly available data. To demonstrate the effectiveness of our approach, we introduce a new challenging test dataset that contains several types of out-of-vocabulary structured text, reflecting important reading applications of fields such as prices, dates, serial numbers, license plates etc. We demonstrate that STEP can provide specialised OCR performance on demand in all tested scenarios. |
|
|
Address |
Waikoloa; Hawaii; USA; January 2024 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ GKR2024 |
Serial |
3992 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Gemma Sanchez |
|
|
Title |
Graph Matching vs. Graph Parsing in Graphics Recognition: A Combined Approach |
Type |
Journal |
|
Year |
2004 |
Publication |
International Journal of Pattern Recognition and Artificial Intelligence |
Abbreviated Journal |
IJPRAI |
|
|
Volume |
18 |
Issue |
3 |
Pages |
455–473 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; IF: 0.588 |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LlS2004 |
Serial |
445 |
|
Permanent link to this record |
|
|
|
|
Author |
Ernest Valveny; Philippe Dosch |
|
|
Title |
A general framework for the evaluation of symbol recognition methods |
Type |
Journal |
|
Year |
2006 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ VaD2006 |
Serial |
686 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Dorothea Blostein |
|
|
Title |
Special Issue on Graphics Recognition |
Type |
Journal |
|
Year |
2007 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume |
9 |
Issue |
1 |
Pages |
1–2 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Guest Editors |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ LlB2007 |
Serial |
781 |
|
Permanent link to this record |
|
|
|
|
Author |
Ernest Valveny; Philippe Dosch; Adam Winstanley; Yu Zhou; Su Yang; Luo Yan; Liu Wenyin; Dave Elliman; Mathieu Nicolas Delalandre; Eric Trupin; Sebastien Adam; Jean-Marc Ogier |
|
|
Title |
A general framework for the evaluation of symbol recognition methods |
Type |
Journal |
|
Year |
2006 |
Publication |
International Journal on Document Analysis and Recognition |
Abbreviated Journal |
IJDAR |
|
|
Volume |
9 |
Issue |
1 |
Pages |
59–74 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ VDW2006 |
Serial |
801 |
|
Permanent link to this record |
|
|
|
|
Author |
Gemma Sanchez; Alicia Fornes; Joan Mas; Josep Llados |
|
|
Title |
Computer Vision Tools for Visually Impaired Children Learning |
Type |
Journal |
|
Year |
2007 |
Publication |
|
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ SFM2007a |
Serial |
891 |
|
Permanent link to this record |
|
|
|
|
Author |
Gemma Sanchez; Alicia Fornes; Joan Mas; Josep Llados |
|
|
Title |
Computer Vision Tools for Visually Impaired Children Learning |
Type |
Journal |
|
Year |
2007 |
Publication |
|
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ SFM2007b |
Serial |
892 |
|
Permanent link to this record |