|
Records |
Links |
|
Author |
Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) |


|
|
Title |
16th International Conference, 2021, Proceedings, Part III |
Type |
Book Whole |
|
Year  |
2021 |
Publication |
Document Analysis and Recognition – ICDAR 2021 |
Abbreviated Journal |
|
|
|
Volume |
12823 |
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding. |
|
|
Address |
Lausanne, Switzerland, September 5-10, 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Cham |
Place of Publication |
|
Editor |
Josep Llados; Daniel Lopresti; Seiichi Uchida |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-3-030-86333-3 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ |
Serial |
3727 |
|
Permanent link to this record |
|
|
|
|
Author |
Josep Llados; Daniel Lopresti; Seiichi Uchida (eds) |


|
|
Title |
16th International Conference, 2021, Proceedings, Part IV |
Type |
Book Whole |
|
Year  |
2021 |
Publication |
Document Analysis and Recognition – ICDAR 2021 |
Abbreviated Journal |
|
|
|
Volume |
12824 |
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This four-volume set of LNCS 12821, LNCS 12822, LNCS 12823 and LNCS 12824, constitutes the refereed proceedings of the 16th International Conference on Document Analysis and Recognition, ICDAR 2021, held in Lausanne, Switzerland in September 2021. The 182 full papers were carefully reviewed and selected from 340 submissions, and are presented with 13 competition reports.
The papers are organized into the following topical sections: document analysis for literature search, document summarization and translation, multimedia document analysis, mobile text recognition, document analysis for social good, indexing and retrieval of documents, physical and logical layout analysis, recognition of tables and formulas, and natural language processing (NLP) for document understanding. |
|
|
Address |
Lausanne, Switzerland, September 5-10, 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Cham |
Place of Publication |
|
Editor |
Josep Llados; Daniel Lopresti; Seiichi Uchida |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-3-030-86336-4 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ |
Serial |
3728 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Alicia Fornes; Y.Kessentini; C.Tudor |


|
|
Title |
A Few-shot Learning Approach for Historical Encoded Manuscript Recognition |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
25th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
5413-5420 |
|
|
Keywords |
|
|
|
Abstract |
Encoded (or ciphered) manuscripts are a special type of historical documents that contain encrypted text. The automatic recognition of this kind of documents is challenging because: 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpus for training and 3) touching symbols make the symbol segmentation difficult and complex. To overcome these difficulties, we propose a novel method for handwritten ciphers recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if few labeled pages with the same alphabet are used for fine tuning, our method surpasses existing unsupervised and supervised HTR methods for ciphers recognition. |
|
|
Address |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SFK2021 |
Serial |
3449 |
|
Permanent link to this record |
|
|
|
|
Author |
Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas |


|
|
Title |
Multi-modal reasoning graph for scene-text based fine-grained image classification and retrieval |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
4022-4032 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MDB2021 |
Serial |
3491 |
|
Permanent link to this record |
|
|
|
|
Author |
Andres Mafla; Rafael S. Rezende; Lluis Gomez; Diana Larlus; Dimosthenis Karatzas |


|
|
Title |
StacMR: Scene-Text Aware Cross-Modal Retrieval |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2219-2229 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MRG2021a |
Serial |
3492 |
|
Permanent link to this record |
|
|
|
|
Author |
Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas |

|
|
Title |
Real-time Lexicon-free Scene Text Retrieval |
Type |
Journal Article |
|
Year  |
2021 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
110 |
Issue |
|
Pages |
107656 |
|
|
Keywords |
|
|
|
Abstract |
In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121; 600.129; 601.338 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MTD2021 |
Serial |
3493 |
|
Permanent link to this record |
|
|
|
|
Author |
Minesh Mathew; Dimosthenis Karatzas; C.V. Jawahar |

|
|
Title |
DocVQA: A Dataset for VQA on Document Images |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
IEEE Winter Conference on Applications of Computer Vision |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2200-2209 |
|
|
Keywords |
|
|
|
Abstract |
We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding structure of the document is crucial. The dataset, code and leaderboard are available at docvqa. org |
|
|
Address |
Virtual; January 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
WACV |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MKJ2021 |
Serial |
3498 |
|
Permanent link to this record |
|
|
|
|
Author |
Adria Molina; Pau Riba; Lluis Gomez; Oriol Ramos Terrades; Josep Llados |


|
|
Title |
Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
12822 |
Issue |
|
Pages |
306-320 |
|
|
Keywords |
|
|
|
Abstract |
This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate the date estimation as a retrieval task, where given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer are their embedded representations the closer are their dates. Contrary to the traditional models that design a neural network that learns a classifier or a regressor, we propose a learning objective based on the nDCG ranking metric. We have experimentally evaluated the performance of the method in two different tasks: date estimation and date-sensitive image retrieval, using the DEW public database, overcoming the baseline methods. |
|
|
Address |
Lausanne; Suissa; September 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
|
|
Call Number |
Admin @ si @ MRG2021b |
Serial |
3571 |
|
Permanent link to this record |
|
|
|
|
Author |
Pau Riba; Adria Molina; Lluis Gomez; Oriol Ramos Terrades; Josep Llados |


|
|
Title |
Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
12822 |
Issue |
|
Pages |
381–395 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the query string. We experimentally demonstrate the competitive performance of the proposed model on query-by-string word spotting for both, handwritten and real scene word images. We also provide the results for query-by-example word spotting, although it is not the main focus of this work. |
|
|
Address |
Lausanne; Suissa; September 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RMG2021 |
Serial |
3572 |
|
Permanent link to this record |
|
|
|
|
Author |
Sanket Biswas; Pau Riba; Josep Llados; Umapada Pal |


|
|
Title |
DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis |
Type |
Conference Article |
|
Year  |
2021 |
Publication |
16th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
12823 |
Issue |
|
Pages |
555–568 |
|
|
Keywords |
|
|
|
Abstract |
Despite significant progress on current state-of-the-art image generation models, synthesis of document images containing multiple and complex object layouts is a challenging task. This paper presents a novel approach, called DocSynth, to automatically synthesize document images based on a given layout. In this work, given a spatial layout (bounding boxes with object categories) as a reference by the user, our proposed DocSynth model learns to generate a set of realistic document images consistent with the defined layout. Also, this framework has been adapted to this work as a superior baseline model for creating synthetic document image datasets for augmenting real data during training for document layout analysis tasks. Different sets of learning objectives have been also used to improve the model performance. Quantitatively, we also compare the generated results of our model with real data using standard evaluation metrics. The results highlight that our model can successfully generate realistic and diverse document images with multiple objects. We also present a comprehensive qualitative analysis summary of the different scopes of synthetic image generation tasks. Lastly, to our knowledge this is the first work of its kind. |
|
|
Address |
Lausanne; Suissa; September 2021 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.121; 600.140; 110.312 |
Approved |
no |
|
|
Call Number |
Admin @ si @ BRL2021a |
Serial |
3573 |
|
Permanent link to this record |