TY  - CONF
AU  - Mohamed Ali Souibgui
AU  - Sanket Biswas
AU  - Andres Mafla
AU  - Ali Furkan Biten
AU  - Alicia Fornes
AU  - Yousri Kessentini
AU  - Josep Llados
AU  - Lluis Gomez
AU  - Dimosthenis Karatzas
A2  - AAAI
PY  - 2023//
TI  - Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement
BT  - Proceedings of the 37th AAAI Conference on Artificial Intelligence
VL  - 37
IS  - 2
KW  - Representation Learning for Vision
KW  - CV Applications
KW  - CV Language and Vision
KW  - ML Unsupervised
KW  - Self-Supervised Learning
N2  - In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene text) and document image enhancement. We employ a transformer-based architecture that incorporates three pretext tasks as learning objectives, optimized during pre-training without labelled data. Each pretext objective is specifically tailored to the final downstream tasks. We conduct several ablation experiments that confirm the choice of pretext tasks. Importantly, the proposed model does not exhibit the limitations of previous state-of-the-art methods based on contrastive losses, while requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state of the art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR
UR  - https://doi.org/10.1609/aaai.v37i2.25328
N1  - DAG
ID  - Mohamed Ali Souibgui2023
ER  -