|
Records |
Links |
|
Author |
Mohamed Ali Souibgui; Pau Torras; Jialuo Chen; Alicia Fornes |
|
|
Title |
An Evaluation of Handwritten Text Recognition Methods for Historical Ciphered Manuscripts |
Type |
Conference Article |
|
Year |
2023 |
Publication |
7th International Workshop on Historical Document Imaging and Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
7-12 |
|
|
Keywords |
|
|
|
Abstract |
This paper investigates the effectiveness of different deep learning HTR families, including LSTM, Seq2Seq, and transformer-based approaches with self-supervised pretraining, in recognizing ciphered manuscripts from different historical periods and cultures. The goal is to identify the most suitable method or training techniques for recognizing ciphered manuscripts and to provide insights into the challenges and opportunities in this field of research. We evaluate the performance of these models on several datasets of ciphered manuscripts and discuss their results. This study contributes to the development of more accurate and efficient methods for recognizing historical manuscripts for the preservation and dissemination of our cultural heritage. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
HIP |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ STC2023 |
Serial |
3849 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Andres Mafla; Ali Furkan Biten; Alicia Fornes; Yousri Kessentini; Josep Llados; Lluis Gomez; Dimosthenis Karatzas |
|
|
Title |
Text-DIAE: a self-supervised degradation invariant autoencoder for text recognition and document enhancement |
Type |
Conference Article |
|
Year |
2023 |
Publication |
Proceedings of the 37th AAAI Conference on Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
37 |
Issue |
2 |
Pages |
|
|
|
Keywords |
Representation Learning for Vision; CV Applications; CV Language and Vision; ML Unsupervised; Self-Supervised Learning |
|
|
Abstract |
In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm the design choice of the selected pretext tasks. Importantly, the proposed model does not exhibit limitations of previous state-of-the-art methods based on contrastive losses, while at the same time requiring substantially fewer data samples to converge. Finally, we demonstrate that our method surpasses the state-of-the-art in existing supervised and self-supervised settings in handwritten and scene text recognition and document image enhancement. Our code and trained models will be made publicly available at https://github.com/dali92002/SSL-OCR |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
AAAI |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBM2023 |
Serial |
3848 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Sanket Biswas; Sana Khamekhem Jemni; Yousri Kessentini; Alicia Fornes; Josep Llados; Umapada Pal |
|
|
Title |
DocEnTr: An End-to-End Document Image Enhancement Transformer |
Type |
Conference Article |
|
Year |
2022 |
Publication |
26th International Conference on Pattern Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1699-1705 |
|
|
Keywords |
Degradation; Head; Optical character recognition; Self-supervised learning; Benchmark testing; Transformers; Magnetic heads |
|
|
Abstract |
Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: https://github.com/dali92002/DocEnTR |
|
|
Address |
August 21-25, 2022 , Montréal Québec |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICPR |
|
|
Notes |
DAG; 600.121; 600.162; 602.230; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SBJ2022 |
Serial |
3730 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Y.Kessentini |
|
|
Title |
DE-GAN: A Conditional Generative Adversarial Network for Document Enhancement |
Type |
Journal Article |
|
Year |
2022 |
Publication |
IEEE Transactions on Pattern Analysis and Machine Intelligence |
Abbreviated Journal |
TPAMI |
|
|
Volume |
44 |
Issue |
3 |
Pages |
1180-1191 |
|
|
Keywords |
|
|
|
Abstract |
Documents often exhibit various forms of degradation, which make it hard to be read and substantially deteriorate the performance of an OCR system. In this paper, we propose an effective end-to-end framework named Document Enhancement Generative Adversarial Networks (DE-GAN) that uses the conditional GANs (cGANs) to restore severely degraded document images. To the best of our knowledge, this practice has not been studied within the context of generative adversarial deep networks. We demonstrate that, in different tasks (document clean up, binarization, deblurring and watermark removal), DE-GAN can produce an enhanced version of the degraded document with a high quality. In addition, our approach provides consistent improvements compared to state-of-the-art methods over the widely used DIBCO 2013, DIBCO 2017 and H-DIBCO 2018 datasets, proving its ability to restore a degraded document image to its ideal condition. The obtained results on a wide variety of degradation reveal the flexibility of the proposed model to be exploited in other document enhancement problems. |
|
|
Address |
1 March 2022 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 602.230; 600.121; 600.140 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SoK2022 |
Serial |
3454 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohamed Ali Souibgui; Y.Kessentini; Alicia Fornes |
|
|
Title |
A conditional GAN based approach for distorted camera captured documents recovery |
Type |
Conference Article |
|
Year |
2020 |
Publication |
4th Mediterranean Conference on Pattern Recognition and Artificial Intelligence |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Virtual; December 2020 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
MedPRAI |
|
|
Notes |
DAG; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ SKF2020 |
Serial |
3450 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohammed Al Rawi; Dimosthenis Karatzas |
|
|
Title |
On the Labeling Correctness in Computer Vision Datasets |
Type |
Conference Article |
|
Year |
2018 |
Publication |
Proceedings of the Workshop on Interactive Adaptive Learning, co-located with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
Image datasets have heavily been used to build computer vision systems.
These datasets are either manually or automatically labeled, which is a
problem as both labeling methods are prone to errors. To investigate this problem, we use a majority voting ensemble that combines the results from several Convolutional Neural Networks (CNNs). Majority voting ensembles not only enhance the overall performance, but can also be used to estimate the confidence level of each sample. We also examined Softmax as another form to estimate posterior probability. We have designed various experiments with a range of different ensembles built from one or different, or temporal/snapshot CNNs, which have been trained multiple times stochastically. We analyzed CIFAR10, CIFAR100, EMNIST, and SVHN datasets and we found quite a few incorrect
labels, both in the training and testing sets. We also present detailed confidence analysis on these datasets and we found that the ensemble is better than the Softmax when used estimate the per-sample confidence. This work thus proposes an approach that can be used to scrutinize and verify the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We propose a measure, based on the Odds-Ratio, to quantify how many of these incorrectly classified labels are actually incorrectly labeled and how many of these are confusing. The proposed methods are easily scalable to larger datasets, like ImageNet, LSUN and SUN, as each CNN instance is trained for 60 epochs; or even faster, by implementing a temporal (snapshot) ensemble. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ECML-PKDDW |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RaK2018 |
Serial |
3144 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohammed Al Rawi; Ernest Valveny |
|
|
Title |
Compact and Efficient Multitask Learning in Vision, Language and Speech |
Type |
Conference Article |
|
Year |
2019 |
Publication |
IEEE International Conference on Computer Vision Workshops |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
2933-2942 |
|
|
Keywords |
|
|
|
Abstract |
Across-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem to cope with the human cognition system by considering inter and intra-class categorization and recognition complicates the problem even further. We propose in this work an effective holistic and hierarchical learning by using a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator approach to resolve the collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwriting word spotting of two different scripts (Arabic and English). The model we propose successfully learned different tasks across multiple domains. |
|
|
Address |
Seul; Korea; October 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICCVW |
|
|
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RaV2019 |
Serial |
3365 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohammed Al Rawi; Ernest Valveny; Dimosthenis Karatzas |
|
|
Title |
Can One Deep Learning Model Learn Script-Independent Multilingual Word-Spotting? |
Type |
Conference Article |
|
Year |
2019 |
Publication |
15th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
260-267 |
|
|
Keywords |
|
|
|
Abstract |
Word spotting has gained increased attention lately as it can be used to extract textual information from handwritten documents and scene-text images. Current word spotting approaches are designed to work on a single language and/or script. Building intelligent models that learn script-independent multilingual word-spotting is challenging due to the large variability of multilingual alphabets and symbols. We used ResNet-152 and the Pyramidal Histogram of Characters (PHOC) embedding to build a one-model script-independent multilingual word-spotting and we tested it on Latin, Arabic, and Bangla (Indian) languages. The one-model we propose performs on par with the multi-model language-specific word-spotting system, and thus, reduces the number of models needed for each script and/or language. |
|
|
Address |
Sydney; Australia; September 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.129; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RVK2019 |
Serial |
3337 |
|
Permanent link to this record |
|
|
|
|
Author |
Muhammad Muzzamil Luqman; Jean-Yves Ramel; Josep Llados |
|
|
Title |
Improving Fuzzy Multilevel Graph Embedding through Feature Selection Technique |
Type |
Conference Article |
|
Year |
2012 |
Publication |
Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop |
Abbreviated Journal |
|
|
|
Volume |
7626 |
Issue |
|
Pages |
243-253 |
|
|
Keywords |
|
|
|
Abstract |
Graphs are the most powerful, expressive and convenient data structures but there is a lack of efficient computational tools and algorithms for processing them. The embedding of graphs into numeric vector spaces permits them to access the state-of-the-art computational efficient statistical models and tools. In this paper we take forward our work on explicit graph embedding and present an improvement to our earlier proposed method, named “fuzzy multilevel graph embedding – FMGE”, through feature selection technique. FMGE achieves the embedding of attributed graphs into low dimensional vector spaces by performing a multilevel analysis of graphs and extracting a set of global, structural and elementary level features. Feature selection permits FMGE to select the subset of most discriminating features and to discard the confusing ones for underlying graph dataset. Experimental results for graph classification experimentation on IAM letter, GREC and fingerprint graph databases, show improvement in the performance of FMGE. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer Berlin Heidelberg |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
0302-9743 |
ISBN |
978-3-642-34165-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
SSPR&SPR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ LRL2012 |
Serial |
2381 |
|
Permanent link to this record |
|
|
|
|
Author |
Muhammad Muzzamil Luqman; Jean-Yves Ramel; Josep Llados |
|
|
Title |
Multilevel Analysis of Attributed Graphs for Explicit Graph Embedding in Vector Spaces |
Type |
Book Chapter |
|
Year |
2013 |
Publication |
Graph Embedding for Pattern Analysis |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1-26 |
|
|
Keywords |
|
|
|
Abstract |
Ability to recognize patterns is among the most crucial capabilities of human beings for their survival, which enables them to employ their sophisticated neural and cognitive systems [1], for processing complex audio, visual, smell, touch, and taste signals. Man is the most complex and the best existing system of pattern recognition. Without any explicit thinking, we continuously compare, classify, and identify huge amount of signal data everyday [2], starting from the time we get up in the morning till the last second we fall asleep. This includes recognizing the face of a friend in a crowd, a spoken word embedded in noise, the proper key to lock the door, smell of coffee, the voice of a favorite singer, the recognition of alphabetic characters, and millions of more tasks that we perform on regular basis. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer New York |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4614-4456-5 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ LRL2013b |
Serial |
2271 |
|
Permanent link to this record |