|
Manuel Carbonell. 2020. Neural Information Extraction from Semi-structured Documents A. (Ph.D. thesis, Ediciones Graficas Rey.)
Abstract: Sectors such as fintech, legaltech or insurance process an inflow of millions of forms, invoices, ID documents, claims or similar every day. Together with these, historical archives provide gigantic amounts of digitized documents containing useful information that needs to be stored as machine-encoded text with a meaningful structure. This procedure, known as information extraction (IE), comprises the steps of localizing and recognizing text, identifying named entities contained in it and optionally finding relationships among its elements. In this work we explore multi-task neural models at image and graph level to solve all steps in a unified way. While doing so we find benefits and limitations of these end-to-end approaches in comparison with sequential separate methods. More specifically, we first propose a method to produce textual as well as semantic labels with a unified model from handwritten text line images. We do so with the use of a convolutional recurrent neural model trained with connectionist temporal classification to predict the textual as well as semantic information encoded in the images. Secondly, motivated by the success of this approach, we investigate the unification of the localization and recognition tasks of handwritten text in full pages with an end-to-end model, observing benefits in doing so. Having two models that tackle pairs of subsequent information extraction tasks in an end-to-end manner, we lastly contribute a method to put them all together in a single neural network to solve the whole information extraction pipeline in a unified way. Doing so we observe some benefits and some limitations in the approach, suggesting that in certain cases it is beneficial to train specialized models that excel at a single challenging task of the information extraction process, such as the recognition of named entities or the extraction of relationships between them.
For this reason we lastly study the use of the recently arrived graph neural network architectures for the semantic tasks of the information extraction process, which are recognition of named entities and relation extraction, achieving promising results on the relation extraction part.
|
|
|
Lluis Gomez, Anguelos Nicolaou, Marçal Rusiñol and Dimosthenis Karatzas. 2020. 12 years of ICDAR Robust Reading Competitions: The evolution of reading systems for unconstrained text understanding. In K. Alahari and C.V. Jawahar, eds. Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis. Springer. (Series on Advances in Computer Vision and Pattern Recognition.)
|
|
|
Lluis Gomez, Dena Bazazian and Dimosthenis Karatzas. 2020. Historical review of scene text detection research. In K. Alahari and C.V. Jawahar, eds. Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis. Springer. (Series on Advances in Computer Vision and Pattern Recognition.)
|
|
|
Jon Almazan, Lluis Gomez, Suman Ghosh, Ernest Valveny and Dimosthenis Karatzas. 2020. WATTS: A common representation of word images and strings using embedded attributes for text recognition and retrieval. In K. Alahari and C.V. Jawahar, eds. Visual Text Interpretation – Algorithms and Applications in Scene Understanding and Document Analysis. Springer. (Series on Advances in Computer Vision and Pattern Recognition.)
|
|
|
Minesh Mathew, Dimosthenis Karatzas and C.V. Jawahar. 2021. DocVQA: A Dataset for VQA on Document Images. IEEE Winter Conference on Applications of Computer Vision. 2200–2209.
Abstract: We present a new dataset for Visual Question Answering (VQA) on document images called DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images. Detailed analysis of the dataset in comparison with similar datasets for VQA and reading comprehension is presented. We report several baseline results by adopting existing VQA and reading comprehension models. Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy). The models need to improve specifically on questions where understanding the structure of the document is crucial. The dataset, code and leaderboard are available at docvqa.org
|
|
|
Manuel Carbonell, Pau Riba, Mauricio Villegas, Alicia Fornes and Josep Llados. 2020. Named Entity Recognition and Relation Extraction with Graph Neural Networks in Semi Structured Documents. 25th International Conference on Pattern Recognition.
Abstract: The use of administrative documents to communicate and leave record of business information requires methods able to automatically extract and understand the content from such documents in a robust and efficient way. In addition, the semi-structured nature of these reports is especially suited for the use of graph-based representations, which are flexible enough to adapt to the deformations from the different document templates. Moreover, Graph Neural Networks provide the proper methodology to learn relations among the data elements in these documents. In this work we study the use of Graph Neural Network architectures to tackle the problem of entity recognition and relation extraction in semi-structured documents. Our approach achieves state-of-the-art results in the three tasks involved in the process. Additionally, the experimentation with two datasets of different nature demonstrates the good generalization ability of our approach.
|
|
|
Pau Torras, Arnau Baro, Lei Kang and Alicia Fornes. 2021. On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition. International Society for Music Information Retrieval Conference. 690–696.
Abstract: Despite the latest advances in Deep Learning, the recognition of handwritten music scores is still a challenging endeavour. Even though the recent Sequence to Sequence (Seq2Seq) architectures have demonstrated their capacity to reliably recognise handwritten text, their performance is still far from satisfactory when applied to historical handwritten scores. Indeed, the ambiguous nature of handwriting, the non-standard musical notation employed by composers of the time and the decaying state of old paper make these scores remarkably difficult to read, sometimes even by trained humans. Thus, in this work we explore the incorporation of language models into a Seq2Seq-based architecture to try to improve transcriptions where the aforementioned unclear writing produces statistically unsound mistakes, which, as far as we know, has never been attempted for this field of research on this architecture. After studying various Language Model integration techniques, the experimental evaluation on historical handwritten music scores shows a significant improvement over the state of the art, showing that this is a promising research direction for dealing with such difficult manuscripts.
|
|
|
Jialuo Chen, Mohamed Ali Souibgui, Alicia Fornes and Beata Megyesi. 2021. Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images. 4th International Conference on Historical Cryptology. 34–37.
Abstract: Historical ciphers contain a wide range of symbols from various symbol sets. Identifying the cipher alphabet is a prerequisite before decryption can take place and is a time-consuming process. In this work we explore the use of image processing for identifying the underlying alphabet in cipher images, and to compare alphabets between ciphers. The experiments show that ciphers with similar alphabets can be successfully discovered through clustering.
|
|
|
Pau Riba, Sounak Dey, Ali Furkan Biten and Josep Llados. 2021. Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild.
Abstract: This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute a tough-to-beat baseline that, without any specific SGOL training, is able to outperform previous works on a fixed set of classes. The baseline is useful to analyze the performance of SGOL approaches based on available simple yet powerful methods. We advance prior art by proposing a sketch-conditioned DETR (DEtection TRansformer) architecture which avoids a hard classification and alleviates the domain gap between sketches and images to localize object instances. Although the main goal of SGOL is focused on object detection, we explored its natural extension to sketch-guided instance segmentation. This novel task allows moving towards identifying the objects at pixel level, which is of key importance in several applications. We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results. All training and testing code of our model will be released to facilitate future research: https://github.com/priba/sgol_wild.
|
|
|
Josep Llados. 2021. The 5G of Document Intelligence. 3rd Workshop on Future of Document Analysis and Recognition.
|
|