|
Albert Suso, Pau Riba, Oriol Ramos Terrades and Josep Llados. 2021. A Self-supervised Inverse Graphics Approach for Sketch Parametrization. 16th International Conference on Document Analysis and Recognition.28–42. (LNCS.)
Abstract: The study of neural generative models of handwritten text and human sketches is a hot topic in the computer vision field. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints, and more recent articles have managed to generate fully vector sketches by coding the strokes as Bézier curves. However, the previous attempts with this approach need them all a ground truth consisting in the sequence of points that make up each stroke, which seriously limits the datasets the model is able to train in. In this work, we present a self-supervised end-to-end inverse graphics approach that learns to embed each image to its best fit of Bézier curves. The self-supervised nature of the training process allows us to train the model in a wider range of datasets, but also to perform better after-training predictions by applying an overfitting process on the input binary image. We report qualitative an quantitative evaluations on the MNIST and the Quick, Draw! datasets.
|
|
|
Sanket Biswas, Pau Riba, Josep Llados and Umapada Pal. 2021. Graph-Based Deep Generative Modelling for Document Layout Generation. 16th International Conference on Document Analysis and Recognition.525–537. (LNCS.)
Abstract: One of the major prerequisites for any deep learning approach is the availability of large-scale training data. When dealing with scanned document images in real world scenarios, the principal information of its content is stored in the layout itself. In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, specially in digital mailroom applications. It is also the first graph-based approach for document layout generation task experimented on administrative document images, in this case, invoices.
|
|
|
Adria Molina, Lluis Gomez, Oriol Ramos Terrades and Josep Llados. 2022. A Generic Image Retrieval Method for Date Estimation of Historical Document Collections. Document Analysis Systems.15th IAPR International Workshop, (DAS2022).583–597.
Abstract: Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack of the ability to generalize from one dataset to others. This paper presents a robust date estimation system based in a retrieval approach that generalizes well in front of heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordination of documents for each problem. One of the main usages of the presented approach is as a tool for historical contextual retrieval. It means that scholars could perform comparative analysis of historical images from big datasets in terms of the period where they were produced. We provide experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
Keywords: Date estimation; Document retrieval; Image retrieval; Ranking loss; Smooth-nDCG
|
|
|
Sergi Garcia Bordils and 6 others. 2022. Read While You Drive-Multilingual Text Tracking on the Road. 15th IAPR International workshop on document analysis systems.756–770. (LNCS.)
Abstract: Visual data obtained during driving scenarios usually contain large amounts of text that conveys semantic information necessary to analyse the urban environment and is integral to the traffic control plan. Yet, research on autonomous driving or driver assistance systems typically ignores this information. To advance research in this direction, we present RoadText-3K, a large driving video dataset with fully annotated text. RoadText-3K is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions and multiple languages and scripts. We offer a comprehensive analysis of tracking by detection and detection by tracking methods exploring the limits of state-of-the-art text detection. Finally, we propose a new end-to-end trainable tracking model that yields state-of-the-art results on this challenging dataset. Our experiments demonstrate the complexity and variability of RoadText-3K and establish a new, realistic benchmark for scene text tracking in the wild.
|
|
|
Asma Bensalah, Alicia Fornes, Cristina Carmona_Duarte and Josep Llados. 2022. Easing Automatic Neurorehabilitation via Classification and Smoothness Analysis. Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022.336–348. (LNCS.)
Abstract: Assessing the quality of movements for post-stroke patients during the rehabilitation phase is vital given that there is no standard stroke rehabilitation plan for all the patients. In fact, it depends basically on the patient’s functional independence and its progress along the rehabilitation sessions. To tackle this challenge and make neurorehabilitation more agile, we propose an automatic assessment pipeline that starts by recognising patients’ movements by means of a shallow deep learning architecture, then measuring the movement quality using jerk measure and related measures. A particularity of this work is that the dataset used is clinically relevant, since it represents movements inspired from Fugl-Meyer a well common upper-limb clinical stroke assessment scale for stroke patients. We show that it is possible to detect the contrast between healthy and patients movements in terms of smoothness, besides achieving conclusions about the patients’ progress during the rehabilitation sessions that correspond to the clinicians’ findings about each case.
Keywords: Neurorehabilitation; Upper-lim; Movement classification; Movement smoothness; Deep learning; Jerk
|
|
|
Alicia Fornes and 11 others. 2022. The RPM3D Project: 3D Kinematics for Remote Patient Monitoring. Intertwining Graphonomics with Human Movements. 20th International Conference of the International Graphonomics Society, IGS 2022.217–226. (LNCS.)
Abstract: This project explores the feasibility of remote patient monitoring based on the analysis of 3D movements captured with smartwatches. We base our analysis on the Kinematic Theory of Rapid Human Movement. We have validated our research in a real case scenario for stroke rehabilitation at the Guttmann Institute (https://www.guttmann.com/en/) (neurorehabilitation hospital), showing promising results. Our work could have a great impact in remote healthcare applications, improving the medical efficiency and reducing the healthcare costs. Future steps include more clinical validation, developing multi-modal analysis architectures (analysing data from sensors, images, audio, etc.), and exploring the application of our technology to monitor other neurodegenerative diseases.
Keywords: Healthcare applications; Kinematic; Theory of Rapid Human Movements; Human activity recognition; Stroke rehabilitation; 3D kinematics
|
|
|
Giuseppe De Gregorio and 6 others. 2022. A Few Shot Multi-representation Approach for N-Gram Spotting in Historical Manuscripts. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022).3–12. (LNCS.)
Abstract: Despite recent advances in automatic text recognition, the performance remains moderate when it comes to historical manuscripts. This is mainly because of the scarcity of available labelled data to train the data-hungry Handwritten Text Recognition (HTR) models. The Keyword Spotting System (KWS) provides a valid alternative to HTR due to the reduction in error rate, but it is usually limited to a closed reference vocabulary. In this paper, we propose a few-shot learning paradigm for spotting sequences of a few characters (N-gram) that requires a small amount of labelled training data. We exhibit that recognition of important n-grams could reduce the system’s dependency on vocabulary. In this case, an out-of-vocabulary (OOV) word in an input handwritten line image could be a sequence of n-grams that belong to the lexicon. An extensive experimental evaluation of our proposed multi-representation approach was carried out on a subset of Bentham’s historical manuscript collections to obtain some really promising results in this direction.
Keywords: N-gram spotting; Few-shot learning; Multimodal understanding; Historical handwritten collections
|
|
|
Arnau Baro, Pau Riba and Alicia Fornes. 2022. Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022).171–184. (LNCS.)
Abstract: During the last decades, the performance of optical music recognition has been increasingly improving. However, and despite the 2-dimensional nature of music notation (e.g. notes have rhythm and pitch), most works treat musical scores as a sequence of symbols in one dimension, which make their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition. First, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown a great performance in similar applications. Our methodology consists of: First, we will detect each isolated/atomic symbols (those that can not be decomposed in more graphical primitives) and the primitives that form a musical symbol. Then, we will build the graph taking as root node the notehead and as leaves those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for a final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results.
Keywords: Object detection; Optical music recognition; Graph neural network
|
|
|
Utkarsh Porwal, Alicia Fornes and Faisal Shafait, eds. 2022. Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022. Springer. (LNCS.)
|
|
|
Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas and Lluis Gomez. 2022. MUST-VQA: MUltilingual Scene-text VQA. Proceedings European Conference on Computer Vision Workshops.345–358. (LNCS.)
Abstract: In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and it is not necessarily aligned to the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot and we demonstrate that the models can perform on a par on a zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models into STVQA tasks.
Keywords: Visual question answering; Scene text; Translation robustness; Multilingual models; Zero-shot transfer; Power of language models
|
|