Home | << 1 2 3 4 >> |
Records | |||||
---|---|---|---|---|---|
Author | Juan Ramon Terven Salinas; Joaquin Salas; Bogdan Raducanu | ||||
Title | Robust Head Gestures Recognition for Assistive Technology | Type | Book Chapter | ||
Year | 2014 | Publication | Pattern Recognition | Abbreviated Journal | |
Volume | 8495 | Issue | Pages | 152-161 | |
Keywords | |||||
Abstract | This paper presents a system capable of recognizing six head gestures: nodding, shaking, turning right, turning left, looking up, and looking down. The main difference of our system compared to other methods is that the Hidden Markov Models presented in this paper, are fully connected and consider all possible states in any given order, providing the following advantages to the system: (1) allows unconstrained movement of the head and (2) it can be easily integrated into a wearable device (e.g. glasses, neck-hung devices), in which case it can robustly recognize gestures in the presence of ego-motion. Experimental results show that this approach outperforms common methods that use restricted HMMs for each gesture. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Springer International Publishing | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | LNCS | ||
Series Volume | Series Issue | Edition | |||
ISSN | 0302-9743 | ISBN | 978-3-319-07490-0 | Medium | |
Area | Expedition | Conference | |||
Notes | LAMP; | Approved | no | ||
Call Number | Admin @ si @ TSR2014b | Serial | 2505 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Dimosthenis Karatzas | ||||
Title | TextProposals: a Text‐specific Selective Search Algorithm for Word Spotting in the Wild | Type | Journal Article | ||
Year | 2017 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 70 | Issue | Pages | 60-74 | |
Keywords | |||||
Abstract | Motivated by the success of powerful while expensive techniques to recognize words in a holistic way (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016) object proposals techniques emerge as an alternative to the traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity based region grouping algorithm that generates a hierarchy of word hypotheses. Over the nodes of this hierarchy it is possible to apply a holistic word recognition method in an efficient way.
Our experiments demonstrate that the presented method is superior in its ability of producing good quality word proposals when compared with class-independent algorithms. We show impressive recall rates with a few thousand proposals in different standard benchmarks, including focused or incidental text datasets, and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting, and, in some benchmarks, outperforms previously published results. Concretely, in the challenging ICDAR2015 Incidental Text dataset, we overcome in more than 10% F-score the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015). Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.084; 601.197; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GoK2017 | Serial | 2886 | ||
Permanent link to this record | |||||
Author | Lluis Gomez; Anguelos Nicolaou; Dimosthenis Karatzas | ||||
Title | Improving patch‐based scene text script identification with ensembles of conjoined networks | Type | Journal Article | ||
Year | 2017 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 67 | Issue | Pages | 85-96 | |
Keywords | |||||
Abstract | |||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.084; 600.121; 600.129 | Approved | no | ||
Call Number | Admin @ si @ GNK2017 | Serial | 2887 | ||
Permanent link to this record | |||||
Author | Anjan Dutta; Josep Llados; Horst Bunke; Umapada Pal | ||||
Title | Product graph-based higher order contextual similarities for inexact subgraph matching | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 76 | Issue | Pages | 596-611 | |
Keywords | |||||
Abstract | Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in TPG. We use contextual similarities to construct an objective function and optimize it with a linear programming approach. Since random walk formulation through TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities and better discrimination among the nodes and edges. Experimental results shown on synthetic as well as real benchmarks illustrate that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 602.167; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ DLB2018 | Serial | 3083 | ||
Permanent link to this record | |||||
Author | Pau Rodriguez; Guillem Cucurull; Josep M. Gonfaus; Xavier Roca; Jordi Gonzalez | ||||
Title | Age and gender recognition in the wild with deep attention | Type | Journal Article | ||
Year | 2017 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 72 | Issue | Pages | 563-571 | |
Keywords | Age recognition; Gender recognition; Deep neural networks; Attention mechanisms | ||||
Abstract | Face analysis in images in the wild still pose a challenge for automatic age and gender recognition tasks, mainly due to their high variability in resolution, deformation, and occlusion. Although the performance has highly increased thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitiveness of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained based on a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks show that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | ISE; 600.098; 602.133; 600.119 | Approved | no | ||
Call Number | Admin @ si @ RCG2017b | Serial | 2962 | ||
Permanent link to this record | |||||
Author | Sangheeta Roy; Palaiahnakote Shivakumara; Namita Jain; Vijeta Khare; Anjan Dutta; Umapada Pal; Tong Lu | ||||
Title | Rough-Fuzzy based Scene Categorization for Text Detection and Recognition in Video | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 80 | Issue | Pages | 64-82 | |
Keywords | Rough set; Fuzzy set; Video categorization; Scene image classification; Video text detection; Video text recognition | ||||
Abstract | Scene image or video understanding is a challenging task especially when number of video types increases drastically with high variations in background and foreground. This paper proposes a new method for categorizing scene videos into different classes, namely, Animation, Outlet, Sports, e-Learning, Medical, Weather, Defense, Economics, Animal Planet and Technology, for the performance improvement of text detection and recognition, which is an effective approach for scene image or video understanding. For this purpose, at first, we present a new combination of rough and fuzzy concept to study irregular shapes of edge components in input scene videos, which helps to classify edge components into several groups. Next, the proposed method explores gradient direction information of each pixel in each edge component group to extract stroke based features by dividing each group into several intra and inter planes. We further extract correlation and covariance features to encode semantic features located inside planes or between planes. Features of intra and inter planes of groups are then concatenated to get a feature matrix. Finally, the feature matrix is verified with temporal frames and fed to a neural network for categorization. Experimental results show that the proposed method outperforms the existing state-of-the-art methods, at the same time, the performances of text detection and recognition methods are also improved significantly due to categorization. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.097; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RSJ2018 | Serial | 3096 | ||
Permanent link to this record | |||||
Author | Jun Wan; Sergio Escalera; Francisco Perales; Josef Kittler | ||||
Title | Articulated Motion and Deformable Objects | Type | Journal Article | ||
Year | 2018 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 79 | Issue | Pages | 55-64 | |
Keywords | |||||
Abstract | This guest editorial introduces the twenty two papers accepted for this Special Issue on Articulated Motion and Deformable Objects (AMDO). They are grouped into four main categories within the field of AMDO: human motion analysis (action/gesture), human pose estimation, deformable shape segmentation, and face analysis. For each of the four topics, a survey of the recent developments in the field is presented. The accepted papers are briefly introduced in the context of this survey. They contribute novel methods, algorithms with improved performance as measured on benchmarking datasets, as well as two new datasets for hand action detection and human posture analysis. The special issue should be of high relevance to the reader interested in AMDO recognition and promote future research directions in the field. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HUPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ WEP2018 | Serial | 3126 | ||
Permanent link to this record | |||||
Author | Juan Ignacio Toledo; Manuel Carbonell; Alicia Fornes; Josep Llados | ||||
Title | Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Model | Type | Journal Article | ||
Year | 2019 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 86 | Issue | Pages | 27-36 | |
Keywords | Document image analysis; Handwritten documents; Named entity recognition; Deep neural networks | ||||
Abstract | Many historical manuscripts that hold trustworthy memories of the past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used nor accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need of an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants, the first one is based on bigrams, whereas the second one is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirecitonal Long Short Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic, it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.097; 601.311; 603.057; 600.084; 600.140; 600.121 | Approved | no | ||
Call Number | Admin @ si @ TCF2019 | Serial | 3166 | ||
Permanent link to this record | |||||
Author | Pau Riba; Lutz Goldmann; Oriol Ramos Terrades; Diede Rusticus; Alicia Fornes; Josep Llados | ||||
Title | Table detection in business document images by message passing networks | Type | Journal Article | ||
Year | 2022 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 127 | Issue | Pages | 108641 | |
Keywords | |||||
Abstract | Tabular structures in business documents offer a complementary dimension to the raw textual data. For instance, there is information about the relationships among pieces of information. Nowadays, digital mailroom applications have become a key service for workflow automation. Therefore, the detection and interpretation of tables is crucial. With the recent advances in information extraction, table detection and recognition has gained interest in document image analysis, in particular, with the absence of rule lines and unknown information about rows and columns. However, business documents usually contain sensitive contents limiting the amount of public benchmarking datasets. In this paper, we propose a graph-based approach for detecting tables in document images which do not require the raw content of the document. Hence, the sensitive content can be previously removed and, instead of using the raw image or textual content, we propose a purely structural approach to keep sensitive data anonymous. Our framework uses graph neural networks (GNNs) to describe the local repetitive structures that constitute a table. In particular, our main application domain are business documents. We have carefully validated our approach in two invoice datasets and a modern document benchmark. Our experiments demonstrate that tables can be detected by purely structural approaches. | ||||
Address | July 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Elsevier | Place of Publication | Editor | ||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.162; 600.121 | Approved | no | ||
Call Number | Admin @ si @ RGR2022 | Serial | 3729 | ||
Permanent link to this record | |||||
Author | Carola Figueroa Flores; Abel Gonzalez-Garcia; Joost Van de Weijer; Bogdan Raducanu | ||||
Title | Saliency for fine-grained object recognition in domains with scarce training data | Type | Journal Article | ||
Year | 2019 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 94 | Issue | Pages | 62-73 | |
Keywords | |||||
Abstract | This paper investigates the role of saliency to improve the classification accuracy of a Convolutional Neural Network (CNN) for the case when scarce training data is available. Our approach consists in adding a saliency branch to an existing CNN architecture which is used to modulate the standard bottom-up visual features from the original image input, acting as an attentional mechanism that guides the feature extraction process. The main aim of the proposed approach is to enable the effective training of a fine-grained recognition model with limited training samples and to improve the performance on the task, thereby alleviating the need to annotate a large dataset. The vast majority of saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline. Our proposed pipeline allows to evaluate saliency methods for the high-level task of object recognition. We perform extensive experiments on various fine-grained datasets (Flowers, Birds, Cars, and Dogs) under different conditions and show that saliency can considerably improve the network’s performance, especially for the case of scarce training data. Furthermore, our experiments show that saliency methods that obtain improved saliency maps (as measured by traditional saliency benchmarks) also translate to saliency methods that yield improved performance gains when applied in an object recognition pipeline. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | LAMP; 600.109; 600.141; 600.120 | Approved | no | ||
Call Number | Admin @ si @ FGW2019 | Serial | 3264 | ||
Permanent link to this record | |||||
Author | Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol | ||||
Title | Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 112 | Issue | Pages | 107790 | |
Keywords | |||||
Abstract | Sequence-to-sequence models have recently become very popular for tackling
handwritten word recognition problems. However, how to effectively integrate an external language model into such recognizer is still a challenging problem. The main challenge faced when training a language model is to deal with the language model corpus which is usually different to the one used for training the handwritten word recognition system. Thus, the bias between both word corpora leads to incorrectness on the transcriptions, providing similar or even worse performances on the recognition task. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model to a sequence-to-sequence architecture. Moreover, it provides suggestions from an external language knowledge, as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose the importance of the information provided by the language model. On the other hand, the external language model has the ability to adapt itself to the training corpus and even learn the most commonly errors produced from the recognizer. Finally, by conducting comprehensive experiments, the Candidate Fusion proves to outperform the state-of-the-art language models for handwritten word recognition tasks. |
||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.140; 601.302; 601.312; 600.121 | Approved | no | ||
Call Number | Admin @ si @ KRV2021 | Serial | 3343 | ||
Permanent link to this record | |||||
Author | Estefania Talavera; Carolin Wuerich; Nicolai Petkov; Petia Radeva | ||||
Title | Topic modelling for routine discovery from egocentric photo-streams | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 104 | Issue | Pages | 107330 | |
Keywords | Routine; Egocentric vision; Lifestyle; Behaviour analysis; Topic modelling | ||||
Abstract | Developing tools to understand and visualize lifestyle is of high interest when addressing the improvement of habits and well-being of people. Routine, defined as the usual things that a person does daily, helps describe the individuals’ lifestyle. With this paper, we are the first ones to address the development of novel tools for automatic discovery of routine days of an individual from his/her egocentric images. In the proposed model, sequences of images are firstly characterized by semantic labels detected by pre-trained CNNs. Then, these features are organized in temporal-semantic documents to later be embedded into a topic models space. Finally, Dynamic-Time-Warping and Spectral-Clustering methods are used for final day routine/non-routine discrimination. Moreover, we introduce a new EgoRoutine-dataset, a collection of 104 egocentric days with more than 100.000 images recorded by 7 users. Results show that routine can be discovered and behavioural patterns can be observed. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | MILAB; no proj | Approved | no | ||
Call Number | Admin @ si @ TWP2020 | Serial | 3435 | ||
Permanent link to this record | |||||
Author | Meysam Madadi; Hugo Bertiche; Sergio Escalera | ||||
Title | SMPLR: Deep learning based SMPL reverse for 3D human pose and shape recovery | Type | Journal Article | ||
Year | 2020 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 106 | Issue | Pages | 107472 | |
Keywords | Deep learning; 3D Human pose; Body shape; SMPL; Denoising autoencoder; Volumetric stack hourglass | ||||
Abstract | In this paper we propose to embed SMPL within a deep-based model to accurately estimate 3D pose and shape from a still RGB image. We use CNN-based 3D joint predictions as an intermediate representation to regress SMPL pose and shape parameters. Later, 3D joints are reconstructed again in the SMPL output. This module can be seen as an autoencoder where the encoder is a deep neural network and the decoder is SMPL model. We refer to this as SMPL reverse (SMPLR). By implementing SMPLR as an encoder-decoder we avoid the need of complex constraints on pose and shape. Furthermore, given that in-the-wild datasets usually lack accurate 3D annotations, it is desirable to lift 2D joints to 3D without pairing 3D annotations with RGB images. Therefore, we also propose a denoising autoencoder (DAE) module between CNN and SMPLR, able to lift 2D joints to 3D and partially recover from structured error. We evaluate our method on SURREAL and Human3.6M datasets, showing improvement over SMPL-based state-of-the-art alternatives by about 4 and 12 mm, respectively. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | HuPBA; no proj | Approved | no | ||
Call Number | Admin @ si @ MBE2020 | Serial | 3439 | ||
Permanent link to this record | |||||
Author | Andres Mafla; Ruben Tito; Sounak Dey; Lluis Gomez; Marçal Rusiñol; Ernest Valveny; Dimosthenis Karatzas | ||||
Title | Real-time Lexicon-free Scene Text Retrieval | Type | Journal Article | ||
Year | 2021 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 110 | Issue | Pages | 107656 | |
Keywords | |||||
Abstract | In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, this problem can be modeled as a nearest neighbor search of the textual representation of a query over the outputs of the CNN collected from the totality of an image database. Our experiments demonstrate that the proposed model outperforms previous state-of-the-art, while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments to assess the generalization capability of the model are conducted in a multilingual dataset, as well as an application of real-time text spotting in videos. | ||||
Address | |||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.129; 601.338 | Approved | no | ||
Call Number | Admin @ si @ MTD2021 | Serial | 3493 | ||
Permanent link to this record | |||||
Author | Lei Kang; Pau Riba; Marçal Rusiñol; Alicia Fornes; Mauricio Villegas | ||||
Title | Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition | Type | Journal Article | ||
Year | 2022 | Publication | Pattern Recognition | Abbreviated Journal | PR |
Volume | 129 | Issue | Pages | 108766 | |
Keywords | |||||
Abstract | The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences of characters and words. However, using such recurrent paradigms comes at a cost at training stage, since their sequential pipelines prevent parallelization. In this work, we introduce a non-recurrent approach to recognize handwritten text by the use of transformer models. We propose a novel method that bypasses any recurrence. By using multi-head self-attention layers both at the visual and textual stages, we are able to tackle character recognition as well as to learn language-related dependencies of the character sequences to be decoded. Our model is unconstrained to any predefined vocabulary, being able to recognize out-of-vocabulary words, i.e. words that do not appear in the training vocabulary. We significantly advance over prior art and demonstrate that satisfactory recognition accuracies are yielded even in few-shot learning scenarios. | ||||
Address | Sept. 2022 | ||||
Corporate Author | Thesis | ||||
Publisher | Place of Publication | Editor | |||
Language | Summary Language | Original Title | |||
Series Editor | Series Title | Abbreviated Series Title | |||
Series Volume | Series Issue | Edition | |||
ISSN | ISBN | Medium | |||
Area | Expedition | Conference | |||
Notes | DAG; 600.121; 600.162 | Approved | no | ||
Call Number | Admin @ si @ KRR2022 | Serial | 3556 | ||
Permanent link to this record |