Author Juan Ignacio Toledo; Manuel Carbonell; Alicia Fornes; Josep Llados
Title Information Extraction from Historical Handwritten Document Images with a Context-aware Neural Model
Type Journal Article
Year 2019
Publication Pattern Recognition
Abbreviated Journal PR
Volume 86
Pages 27-36
Keywords Document image analysis; Handwritten documents; Named entity recognition; Deep neural networks
Abstract Many historical manuscripts that hold trustworthy memories of past societies contain information organized in a structured layout (e.g. census, birth or marriage records). The precious information stored in these documents cannot be effectively used or accessed without costly annotation efforts. The transcription driven by the semantic categories of words is crucial for the subsequent access. In this paper we describe an approach to extract information from structured historical handwritten text images and build a knowledge representation for the extraction of meaning out of historical data. The method extracts information, such as named entities, without the need for an intermediate transcription step, thanks to the incorporation of context information through language models. Our system has two variants: the first is based on bigrams, whereas the second is based on recurrent neural networks. Concretely, our second architecture integrates a Convolutional Neural Network to model visual information from word images together with a Bidirectional Long Short-Term Memory network to model the relation among the words. This integrated sequential approach is able to extract more information than just the semantic category (e.g. a semantic category can be associated to a person in a record). Our system is generic: it deals with out-of-vocabulary words by design, and it can be applied to structured handwritten texts from different domains. The method has been validated with the ICDAR IEHHR competition protocol, outperforming the existing approaches.
Notes DAG; 600.097; 601.311; 603.057; 600.084; 600.140; 600.121
Approved no
Call Number Admin @ si @ TCF2019
Serial 3166