%0 Conference Proceedings
%T Information Extraction in Handwritten Marriage Licenses Books
%A Veronica Romero
%A Emilio Granell
%A Alicia Fornes
%A Enrique Vidal
%A Joan Andreu Sanchez
%B 5th International Workshop on Historical Document Imaging and Processing
%D 2019
%F Veronica Romero2019
%O DAG; 600.140; 600.121
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=3352), last updated on Tue, 24 Nov 2020 13:18:11 +0100
%X Handwritten marriage license books are characterized by a simple record structure and an evolving vocabulary, composed mainly of proper names that change over time. This distinctive vocabulary makes automatic transcription and semantic information extraction difficult. Previous work has shown that category-based language models and a Grammatical Inference technique known as MGGI can improve the accuracy of these tasks. However, applying the MGGI algorithm requires a priori knowledge to label the words of the training strings, which is not always easy to obtain. In this paper we study how to obtain the information required by the MGGI algorithm automatically, using a technique based on Confusion Networks. Using the resulting language model, full handwritten text recognition and information extraction experiments were carried out, with results supporting the proposed approach.
%U http://refbase.cvc.uab.es/files/RGF2019.pdf
%P 66-71