TY  - CONF
AU  - Veronica Romero
AU  - Emilio Granell
AU  - Alicia Fornes
AU  - Enrique Vidal
AU  - Joan Andreu Sanchez
A2  - HIP
PY  - 2019//
TI  - Information Extraction in Handwritten Marriage Licenses Books
BT  - 5th International Workshop on Historical Document Imaging and Processing
SP  - 66
EP  - 71
N2  - Handwritten marriage licenses books are characterized by a simple structure of the text in the records with an evolutionary vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. Previous works have shown that the use of category-based language models and a Grammatical Inference technique known as MGGI can improve the accuracy of these tasks. However, the application of the MGGI algorithm requires a priori knowledge to label the words of the training strings, which is not always easy to obtain. In this paper we study how to automatically obtain the information required by the MGGI algorithm using a technique based on Confusion Networks. Using the resulting language model, full handwritten text recognition and information extraction experiments have been carried out, with results supporting the proposed approach.
UR  - file:///C:/Users/mmartin/Downloads/3352631.3352637.pdf
L1  - http://refbase.cvc.uab.es/files/RGF2019.pdf
N1  - DAG; 600.140; 600.121
ID  - Veronica Romero2019
ER  - 