TY  - CONF
AU  - Veronica Romero
AU  - Alicia Fornes
AU  - Enrique Vidal
AU  - Joan Andreu Sanchez
A2  - IbPRIA
ED  - L.A. Alexandre
ED  - J.Salvador Sanchez
ED  - Joao M. F. Rodriguez
PY  - 2017//
TI  - Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology
T2  - LNCS
BT  - 8th Iberian Conference on Pattern Recognition and Image Analysis
SP  - 287
EP  - 294
VL  - 10255
KW  - Handwritten Text Recognition
KW  - Information extraction
KW  - Language modeling
KW  - MGGI
KW  - Categories-based language model
N2  - Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. The records in these books follow a simple textual structure with an evolving vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
SN  - 978-3-319-58837-7
L1  - http://refbase.cvc.uab.es/files/RFV2017.pdf
N1  - DAG; 602.006; 600.097; 600.121
ID  - Veronica Romero2017
ER  - 