%0 Conference Proceedings
%T Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books
%A Veronica Romero
%A Alicia Fornes
%A Enrique Vidal
%A Joan Andreu Sanchez
%B 15th International Conference on Frontiers in Handwriting Recognition
%D 2016
%F Veronica Romero2016
%O DAG; 600.097; 602.006
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=2909), last updated on Thu, 20 Jan 2022 09:02:59 +0100
%X Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction are difficult due to the distinct and evolving vocabulary, which is composed mainly of proper names that change over time. In previous works we studied the use of category-based language models both to improve automatic transcription accuracy and to ease the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.
%U http://refbase.cvc.uab.es/files/RFV2016.pdf