%0 Conference Proceedings
%T Handwriting Recognition in Historical Documents using Very Large Vocabularies
%A Volkmar Frinken
%A Andreas Fischer
%A Carlos David Martinez Hinarejos
%B 2nd International Workshop on Historical Document Imaging and Processing
%D 2013
%@ 978-1-4503-2115-0
%F Volkmar Frinken2013
%O DAG; 600.056; 600.045; 600.061; 602.006; 602.101
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=2296), last updated on Thu, 16 Feb 2023 12:06:04 +0100
%X Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized as well as by estimating the n-gram probability of the words in the given text. In the context of historical documents, non-unified spelling and the limited amount of written text pose a substantial problem for the selection of the recognizable vocabulary as well as for the computation of the word probabilities. In this paper, we propose, for the transcription of historical Spanish text, to keep the corpus used for the n-grams limited to a sample of the target text but to expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with different sizes of external vocabularies and demonstrate the applicability of the approach and a significant increase in recognition accuracy when using up to 300 thousand external words.
%U http://refbase.cvc.uab.es/files/FFM2013.pdf
%U http://dx.doi.org/10.1145/2501115.2501116
%P 67-72