@PhdThesis{ArnauBaro2022, author="Arnau Baro", editor="Alicia Fornes", title="Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods", year="2022", publisher="IMPRIMA", abstract="The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools. Document analysis is the subject that deals with the extraction and processing of documents through image and pattern recognition. It is a branch of computer vision. Taking music scores as source, the field devoted to addressing this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into some symbolic structure such as MEI or MusicXML. In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based on two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed. Music context is needed to improve the OMR results, just like language models and dictionaries help in handwriting recognition. For example, syntactical rules and grammars could be easily defined to cope with the ambiguities in the rhythm. In music theory, for example, the time signature defines the number of beats per bar unit. Thus, in the second part of this dissertation, different methodologies have been investigated to improve the OMR recognition.
We have explored three different methods: (a) a graphic tree-structure representation, Dendrograms, that joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives. Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the other two are real handwritten scores, one of them modern and the other old.", optnote="DAG;", optnote="exported from refbase (http://refbase.cvc.uab.es/show.php?record=3754), last updated on Thu, 23 Feb 2023 15:10:17 +0100", isbn="978-84-124793-8-6" }