Publicacions CVC -- Query Results

<< 1 2 3 4 5 6 7 8 9 10 >> [11–20]

Details

	Records
	Author	Francisco Cruz
	Title	Probabilistic Graphical Models for Document Analysis			Type	Book Whole
	Year	2016	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Latest advances in digitization techniques have fostered the interest in creating digital copies of collections of documents. Digitized documents permit an easy maintenance, loss-less storage, and efficient ways for transmission and to perform information retrieval processes. This situation has opened a new market niche to develop systems able to automatically extract and analyze information contained in these collections, specially in the ambit of the business activity. Due to the great variety of types of documents this is not a trivial task. For instance, the automatic extraction of numerical data from invoices differs substantially from a task of text recognition in historical documents. However, in order to extract the information of interest, is always necessary to identify the area of the document where it is located. In the area of Document Analysis we refer to this process as layout analysis, which aims at identifying and categorizing the different entities that compose the document, such as text regions, pictures, text lines, or tables, among others. To perform this task it is usually necessary to incorporate a prior knowledge about the task into the analysis process, which can be modeled by defining a set of contextual relations between the different entities of the document. The use of context has proven to be useful to reinforce the recognition process and improve the results on many computer vision tasks. It presents two fundamental questions: What kind of contextual information is appropriate for a given task, and how to incorporate this information into the models. In this thesis we study several ways to incorporate contextual information to the task of document layout analysis, and to the particular case of handwritten text line segmentation. We focus on the study of Probabilistic Graphical Models and other mechanisms for this purpose, and propose several solutions to these problems. First, we present a method for layout analysis based on Conditional Random Fields. With this model we encode local contextual relations between variables, such as pair-wise constraints. Besides, we encode a set of structural relations between different classes of regions at feature level. Second, we present a method based on 2D-Probabilistic Context-free Grammars to encode structural and hierarchical relations. We perform a comparative study between Probabilistic Graphical Models and this syntactic approach. Third, we propose a method for structured documents based on Bayesian Networks to represent the document structure, and an algorithm based in the Expectation-Maximization to find the best configuration of the page. We perform a thorough evaluation of the proposed methods on two particular collections of documents: a historical collection composed of ancient structured documents, and a collection of contemporary documents. In addition, we present a general method for the task of handwritten text line segmentation. We define a probabilistic framework where we combine the EM algorithm with variational approaches for computing inference and parameter learning on a Markov Random Field. We evaluate our method on several collections of documents, including a general dataset of annotated administrative documents. Results demonstrate the applicability of our method to real problems, and the contribution of the use of contextual information to this kind of problems.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Oriol Ramos Terrades
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-945373-2-5	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	Admin @ si @ Cru2016			Serial	2861
Permanent link to this record



	Author	Hongxing Gao
	Title	Focused Structural Document Image Retrieval in Digital Mailroom Applications			Type	Book Whole
	Year	2015	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios such as searching for full page matches or retrieving the counterparts for specific document areas, focusing on their structural similarity or letting their visual resemblance to play a dominant role. Based on the spatial indexing technique, we propose to search for matches of local key-region pairs carrying both structural and visual information from the collection while a scheme allowing to adjust the relative contribution of structural and visual similarity is presented. Based on the fact that the structure of documents is tightly linked with the distance among their elements, we firstly introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results comparing with state-of-the-art methods while much less amount of key-regions are employed. We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram where inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves remarkable improvement over the conventional BoW and spatial pyramidal BoW methods. To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structure and visual information, from the collection. We introduce the spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We firstly test the proposed framework in a full page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as a specific interesting partial areas of the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that, the proposed generic framework obtains nearly perfect precision on both types of focused queries while it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field. Besides, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups, then line verification is performed within each group while more precise bounding boxes are computed. We demonstrate that, comparing with the standard approach (based on RANSAC), the line verification proposed generally achieves much higher recall with slight loss on precision on specific queries.
	Address	January 2015
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados;Dimosthenis Karatzas;Marçal Rusiñol
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-943427-0-7	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Gao2015			Serial	2577
Permanent link to this record



	Author	Lluis Pere de las Heras
	Title	Relational Models for Visual Understanding of Graphical Documents. Application to Architectural Drawings.			Type	Book Whole
	Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations between symbols) that articulate a semantic meaning in a certain context. Therefore, the automatic interpretation by computers of these sort of documents entails three main steps: the detection of the symbols, the extraction of the structural relations between these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Dierent domains in graphical documents include: architectural and engineering drawings, maps, owcharts, etc. Graphics Recognition in particular and Document Image Analysis in general are born from the industrial need of interpreting a massive amount of digitalized documents after the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems to be far from being solved. The main reason is that the vast majority of the systems in the literature focus on very specic problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is dicult to reuse these strategies on dierent data and on dierent contexts, hindering thus the natural progress in the eld. In this thesis, we face the graphical document understanding problem by proposing several relational models at dierent levels that are designed from a generic perspective. Firstly, we introduce three dierent strategies for the detection of symbols. The first method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the big variability of the problem. The third method is a combination of the previous two methods that inherits their respective strengths, i.e. copes the big variability and does not need annotated data. Secondly, we present two relational strategies that tackle the problem of the visual context extraction. The first one is a full bottom up method that heuristically searches in a graph representation the contextual relations between symbols. Contrarily, the second is syntactic method that models probabilistically the structure of the documents. It automatically learns the model, which guides the inference algorithm to encounter the best structural representation for a given input. Finally, we construct a knowledge-based model consisting of an ontological denition of the domain and real data. This model permits to perform contextual reasoning and to detect semantic inconsistencies within the data. We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard in the modeling of these documents there exists an enormous notation variability from plan to plan in terms of vocabulary and syntax. Therefore, floor plan interpretation is a relevant task in the graphical document understanding problem. It is also worth to mention that we make freely available all the resources used in this thesis {the data, the tool used to generate the data, and the evaluation scripts{ with the aim of fostering research in the graphical document understanding task.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Gemma Sanchez
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940902-8-8	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Her2014			Serial	2574
Permanent link to this record



	Author	David Fernandez
	Title	Contextual Word Spotting in Historical Handwritten Documents			Type	Book Whole
	Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	There are countless collections of historical documents in archives and libraries that contain plenty of valuable information for historians and researchers. The extraction of this information has become a central task among the Document Analysis researches and practitioners. There is an increasing interest to digital preserve and provide access to these kind of documents. But only the digitalization is not enough for the researchers. The extraction and/or indexation of information of this documents has had an increased interest among researchers. In many cases, and in particular in historical manuscripts, the full transcription of these documents is extremely dicult due the inherent deciencies: poor physical preservation, dierent writing styles, obsolete languages, etc. Word spotting has become a popular an ecient alternative to full transcription. It inherently involves a high level of degradation in the images. The search of words is holistically formulated as a visual search of a given query shape in a larger image, instead of recognising the input text and searching the query word with an ascii string comparison. But the performance of classical word spotting approaches depend on the degradation level of the images being unacceptable in many cases . In this thesis we have proposed a novel paradigm called contextual word spotting method that uses the contextual/semantic information to achieve acceptable results whereas classical word spotting does not reach. The contextual word spotting framework proposed in this thesis is a segmentation-based word spotting approach, so an ecient word segmentation is needed. Historical handwritten documents present some common diculties that can increase the diculties the extraction of the words. We have proposed a line segmentation approach that formulates the problem as nding the central part path in the area between two consecutive lines. This is solved as a graph traversal problem. A path nding algorithm is used to nd the optimal path in a graph, previously computed, between the text lines. Once the text lines are extracted, words are localized inside the text lines using a word segmentation technique from the state of the art. Classical word spotting approaches can be improved using the contextual information of the documents. We have introduced a new framework, oriented to handwritten documents that present a highly structure, to extract information making use of context. The framework is an ecient tool for semi-automatic transcription that uses the contextual information to achieve better results than classical word spotting approaches. The contextual information is automatically discovered by recognizing repetitive structures and categorizing all the words according to semantic classes. The most frequent words in each semantic cluster are extracted and the same text is used to transcribe all them. The experimental results achieved in this thesis outperform classical word spotting approaches demonstrating the suitability of the proposed ensemble architecture for spotting words in historical handwritten documents using contextual information.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados;Alicia Fornes
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940902-7-1	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Fer2014			Serial	2573
Permanent link to this record



	Author	Antonio Clavelli
	Title	A computational model of eye guidance, searching for text in real scene images			Type	Book Whole
	Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Searching for text objects in real scene images is an open problem and a very active computer vision research area. A large number of methods have been proposed tackling the text search as extension of the ones from the document analysis field or inspired by general purpose object detection methods. However the general problem of object search in real scene images remains an extremely challenging problem due to the huge variability in object appearance. This thesis builds on top of the most recent findings in the visual attention literature presenting a novel computational model of eye guidance aiming to better describe text object search in real scene images. First are presented the relevant state-of-the-art results from the visual attention literature regarding eye movements and visual search. Relevant models of attention are discussed and integrated with recent observations on the role of top-down constraints and the emerging need for a layered model of attention in which saliency is not the only factor guiding attention. Visual attention is then explained by the interaction of several modulating factors, such as objects, value, plans and saliency. Then we introduce our probabilistic formulation of attention deployment in real scene. The model is based on the rationale that oculomotor control depends on two interacting but distinct processes: an attentional process that assigns value to the sources of information and motor process that flexibly links information with action. In such framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the reward of gazing at certain image patches or proto-objects that provide a sparse representation of the scene objects. In the experimental section the model is tested in laboratory condition, comparing model simulations with data from eye tracking experiments. The comparison is qualitative in terms of observable scan paths and quantitative in terms of statistical similarity of gaze shift amplitude. Experiments are performed using eye tracking data from both a publicly available dataset of face and text and from newly performed eye-tracking experiments on a dataset of street view pictures containing text. The last part of this thesis is dedicated to study the extent to which the proposed model can account for human eye movements in a low constrained setting. We used a mobile eye tracking device and an ad-hoc developed methodology to compare model simulated eye data with the human eye data from mobile eye tracking recordings. Such setting allow to test the model in an incomplete visual information condition, reproducing a close to real-life search task.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Dimosthenis Karatzas;Giuseppe Boccignone;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940902-6-4	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Cla2014			Serial	2571
Permanent link to this record



	Author	Anjan Dutta
	Title	Inexact Subgraph Matching Applied to Symbol Spotting in Graphical Documents			Type	Book Whole
	Year	2014	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	There is a resurgence in the use of structural approaches in the usual object recognition and retrieval problem. Graph theory, in particular, graph matching plays a relevant role in that. Specifically, the detection of an object (or a part of that) in an image in terms of structural features can be formulated as a subgraph matching. Subgraph matching is a challenging task. Specially due to the presence of outliers most of the graph matching algorithms do not perform well in subgraph matching scenario. Also exact subgraph isomorphism has proven to be an NP-complete problem. So naturally, in graph matching community, there are lot of efforts addressing the problem of subgraph matching within suboptimal bound. Most of them work with approximate algorithms that try to get an inexact solution in estimated way. In addition, usual recognition must cope with distortion. Inexact graph matching consists in finding the best isomorphism under a similarity measure. Theoretically this thesis proposes algorithms for solving subgraph matching in an approximate and inexact way. We consider the symbol spotting problem on graphical documents or line drawings from application point of view. This is a well known problem in the graphics recognition community. It can be further applied for indexing and classification of documents based on their contents. The structural nature of this kind of documents easily motivates one for giving a graph based representation. So the symbol spotting problem on graphical documents can be considered as a subgraph matching problem. The main challenges in this application domain is the noise and distortions that might come during the usage, digitalization and raster to vector conversion of those documents. Apart from that computer vision nowadays is not any more confined within a limited number of images. So dealing a huge number of images with graph based method is a further challenge. In this thesis, on one hand, we have worked on efficient and robust graph representation to cope with the noise and distortions coming from documents. On the other hand, we have worked on different graph based methods and framework to solve the subgraph matching problem in a better approximated way, which can also deal with considerable number of images. Firstly, we propose a symbol spotting method by hashing serialized subgraphs. Graph serialization allows to create factorized substructures such as graph paths, which can be organized in hash tables depending on the structural similarities of the serialized subgraphs. The involvement of hashing techniques helps to reduce the search space substantially and speeds up the spotting procedure. Secondly, we introduce contextual similarities based on the walk based propagation on tensor product graph. These contextual similarities involve higher order information and more reliable than pairwise similarities. We use these higher order similarities to formulate subgraph matching as a node and edge selection problem in the tensor product graph. Thirdly, we propose near convex grouping to form near convex region adjacency graph which eliminates the limitations of traditional region adjacency graph representation for graphic recognition. Fourthly, we propose a hierarchical graph representation by simplifying/correcting the structural errors to create a hierarchical graph of the base graph. Later these hierarchical graph structures are matched with some graph matching methods. Apart from that, in this thesis we have provided an overall experimental comparison of all the methods and some of the state-of-the-art methods. Furthermore, some dataset models have also been proposed.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Josep Llados;Umapada Pal
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-940902-4-0	Medium
	Area		Expedition		Conference
	Notes	DAG; 600.077			Approved	no
	Call Number	Admin @ si @ Dut2014			Serial	2465
Permanent link to this record



	Author	Joan Mas
	Title	A Syntactic Pattern Recognition Approach based on a Distribution Tolerant Adjacency Grammar and a Spatial Indexed Parser. Application to Sketched Document Recognition			Type	Book Whole
	Year	2010	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	Sketch recognition is a discipline which has gained an increasing interest in the last 20 years. This is due to the appearance of new devices such as PDA, Tablet PC’s or digital pen & paper protocols. From the wide range of sketched documents we focus on those that represent structured documents such as: architectural floor-plans, engineering drawing, UML diagrams, etc. To recognize and understand these kinds of documents, first we have to recognize the different compounding symbols and then we have to identify the relations between these elements. From the way that a sketch is captured, there are two categories: on-line and off-line. On-line input modes refer to draw directly on a PDA or a Tablet PC’s while off-line input modes refer to scan a previously drawn sketch. This thesis is an overlapping of three different areas on Computer Science: Pattern Recognition, Document Analysis and Human-Computer Interaction. The aim of this thesis is to interpret sketched documents independently on whether they are captured on-line or off-line. For this reason, the proposed approach should contain the following features. First, as we are working with sketches the elements present in our input contain distortions. Second, as we would work in on-line or off-line input modes, the order in the input of the primitives is indifferent. Finally, the proposed method should be applied in real scenarios, its response time must be slow. To interpret a sketched document we propose a syntactic approach. A syntactic approach is composed of two correlated components: a grammar and a parser. The grammar allows describing the different elements on the document as well as their relations. The parser, given a document checks whether it belongs to the language generated by the grammar or not. Thus, the grammar should be able to cope with the distortions appearing on the instances of the elements. Moreover, it would be necessary to define a symbol independently of the order of their primitives. Concerning to the parser when analyzing 2D sentences, it does not assume an order in the primitives. Then, at each new primitive in the input, the parser searches among the previous analyzed symbols candidates to produce a valid reduction. Taking into account these features, we have proposed a grammar based on Adjacency Grammars. This kind of grammars defines their productions as a multiset of symbols rather than a list. This allows describing a symbol without an order in their components. To cope with distortion we have proposed a distortion model. This distortion model is an attributed estimated over the constraints of the grammar and passed through the productions. This measure gives an idea on how far is the symbol from its ideal model. In addition to the distortion on the constraints other distortions appear when working with sketches. These distortions are: overtracing, overlapping, gaps or spurious strokes. Some grammatical productions have been defined to cope with these errors. Concerning the recognition, we have proposed an incremental parser with an indexation mechanism. Incremental parsers analyze the input symbol by symbol given a response to the user when a primitive is analyzed. This makes incremental parser suitable to work in on-line as well as off-line input modes. The parser has been adapted with an indexation mechanism based on a spatial division. This indexation mechanism allows setting the primitives in the space and reducing the search to a neighbourhood. A third contribution is a grammatical inference algorithm. This method given a set of symbols captures the production describing it. In the field of formal languages, different approaches has been proposed but in the graphical domain not so much work is done in this field. The proposed method is able to capture the production from a set of symbol although they are drawn in different order. A matching step based on the Haussdorff distance and the Hungarian method has been proposed to match the primitives of the different symbols. In addition the proposed approach is able to capture the variability in the parameters of the constraints. From the experimental results, we may conclude that we have proposed a robust approach to describe and recognize sketches. Moreover, the addition of new symbols to the alphabet is not restricted to an expert. Finally, the proposed approach has been used in two real scenarios obtaining a good performance.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	Ediciones Graficas Rey	Place of Publication		Editor	Gemma Sanchez;Josep Llados
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-937261-4-0	Medium
	Area		Expedition		Conference
	Notes	DAG			Approved	no
	Call Number	DAG @ dag @ Mas2010			Serial	1334
Permanent link to this record



	Author	Debora Gil; Jordi Gonzalez; Gemma Sanchez (eds)
	Title	Computer Vision: Advances in Research and Development			Type	Book Whole
	Year	2007	Publication	Proceedings of the 2nd CVC International Workshop	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher	UAB	Place of Publication	Bellaterra (Spain)	Editor	Debora Gil; Jordi Gonzalez; Gemma Sanchez
	Language		Summary Language		Original Title
	Series Editor		Series Title	2	Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-935251-4-9	Medium
	Area		Expedition		Conference
	Notes	IAM; ISE; DAG			Approved	no
	Call Number	IAM @ iam @ GGS2007			Serial	1493
Permanent link to this record



	Author	Oriol Vicente; Alicia Fornes; Ramon Valdes
	Title	La Xarxa d Humanitats Digitals de la UABCie: una estructura inteligente para la investigación y la transferencia en Humanidades			Type	Conference Article
	Year	2017	Publication	3rd Congreso Internacional de Humanidades Digitales Hispánicas. Sociedad Internacional	Abbreviated Journal
	Volume		Issue		Pages	281-383
	Keywords
	Abstract
	Address
	Corporate Author				Thesis
	Publisher		Place of Publication		Editor
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-697-5692-8	Medium
	Area		Expedition		Conference	HDH
	Notes	DAG; 600.121			Approved	no
	Call Number	Admin @ si @ VFV2017			Serial	3060
Permanent link to this record



	Author	Arnau Baro
	Title	Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods			Type	Book Whole
	Year	2022	Publication	PhD Thesis, Universitat Autonoma de Barcelona-CVC	Abbreviated Journal
	Volume		Issue		Pages
	Keywords
	Abstract	The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools. Document analysis is the subject that deals with the extraction and processing of documents through image and pattern recognition. It is a branch of computer vision. Taking music scores as source, the field devoted to address this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into some symbolic structure such as MEI or MusicXML. In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based in two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed. Music context is needed to improve the OMR results, just like language models and dictionaries help in handwriting recognition. For example, syntactical rules and grammars could be easily defined to cope with the ambiguities in the rhythm. In music theory, for example, the time signature defines the amount of beats per bar unit. Thus, in the second part of this dissertation, different methodologies have been investigated to improve the OMR recognition. We have explored three different methods: (a) a graphic tree-structure representation, Dendrograms, that joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives. Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the other two are real handwritten scores, being one of them modern and the other old.
	Address
	Corporate Author				Thesis	Ph.D. thesis
	Publisher	IMPRIMA	Place of Publication		Editor	Alicia Fornes
	Language		Summary Language		Original Title
	Series Editor		Series Title		Abbreviated Series Title
	Series Volume		Series Issue		Edition
	ISSN		ISBN	978-84-124793-8-6	Medium
	Area		Expedition		Conference
	Notes	DAG;			Approved	no
	Call Number	Admin @ si @ Bar2022			Serial	3754
Permanent link to this record

Select All Deselect All

<< 1 2 3 4 5 6 7 8 9 10 >> [11–20]

List View

Citations

Details

All Found Records Selected Records:

Save Citations: Format:

Export Records: Format: