Records |
Links |
Author  |
Ariel Amato; Angel Sappa; Alicia Fornes; Felipe Lumbreras; Josep Llados |

Title |
Divide and Conquer: Atomizing and Parallelizing A Task in A Mobile Crowdsourcing Platform |
Type |
Conference Article |
Year |
2013 |
Publication |
2nd International ACM Workshop on Crowdsourcing for Multimedia |
Abbreviated Journal |
Volume |
Issue |
Pages |
21-22 |
Keywords |
Abstract |
In this paper we present some conclusions about the advantages of having an efficient task formulation when a crowdsourcing platform is used. In particular we show how the task atomization and distribution can help to obtain results in an efficient way. Our proposal is based on a recursive splitting of the original task into a set of smaller and simpler tasks. As a result both more accurate and faster solutions are obtained. Our evaluation is performed on a set of ancient documents that need to be digitized. |
Address |
Barcelona; October 2013 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-1-4503-2396-3 |
Medium |
Area |
Expedition |
Conference |
CrowdMM |
Notes |
ADAS; ISE; DAG; 600.054; 600.055; 600.045; 600.061; 602.006 |
Approved |
no |
Call Number |
Admin @ si @ SLA2013 |
Serial |
2335 |
Permanent link to this record |
Author  |
Arka Ujjal Dey; Suman Ghosh; Ernest Valveny |

Title |
Don't only Feel Read: Using Scene text to understand advertisements |
Type |
Conference Article |
Year |
2018 |
Publication |
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text. Our approach takes inspiration from the assumption that Ad images contain meaningful textual content, that can provide discriminative semantic interpretation, and can thus aid in classification tasks. To this end, we develop a framework using off-the-shelf components, and demonstrate the effectiveness of Textual cues in semantic Classification tasks. |
Address |
Salt Lake City; Utah; USA; June 2018 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.121; 600.129 |
Approved |
no |
Call Number |
Admin @ si @ DGV2018 |
Serial |
3551 |
Permanent link to this record |
Author  |
Arka Ujjal Dey; Suman Ghosh; Ernest Valveny; Gaurav Harit |

Title |
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding |
Type |
Journal Article |
Year |
2021 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
Volume |
149 |
Issue |
Pages |
164-171 |
Keywords |
Abstract |
Images with visual and scene text content are ubiquitous in everyday life. However, current image interpretation systems are mostly limited to using only the visual features, neglecting to leverage the scene text content. In this paper, we propose to jointly use scene text and visual channels for robust semantic interpretation of images. We do not only extract and encode visual and scene text cues, but also model their interplay to generate a contextual joint embedding with richer semantics. The contextual embedding thus generated is applied to retrieval and classification tasks on multimedia images, with scene text content, to demonstrate its effectiveness. In the retrieval framework, we augment our learned text-visual semantic representation with scene text cues, to mitigate vocabulary misses that may have occurred during the semantic embedding. To deal with irrelevant or erroneous recognition of scene text, we also apply query-based attention to our text channel. We show how the multi-channel approach, involving visual semantics and scene text, improves upon state of the art. |
Address |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ DGV2021 |
Serial |
3364 |
Permanent link to this record |
Author  |
Arnau Baro |

Title |
Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods |
Type |
Book Whole |
Year |
2022 |
Publication |
PhD Thesis, Universitat Autonoma de Barcelona-CVC |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
The transcription of sheet music into some machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and prone to errors. Consequently, automatic transcription systems for musical documents represent interesting tools.
Document analysis is the subject that deals with the extraction and processing of documents through image and pattern recognition. It is a branch of computer vision. Taking music scores as source, the field devoted to addressing this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into some symbolic structure such as MEI or MusicXML.
In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much in the same way as most text recognition research focuses on recognizing words appearing in a given line image. These methods are based on two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular, the Long Short-Term Memory Neural Network. On the other hand, a method based on Sequence to Sequence models is detailed.
Music context is needed to improve the OMR results, just like language models and dictionaries help in handwriting recognition. For example, syntactical rules and grammars could be easily defined to cope with the ambiguities in the rhythm. In music theory, for example, the time signature defines the number of beats per bar unit. Thus, in the second part of this dissertation, different methodologies have been investigated to improve the OMR recognition. We have explored three different methods: (a) a graphic tree-structure representation, Dendrograms, that joins, at each level, its primitives following a set of rules, (b) the incorporation of Language Models to model the probability of a sequence of tokens, and (c) graph neural networks to analyze the music scores to avoid meaningless relationships between music primitives.
Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic with a modern or old handwritten appearance, whereas the other two are real handwritten scores, one of them modern and the other old. |
Address |
Corporate Author |
Thesis |
Ph.D. thesis |
Publisher |
Place of Publication |
Editor |
Alicia Fornes |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
978-84-124793-8-6 |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; |
Approved |
no |
Call Number |
Admin @ si @ Bar2022 |
Serial |
3754 |
Permanent link to this record |
Author  |
Arnau Baro; Alicia Fornes; Carles Badal |

Title |
Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism |
Type |
Conference Article |
Year |
2020 |
Publication |
17th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks. |
Address |
Virtual ICFHR; September 2020 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.140; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ BFB2020 |
Serial |
3448 |
Permanent link to this record |
Author  |
Arnau Baro; Carles Badal; Pau Torras; Alicia Fornes |

Title |
Handwritten Historical Music Recognition through Sequence-to-Sequence with Attention Mechanism |
Type |
Conference Article |
Year |
2022 |
Publication |
3rd International Workshop on Reading Music Systems (WoRMS2021) |
Abbreviated Journal |
Volume |
Issue |
Pages |
55-59 |
Keywords |
Optical Music Recognition; Digits; Image Classification |
Abstract |
Despite decades of research in Optical Music Recognition (OMR), the recognition of old handwritten music scores remains a challenge because of the variabilities in the handwriting styles, paper degradation, lack of standard notation, etc. Therefore, the research in OMR systems adapted to the particularities of old manuscripts is crucial to accelerate the conversion of music scores existing in archives into digital libraries, fostering the dissemination and preservation of our music heritage. In this paper we explore the adaptation of sequence-to-sequence models with attention mechanism (used in translation and handwritten text recognition) and the generation of specific synthetic data for recognizing old music scores. The experimental validation demonstrates that our approach is promising, especially when compared with long short-term memory neural networks. |
Address |
July 23, 2021, Alicante (Spain) |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.121; 600.162; 602.230; 600.140 |
Approved |
no |
Call Number |
Admin @ si @ BBT2022 |
Serial |
3734 |
Permanent link to this record |
Author  |
Arnau Baro; Jialuo Chen; Alicia Fornes; Beata Megyesi |

Title |
Towards a generic unsupervised method for transcription of encoded manuscripts |
Type |
Conference Article |
Year |
2019 |
Publication |
3rd International Conference on Digital Access to Textual Cultural Heritage |
Abbreviated Journal |
Volume |
Issue |
Pages |
73-78 |
Keywords |
A. Baró, J. Chen, A. Fornés, B. Megyesi. |
Abstract |
Historical ciphers, a special type of manuscripts, contain encrypted information, important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) thanks to deep learning methodologies, the need for labelled training data is an important limitation. Given that ciphers often use symbol sets across various alphabets and unique symbols without any transcription scheme available, these supervised HTR techniques are not suitable to transcribe ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, which has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods. |
Address |
Brussels; May 2019 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.097; 600.140; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ BCF2019 |
Serial |
3276 |
Permanent link to this record |
Author  |
Arnau Baro; Pau Riba; Alicia Fornes |

Title |
Towards the recognition of compound music notes in handwritten music scores |
Type |
Conference Article |
Year |
2016 |
Publication |
15th International Conference on Frontiers in Handwriting Recognition |
Abbreviated Journal |
Volume |
Issue |
Pages |
Keywords |
Abstract |
The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising. |
Address |
Shenzhen; China; October 2016 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
2167-6445 |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.097 |
Approved |
no |
Call Number |
Admin @ si @ BRF2016 |
Serial |
2903 |
Permanent link to this record |
Author  |
Arnau Baro; Pau Riba; Alicia Fornes |

Title |
A Starting Point for Handwritten Music Recognition |
Type |
Conference Article |
Year |
2018 |
Publication |
1st International Workshop on Reading Music Systems |
Abbreviated Journal |
Volume |
Issue |
Pages |
5-6 |
Keywords |
Optical Music Recognition; Long Short-Term Memory; Convolutional Neural Networks; MUSCIMA++; CVCMUSCIMA |
Abstract |
In recent years, interest in Optical Music Recognition (OMR) has reawakened, especially since the appearance of deep learning. However, there are very few works addressing handwritten scores. In this work we describe a full OMR pipeline for handwritten music scores by using Convolutional and Recurrent Neural Networks that could serve as a baseline for the research community. |
Address |
Paris; France; September 2018 |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.097; 601.302; 601.330; 600.121 |
Approved |
no |
Call Number |
Admin @ si @ BRF2018 |
Serial |
3223 |
Permanent link to this record |
Author  |
Arnau Baro; Pau Riba; Alicia Fornes |

Title |
Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network |
Type |
Conference Article |
Year |
2022 |
Publication |
Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022) |
Abbreviated Journal |
Volume |
13639 |
Issue |
Pages |
171-184 |
Keywords |
Object detection; Optical music recognition; Graph neural network |
Abstract |
During the last decades, the performance of optical music recognition has been steadily improving. However, despite the 2-dimensional nature of music notation (e.g. notes have rhythm and pitch), most works treat musical scores as a sequence of symbols in one dimension, which makes their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition. First, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown great performance in similar applications. Our methodology consists of the following steps. First, we detect each isolated/atomic symbol (those that cannot be decomposed into further graphical primitives) and the primitives that form a musical symbol. Then, we build the graph taking as root node the notehead and as leaves those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for a final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results. |
Address |
December 04 – 07, 2022; Hyderabad, India |
Corporate Author |
Thesis |
Publisher |
Place of Publication |
Editor |
Language |
Summary Language |
Original Title |
Series Editor |
Series Title |
Abbreviated Series Title |
Series Volume |
Series Issue |
Edition |
Medium |
Area |
Expedition |
Conference |
Notes |
DAG; 600.162; 600.140; 602.230 |
Approved |
no |
Call Number |
Admin @ si @ BRF2022b |
Serial |
3740 |
Permanent link to this record |