Records |
Author |
Arnau Baro; Pau Riba; Alicia Fornes |
Title |
Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network |
Type |
Conference Article |
Year |
2022 |
Publication |
Frontiers in Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition (ICFHR2022) |
Abbreviated Journal |
|
Volume |
13639 |
Issue |
|
Pages |
171-184 |
Keywords |
Object detection; Optical music recognition; Graph neural network |
Abstract |
Over the last decades, the performance of optical music recognition has steadily improved. However, despite the two-dimensional nature of music notation (e.g. notes have rhythm and pitch), most works treat musical scores as a one-dimensional sequence of symbols, which makes their recognition still a challenge. Thus, in this work we explore the use of graph neural networks for musical score recognition: first, because graphs are suited for n-dimensional representations, and second, because the combination of graphs with deep learning has shown great performance in similar applications. Our methodology is as follows. First, we detect each isolated/atomic symbol (those that cannot be decomposed into further graphical primitives) and the primitives that form a musical symbol. Then, we build the graph, taking the notehead as the root node and, as leaves, those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for the final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results. |
Address |
December 04 – 07, 2022; Hyderabad, India |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ICFHR |
Notes |
DAG; 600.162; 600.140; 602.230 |
Approved |
no |
Call Number |
Admin @ si @ BRF2022b |
Serial |
3740 |
Permanent link to this record |
|
|
|
Author |
Utkarsh Porwal; Alicia Fornes; Faisal Shafait (eds) |
Title |
Frontiers in Handwriting Recognition. 18th International Conference, ICFHR 2022 |
Type |
Book Whole |
Year |
2022 |
Publication |
Frontiers in Handwriting Recognition. |
Abbreviated Journal |
|
Volume |
13639 |
Issue |
|
Pages |
|
Keywords |
|
Abstract |
|
Address |
ICFHR 2022, Hyderabad, India, December 4–7, 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
Springer |
Place of Publication |
|
Editor |
Utkarsh Porwal; Alicia Fornes; Faisal Shafait |
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-3-031-21648-0 |
Medium |
|
Area |
|
Expedition |
|
Conference |
ICFHR |
Notes |
DAG |
Approved |
no |
Call Number |
Admin @ si @ PFS2022 |
Serial |
3809 |
Permanent link to this record |
|
|
|
Author |
Emanuele Vivoli; Ali Furkan Biten; Andres Mafla; Dimosthenis Karatzas; Lluis Gomez |
Title |
MUST-VQA: MUltilingual Scene-text VQA |
Type |
Conference Article |
Year |
2022 |
Publication |
Proceedings European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
13804 |
Issue |
|
Pages |
345–358 |
Keywords |
Visual question answering; Scene text; Translation robustness; Multilingual models; Zero-shot transfer; Power of language models |
Abstract |
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA), in which the question can be asked in different languages and is not necessarily aligned with the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and we demonstrate that the models can perform on par in a zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models to STVQA tasks. |
Address |
Tel-Aviv; Israel; October 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCVW |
Notes |
DAG; 302.105; 600.155; 611.002 |
Approved |
no |
Call Number |
Admin @ si @ VBM2022 |
Serial |
3770 |
Permanent link to this record |
|
|
|
Author |
Sergi Garcia Bordils; Andres Mafla; Ali Furkan Biten; Oren Nuriel; Aviad Aberdam; Shai Mazor; Ron Litman; Dimosthenis Karatzas |
Title |
Out-of-Vocabulary Challenge Report |
Type |
Conference Article |
Year |
2022 |
Publication |
Proceedings European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
13804 |
Issue |
|
Pages |
359–375 |
Keywords |
|
Abstract |
This paper presents the final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character Recognition (OCR) models, namely, the recognition of scene text instances unseen at training time. The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances, thus covering a wide range of data distributions. A new and independent validation and test set is formed with scene text instances that are out of vocabulary at training time. The competition was structured in two tasks: end-to-end and cropped scene text recognition, respectively. A thorough analysis of results from baselines and different participants is presented. Interestingly, current state-of-the-art models show a significant performance gap under the newly studied setting. We conclude that the OOV dataset proposed in this challenge will be an essential area to explore in order to develop scene text models that achieve more robust and generalized predictions. |
Address |
Tel-Aviv; Israel; October 2022 |
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCVW |
Notes |
DAG; 600.155; 302.105; 611.002 |
Approved |
no |
Call Number |
Admin @ si @ GMB2022 |
Serial |
3771 |
Permanent link to this record |
|
|
|
Author |
Andrea Gemelli; Sanket Biswas; Enrico Civitelli; Josep Llados; Simone Marinai |
Title |
Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks |
Type |
Conference Article |
Year |
2022 |
Publication |
17th European Conference on Computer Vision Workshops |
Abbreviated Journal |
|
Volume |
13804 |
Issue |
|
Pages |
329–344 |
Keywords |
|
Abstract |
Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks, since they can unravel important structural patterns that are fundamental to key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluate our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
|
Place of Publication |
|
Editor |
|
Language |
|
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-3-031-25068-2 |
Medium |
|
Area |
|
Expedition |
|
Conference |
ECCV-TiE |
Notes |
DAG; 600.162; 600.140; 110.312 |
Approved |
no |
Call Number |
Admin @ si @ GBC2022 |
Serial |
3795 |
Permanent link to this record |