|
Author |
Adria Rico; Alicia Fornes |


|
|
Title  |
Camera-based Optical Music Recognition using a Convolutional Neural Network |
Type |
Conference Article |
|
Year |
2017 |
Publication |
12th IAPR International Workshop on Graphics Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
27-28 |
|
|
Keywords |
optical music recognition; document analysis; convolutional neural network; deep learning |
|
|
Abstract |
Optical Music Recognition (OMR) is the task of recognizing images of music scores. Contrary to expectations, current OMR systems usually fail on images of scores captured with digital cameras and smartphones. In this work, we propose a camera-based OMR system based on Convolutional Neural Networks, with promising preliminary results. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
GREC |
|
|
Notes |
DAG; 600.097; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RiF2017 |
Serial |
3059 |
|
|
|
|
|
Author |
Mohammed Al Rawi; Ernest Valveny; Dimosthenis Karatzas |


|
|
Title  |
Can One Deep Learning Model Learn Script-Independent Multilingual Word-Spotting? |
Type |
Conference Article |
|
Year |
2019 |
Publication |
15th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
260-267 |
|
|
Keywords |
|
|
|
Abstract |
Word spotting has recently gained attention because it can be used to extract textual information from handwritten documents and scene-text images. Current word-spotting approaches are designed to work on a single language and/or script. Building models that learn script-independent multilingual word spotting is challenging because of the large variability of multilingual alphabets and symbols. We used ResNet-152 and the Pyramidal Histogram of Characters (PHOC) embedding to build a single script-independent multilingual word-spotting model, and we tested it on Latin, Arabic, and Bangla (Indian) scripts. The proposed single model performs on par with a multi-model system of language-specific word spotters, and thus removes the need for a separate model for each script and/or language. |
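The PHOC embedding mentioned above maps a word string to a fixed-length binary vector of character occurrences over a spatial pyramid. The following is a minimal sketch, not the authors' implementation: it assumes unigram-only PHOC, pyramid levels 1 to 3, and a caller-supplied alphabet (the original PHOC formulation also uses deeper levels and bigrams, and the configuration used in this paper is not stated in the record). The function name `phoc` is hypothetical, and the ResNet-152 that learns to predict such vectors from word images is not shown.

```python
from typing import List, Sequence


def phoc(word: str, alphabet: str, levels: Sequence[int] = (1, 2, 3)) -> List[int]:
    """Hypothetical unigram-only Pyramidal Histogram of Characters.

    At pyramid level L the word is split into L equal regions. Character i
    of an n-character word occupies the normalized interval [i/n, (i+1)/n]
    and is assigned to a region if at least half of that interval lies
    inside the region.
    """
    index = {c: k for k, c in enumerate(alphabet)}
    n = len(word)
    vector: List[int] = []
    for level in levels:
        for region in range(level):
            hist = [0] * len(alphabet)
            # Work in units of 1 / (n * level) so every interval end is an integer
            # and the 50% overlap test is exact (no floating-point rounding).
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # A character spans `level` units, so "at least half" is 2 * overlap >= level.
                if 2 * overlap >= level:
                    hist[index[ch]] = 1
            vector.extend(hist)
    return vector
```

For example, `phoc("ab", "abc", levels=(1, 2))` marks both letters in the level-1 region and one letter in each level-2 half. In a word-spotting setting, the predicted and query vectors would then be compared, for instance by cosine similarity.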
|
|
Address |
Sydney; Australia; September 2019 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG; 600.129; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ RVK2019 |
Serial |
3337 |
|
|
|
|
|
Author |
Adarsh Tiwari; Sanket Biswas; Josep Llados |

|
|
Title  |
Can Pre-trained Language Models Help in Understanding Handwritten Symbols? |
Type |
Conference Article |
|
Year |
2023 |
Publication |
17th International Conference on Document Analysis and Recognition |
Abbreviated Journal |
|
|
|
Volume |
14193 |
Issue |
|
Pages |
199–211 |
|
|
Keywords |
|
|
|
Abstract |
The emergence of transformer models such as BERT, GPT-2, GPT-3, RoBERTa, and T5 for natural language understanding has opened the floodgates to tackling a wide array of machine learning tasks in other modalities, such as images, audio, music, and sketches. These language models are domain-agnostic and, as a result, can be applied to 1-D sequences of any kind. The key challenge, however, lies in bridging the modality gap so that they generate strong features for out-of-domain tasks. This work focuses on leveraging such pre-trained language models and discusses the difficulties of recognizing challenging handwritten symbols and alphabets. |
|
|
Address |
San Jose; CA; USA; August 2023 |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICDAR |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ TBL2023 |
Serial |
3908 |
|
|
|
|
|
Author |
Lei Kang; Pau Riba; Mauricio Villegas; Alicia Fornes; Marçal Rusiñol |


|
|
Title  |
Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture |
Type |
Journal Article |
|
Year |
2021 |
Publication |
Pattern Recognition |
Abbreviated Journal |
PR |
|
|
Volume |
112 |
Issue |
|
Pages |
107790 |
|
|
Keywords |
|
|
|
Abstract |
Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge when training a language model is that its corpus usually differs from the one used to train the handwritten word recognition system. The bias between the two word corpora leads to incorrect transcriptions, yielding similar or even worse recognition performance. In this work, we introduce Candidate Fusion, a novel way to integrate an external language model into a sequence-to-sequence architecture. It does so by providing suggestions from external language knowledge as a new input to the sequence-to-sequence recognizer. Hence, Candidate Fusion provides two improvements. On the one hand, the sequence-to-sequence recognizer has the flexibility not only to combine the information from itself and the language model, but also to choose how much importance to give to the information provided by the language model. On the other hand, the external language model can adapt itself to the training corpus and even learn the most common errors produced by the recognizer. Finally, comprehensive experiments show that Candidate Fusion outperforms state-of-the-art language models for handwritten word recognition tasks. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.140; 601.302; 601.312; 600.121 |
Approved |
no |
|
|
Call Number |
Admin @ si @ KRV2021 |
Serial |
3343 |
|
|
|
|
|
Author |
Jose Antonio Rodriguez; Gemma Sanchez; Josep Llados |

|
|
Title  |
Categorization of Digital Ink Elements using Spectral Features |
Type |
Conference Article |
|
Year |
2007 |
Publication |
Seventh IAPR International Workshop on Graphics Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
63–64 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
Curitiba (Brazil) |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
GREC |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RSL2007c |
Serial |
888 |
|
|
|
|
|
Author |
Jose Antonio Rodriguez; Gemma Sanchez; Josep Llados |

|
|
Title  |
Categorization of Digital Ink Elements using Spectral Features |
Type |
Book Chapter |
|
Year |
2008 |
Publication |
Graphics Recognition: Recent Advances and New Opportunities |
Abbreviated Journal |
|
|
|
Volume |
5046 |
Issue |
|
Pages |
188–198 |
|
|
Keywords |
|
|
|
Abstract |
|
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer-Verlag |
Place of Publication |
|
Editor |
W. Liu, J. Llados, J.M. Ogier |
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
LNCS |
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
DAG @ dag @ RSL2008 |
Serial |
1099 |
|
|
|
|
|
Author |
Sergio Escalera; Alicia Fornes; Oriol Pujol; Josep Llados; Petia Radeva |

|
|
Title  |
Circular Blurred Shape Model for Multiclass Symbol Recognition |
Type |
Journal Article |
|
Year |
2011 |
Publication |
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) |
Abbreviated Journal |
TSMCB |
|
|
Volume |
41 |
Issue |
2 |
Pages |
497-506 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. Feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. To perform symbol detection, the descriptors are learned using a cascade of classifiers. For multiclass categorization, the new feature space is learned using a set of binary classifiers embedded in an error-correcting output code design. Results on four symbol data sets show significant improvements of the proposed descriptor over state-of-the-art descriptors, and the improvements are even larger when the symbols suffer from elastic deformations. |
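The descriptor described above accumulates shape information in a circular grid of rings and sectors and lets each contribution spread to neighbouring regions. The sketch below is a simplified illustration under stated assumptions, not the published Circular Blurred Shape Model: the input is a list of foreground pixel coordinates, the "blurring" is approximated by bilinear soft assignment to adjacent ring/sector bins, and rotation invariance is obtained by cyclically shifting the sectors so that the densest sector comes first. The function name and its parameters (`rings`, `sectors`, `radius`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def circular_blurred_descriptor(
    points: Sequence[Tuple[float, float]],
    center: Tuple[float, float],
    radius: float,
    rings: int = 4,
    sectors: int = 8,
) -> List[float]:
    """Simplified, hypothetical circular correlogram descriptor."""
    hist = [[0.0] * sectors for _ in range(rings)]
    cx, cy = center
    for x, y in points:
        dx, dy = x - cx, y - cy
        dist = math.hypot(dx, dy)
        if dist > radius:
            continue  # only points inside the circular support are described
        theta = math.atan2(dy, dx) % (2 * math.pi)
        # Continuous bin coordinates measured from bin centres.
        r = dist / radius * rings - 0.5
        a = theta / (2 * math.pi) * sectors - 0.5
        r0, a0 = math.floor(r), math.floor(a)
        fr, fa = r - r0, a - a0
        # Bilinear "blurring": each point's unit weight is shared by the
        # two nearest rings and the two nearest sectors.
        for dr, wr in ((0, 1.0 - fr), (1, fr)):
            ri = min(max(r0 + dr, 0), rings - 1)  # clamp at the inner/outer ring
            for da, wa in ((0, 1.0 - fa), (1, fa)):
                ai = (a0 + da) % sectors  # sectors wrap around the circle
                hist[ri][ai] += wr * wa
    total = sum(sum(row) for row in hist)
    if total > 0:
        hist = [[v / total for v in row] for row in hist]
    # Rotation invariance (assumed scheme): start from the densest sector.
    mass = [sum(hist[ri][s] for ri in range(rings)) for s in range(sectors)]
    start = max(range(sectors), key=mass.__getitem__)
    return [hist[ri][(s + start) % sectors] for ri in range(rings) for s in range(sectors)]
```

Rotating the input points by a multiple of the sector angle shifts the histogram by whole sectors, so the alignment step yields the same vector; arbitrary rotations are only approximately handled by this simplified scheme.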
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1083-4419 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; DAG; HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ EFP2011 |
Serial |
1784 |
|
|
|
|
|
Author |
Sergio Escalera; Alicia Fornes; Oriol Pujol; Alberto Escudero; Petia Radeva |


|
|
Title  |
Circular Blurred Shape Model for Symbol Spotting in Documents |
Type |
Conference Article |
|
Year |
2009 |
Publication |
16th IEEE International Conference on Image Processing |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
1985-1988 |
|
|
Keywords |
|
|
|
Abstract |
The symbol spotting problem requires feature extraction strategies that can generalize from training samples and localize the target object while discarding most of the image. In document analysis, symbol spotting techniques have to deal with a high variability in symbol appearance. In this paper, we propose the Circular Blurred Shape Model descriptor. Feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. Shape information from objects is shared among correlogram regions, which makes the descriptor tolerant to irregular deformations. Descriptors are learned using a cascade of classifiers with AdaBoost as the base classifier. Finally, symbol spotting is performed by means of a windowing strategy that applies the learned cascade over plans and old musical score documents. Spotting and multi-class categorization results show better performance than state-of-the-art descriptors. |
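The windowing strategy above scans the document with a fixed-size window and keeps the windows that the learned classifier accepts. In this minimal sketch, `score_fn` is a hypothetical stand-in for the learned AdaBoost cascade, the window size and stride are arbitrary parameters, and the greedy non-maximum suppression step is an added assumption (the abstract does not say how overlapping detections are merged).

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int, stride: int) -> Iterator[Box]:
    """Yield every window position that fits entirely inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def spot(
    img_w: int,
    img_h: int,
    win: Tuple[int, int],
    stride: int,
    score_fn: Callable[[Box], float],  # hypothetical stand-in for the learned cascade
    threshold: float,
    nms_iou: float = 0.3,
) -> List[Tuple[float, Box]]:
    """Score all windows, keep confident ones, then suppress overlapping duplicates."""
    candidates = [(score_fn(b), b) for b in sliding_windows(img_w, img_h, win[0], win[1], stride)]
    candidates = [c for c in candidates if c[0] >= threshold]
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Tuple[float, Box]] = []
    for score, box in candidates:
        # Greedy NMS: keep a window only if it does not overlap a stronger one too much.
        if all(iou(box, k) < nms_iou for _, k in kept):
            kept.append((score, box))
    return kept
```

With a toy scorer such as `lambda b: iou(b, (40, 40, 20, 20))`, only the window exactly on the target survives; its shifted neighbours pass the threshold but are removed by suppression.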
|
|
Address |
Cairo, Egypt |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
978-1-4244-5653-6 |
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
ICIP |
|
|
Notes |
MILAB; HuPBA; DAG |
Approved |
no |
|
|
Call Number |
BCNPCL @ bcnpcl @ EFP2009b |
Serial |
1184 |
|
|
|
|
|
Author |
Marçal Rusiñol |

|
|
Title  |
Classificació semàntica i visual de documents digitals [Semantic and visual classification of digital documents] |
Type |
Journal Article |
|
Year |
2019 |
Publication |
Revista de biblioteconomia i documentació |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
75-86 |
|
|
Keywords |
|
|
|
Abstract |
This article analyzes automatic processing systems that work on digitized documents with the aim of describing their contents. In this way, they help facilitate access, enable automatic indexing, and make documents accessible to search engines. The goal of these technologies is to train computational models capable of classifying, clustering, or searching digital documents. Accordingly, the tasks of classification, clustering, and search are described. When we use artificial intelligence technologies in classification systems, we expect the tool to return semantic labels; in clustering systems, we expect it to return documents grouped into meaningful clusters; and in search systems, we expect that, given a query, it returns a list of documents ranked by relevance. Next, an overview is given of the methods that allow us to describe digital documents, both visually (what they look like) and through their semantic content (what they are about). Regarding the visual description of documents, the state of the art in numerical representations of digitized documents is reviewed, covering both classical methods and methods based on deep learning. Regarding the semantic description of content, the article analyzes techniques such as optical character recognition (OCR); the computation of basic statistics on the occurrence of the different words in a text (the bag-of-words model); and deep-learning methods such as word2vec, which is based on a neural network that, given a few words of a text, must predict the next word. Knowledge from engineering is being transferred and integrated into products and services in archival science, library science, documentation, and mass-market platforms; however, the algorithms must be efficient enough not only for recognition and literal transcription but also for interpreting the content. |
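The bag-of-words model and the search task mentioned above can be illustrated together: each text becomes a vector of word counts, and documents are ranked by their similarity to a query. This is a minimal sketch, assuming plain regular-expression tokenization and cosine similarity; the helper names are hypothetical, and practical systems typically work on OCR output and add term weighting such as TF-IDF.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Count lower-cased word occurrences; word order is discarded."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (document name, score) pairs ordered by decreasing relevance."""
    q = bag_of_words(query)
    scores = [(name, cosine(q, bag_of_words(text))) for name, text in docs.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, ranking the query "invoice amount" against an invoice, a letter, and a music score puts the invoice first, while documents sharing no words with the query score 0.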
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
DAG; 600.084; 600.135; 600.121; 600.129 |
Approved |
no |
|
|
Call Number |
Admin @ si @ Rus2019 |
Serial |
3282 |
|
|
|
|
|
Author |
Marçal Rusiñol; V. Poulain d'Andecy; Dimosthenis Karatzas; Josep Llados |

|
|
Title  |
Classification of Administrative Document Images by Logo Identification |
Type |
Conference Article |
|
Year |
2011 |
Publication |
9th IAPR International Workshop on Graphics Recognition |
Abbreviated Journal |
|
|
|
Volume |
|
Issue |
|
Pages |
|
|
|
Keywords |
|
|
|
Abstract |
This paper focuses on categorizing administrative document images (such as invoices) by recognizing the supplier's graphical logo. Two methods are proposed: the first uses a bag-of-visual-words model, whereas the second locates logo images, described by the blurred shape model descriptor, within documents using a sliding-window technique. Preliminary results are reported on a dataset of real administrative documents. |
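The bag-of-visual-words method above quantizes local image descriptors against a visual vocabulary and compares the resulting histograms. This minimal sketch assumes that local descriptors have already been extracted from the document and that the codebook was built beforehand (for example, by k-means on training descriptors); histogram intersection and nearest-reference classification are assumed choices, and the function and supplier names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the visual word closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)),
    )


def bovw_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of visual-word assignments."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(min(x, y) for x, y in zip(a, b))


def classify(
    descriptors: Sequence[Vector],
    codebook: Sequence[Vector],
    references: Dict[str, List[float]],  # hypothetical per-supplier logo histograms
) -> str:
    """Assign the document to the supplier whose reference histogram matches best."""
    h = bovw_histogram(descriptors, codebook)
    return max(references, key=lambda name: histogram_intersection(h, references[name]))
```

With a two-word codebook, a document whose descriptors fall mostly on the first visual word is assigned to the supplier whose reference histogram concentrates on that word.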
|
|
Address |
Seoul, Korea |
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
GREC |
|
|
Notes |
DAG |
Approved |
no |
|
|
Call Number |
Admin @ si @ RPK2011 |
Serial |
1821 |
|