Partha Pratim Roy, Umapada Pal and Josep Llados. 2008. Recognition of Multi-oriented Touching Characters in Graphical Documents. Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 297–304.
Ernest Valveny and Enric Marti. 1999. Recognition of lineal symbols in hand-written drawings using deformable template matching. Proceedings of the VIII Symposium Nacional de Reconocimiento de Formas y Análisis de Imágenes.
Nuria Cirera. 2012. Recognition of Handwritten Historical Documents. (Master's thesis.)
Muhammad Muzzamil Luqman, Thierry Brouard, Jean-Yves Ramel and Josep Llados. 2012. Recherche de sous-graphes par encapsulation floue des cliques d'ordre 2: Application à la localisation de contenu dans les images de documents graphiques [Subgraph search by fuzzy embedding of order-2 cliques: application to content localization in graphical document images]. Colloque International Francophone sur l'Écrit et le Document, 149–162.
Antonio Lopez, Ernest Valveny and Juan J. Villanueva. 2005. Real-time quality control of surgical material packaging by artificial vision. Assembly Automation, 25(3).
Andres Mafla and 6 others. 2021. Real-time Lexicon-free Scene Text Retrieval. Pattern Recognition, 110, 107656.
Abstract: In this work, we address the task of scene text retrieval: given a text query, the system returns all images containing the queried text. The proposed model uses a single-shot CNN architecture that predicts bounding boxes and builds a compact representation of spotted words. In this way, retrieval can be modeled as a nearest-neighbor search of the query's textual representation over the CNN outputs collected from every image in the database. Our experiments show that the proposed model outperforms the previous state of the art while offering a significant increase in processing speed and unmatched expressiveness with samples never seen at training time. Several experiments assessing the generalization capability of the model are conducted on a multilingual dataset, and an application to real-time text spotting in videos is also presented.
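The abstract above frames retrieval as a nearest-neighbor search of a query representation over the word representations predicted for every image. The Python sketch below illustrates only that search step, under explicit assumptions: the compact word representation is taken to be a PHOC-style binary histogram of characters over pyramid levels 1 to 3 (an assumption here, not a detail stated in the abstract), the CNN is replaced by ground-truth strings for the spotted words, and each image is scored by its best-matching word using cosine similarity. All function names are hypothetical.

```python
import math

# Assumed character set for the hypothetical PHOC-style embedding.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def phoc(word, levels=(1, 2, 3)):
    """Binary character histograms over a pyramid of word regions (PHOC-style)."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for r in range(level):
            hist = [0.0] * len(ALPHABET)
            r0, r1 = r / level, (r + 1) / level          # region span in [0, 1]
            for k, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                c0, c1 = k / n, (k + 1) / n               # character span in [0, 1]
                overlap = max(0.0, min(r1, c1) - max(r0, c0))
                if overlap / (c1 - c0) >= 0.5:           # char mostly inside region
                    hist[ALPHABET.index(ch)] = 1.0
            vec.extend(hist)
    return vec


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def build_index(spotted_words):
    """Map image id -> embeddings of its spotted words (stand-in for CNN outputs)."""
    return {img: [phoc(w) for w in words] for img, words in spotted_words.items()}


def retrieve(query, index):
    """Rank images by the best cosine match between the query and any spotted word."""
    q = phoc(query)
    scores = []
    for image_id, word_vectors in index.items():
        best = max((cosine(q, v) for v in word_vectors), default=0.0)
        scores.append((image_id, best))
    scores.sort(key=lambda s: s[1], reverse=True)     # nearest neighbors first
    return scores
```

For example, with `build_index({"img1": ["coffee", "shop"], "img2": ["hotel", "bar"]})`, calling `retrieve("hotel", index)` ranks `img2` first. This brute-force scan touches every spotted word; for a large image database one would typically swap it for an approximate nearest-neighbor index.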
Jordi Vitria and 6 others. 1999. Real time recognition of pharmaceutical products by subspace methods.
A. Pujol and 6 others. 1999. Real time pharmaceutical product recognition using color and shape indexing. Proceedings of the 2nd International Workshop on European Scientific and Industrial Collaboration (WESIC'99), Promoting Advanced Technologies in Manufacturing.
Leonardo Galteri and 7 others. 2017. Reading Text in the Wild from Compressed Images. 1st International workshop on Egocentric Perception, Interaction and Computing.
Abstract: Reading text in the wild is gaining attention in the computer vision community. Images captured in the wild are almost always compressed to varying degrees, depending on the application context, and this compression introduces artifacts that distort the image content. In this paper we investigate the impact these compression artifacts have on text localization and recognition in the wild. We also propose a deep Convolutional Neural Network (CNN) that removes text-specific compression artifacts, which leads to an improvement in text recognition. Experimental results on the ICDAR-Challenge4 dataset demonstrate that compression artifacts have a significant impact on text localization and recognition and that our approach yields an improvement in both, especially at high compression rates.
Arnau Baro. 2022. Reading Music Systems: From Deep Optical Music Recognition to Contextual Methods. (Ph.D. thesis, IMPRIMA.)
Abstract: The transcription of sheet music into a machine-readable format can be carried out manually. However, the complexity of music notation inevitably leads to burdensome software for music score editing, which makes the whole process very time-consuming and error-prone. Consequently, automatic transcription systems for musical documents are useful tools.
Document analysis is the field that deals with the extraction and processing of documents through image and pattern recognition; it is a branch of computer vision. When the source documents are music scores, the field devoted to this task is known as Optical Music Recognition (OMR). Typically, an OMR system takes an image of a music score and automatically extracts its content into a symbolic structure such as MEI or MusicXML.
In this dissertation, we have investigated different methods for recognizing a single staff section (e.g. scores for violin, flute, etc.), much as most text recognition research focuses on recognizing the words that appear in a given line image. These methods follow two different methodologies. On the one hand, we present two methods based on Recurrent Neural Networks, in particular the Long Short-Term Memory (LSTM) network. On the other hand, we detail a method based on Sequence-to-Sequence models.
Music context is needed to improve OMR results, just as language models and dictionaries help in handwriting recognition. For example, syntactic rules and grammars could easily be defined to cope with ambiguities in the rhythm: in music theory, the time signature defines the number of beats per bar. Thus, in the second part of this dissertation, different methodologies have been investigated to improve OMR recognition. We have explored three methods: (a) a graphical tree-structure representation, dendrograms, which at each level joins primitives following a set of rules; (b) the incorporation of language models to model the probability of a sequence of tokens; and (c) graph neural networks that analyze music scores to avoid meaningless relationships between music primitives.
Finally, to train all these methodologies, and given the method-specificity of the datasets in the literature, we have created four different music datasets. Two of them are synthetic, with a modern or old handwritten appearance, whereas the other two are real handwritten scores, one modern and the other old.
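The abstract above notes that the time signature fixes how many beats fit in a bar, which yields a simple syntactic rule for resolving rhythm ambiguities. The Python sketch below is not the thesis's method; it is an illustrative consistency check under assumed conventions: recognized symbols are hypothetical `(duration_name, dots)` tuples, rests count like notes of the same value, and tuplets, grace notes and pickup bars are ignored. Exact `Fraction` arithmetic avoids floating-point error when summing durations.

```python
from fractions import Fraction

# Hypothetical token vocabulary: note/rest names as fractions of a whole note.
BASE_DURATIONS = {
    "whole": Fraction(1, 1),
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
    "eighth": Fraction(1, 8),
    "16th": Fraction(1, 16),
}


def duration(name, dots=0):
    """Duration of a (possibly dotted) note or rest, as a fraction of a whole note."""
    base = BASE_DURATIONS[name]
    total, extra = base, base
    for _ in range(dots):
        extra /= 2          # each dot adds half of the previous addition
        total += extra
    return total


def bar_capacity(time_signature):
    """A time signature (beats, unit), e.g. (3, 4), allows beats/unit of a whole note."""
    beats, unit = time_signature
    return Fraction(beats, unit)


def bar_is_consistent(tokens, time_signature):
    """True if the token durations fill the bar exactly (pickup bars not handled)."""
    return sum(duration(n, d) for n, d in tokens) == bar_capacity(time_signature)


def pick_consistent(hypotheses, time_signature):
    """Return the first candidate reading of a bar that satisfies the rule, else None.

    `hypotheses` is assumed to be ordered from most to least likely, e.g. the
    n-best outputs of a recognizer for one bar.
    """
    for tokens in hypotheses:
        if bar_is_consistent(tokens, time_signature):
            return tokens
    return None
```

For example, if a recognizer's n-best list for a 3/4 bar is `[("half", 0), ("half", 0)]` followed by `[("half", 0), ("quarter", 0)]`, `pick_consistent` returns the second reading, since only it sums to 3/4. A practical system would more likely combine such a rule with recognition scores than apply it as a hard filter.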