|
S. Chanda, Umapada Pal and Oriol Ramos Terrades. 2009. Word-Wise Thai and Roman Script Identification.
Abstract: In some Thai documents, a single text line of a printed page may contain words in both Thai and Roman scripts. For Optical Character Recognition (OCR) of such a page, it is better to first identify the Thai and Roman portions and then apply the OCR system of the respective script to each identified portion. In this article, an SVM-based method is proposed for word-wise identification of printed Roman and Thai scripts within a single line of a document page. The document is first segmented into lines, and the lines are then segmented into character groups (words). The proposed scheme identifies the script of a character group by combining character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and the water reservoir concept, among others. In an experiment on 10,000 words, the proposed scheme achieved 99.62% script identification accuracy.
|
|
|
D. Perez, L. Tarazon, N. Serrano, F.M. Castro, Oriol Ramos Terrades and A. Juan. 2009. The GERMANA Database. 10th International Conference on Document Analysis and Recognition. 301–305.
Abstract: A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages contain only nearly calligraphed text written on ruled sheets with well-separated lines. To our knowledge, it is the first publicly available database for handwriting research that is mostly written in Spanish and comparable in size to standard databases. Due to its sequential book structure, it is also well suited for realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
|
|
|
L. Tarazon and 6 others. 2009. Confidence Measures for Error Correction in Interactive Transcription of Handwritten Text. 15th International Conference on Image Analysis and Processing. Springer Berlin Heidelberg, 567–574. (LNCS.)
Abstract: An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and the supervisor is, in turn, assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets, showing that a word error rate no larger than 10% can be achieved by checking only the 32% of words recognised with the lowest confidence.
|
|
|
H. Chouaib, Oriol Ramos Terrades, Salvatore Tabbone, F. Cloppet and N. Vincent. 2008. Feature Selection Combining Genetic Algorithm and Adaboost Classifiers. 19th International Conference on Pattern Recognition. 1–4.
|
|
|
T.O. Nguyen, Salvatore Tabbone and Oriol Ramos Terrades. 2008. Symbol Descriptor Based on Shape Context and Vector Model of Information Retrieval. Proceedings of the 8th IAPR International Workshop on Document Analysis Systems. 191–197.
|
|
|
H. Chouaib, Salvatore Tabbone, Oriol Ramos Terrades, F. Cloppet, N. Vincent and A.T. Thierry Paquet. 2008. Sélection de Caractéristiques à partir d'un algorithme génétique et d'une combinaison de classifieurs Adaboost. Colloque International Francophone sur l'Ecrit et le Document. 181–186.
|
|
|
T.O. Nguyen, Salvatore Tabbone, Oriol Ramos Terrades and A.T. Thierry. 2008. Proposition d'un descripteur de formes et du modèle vectoriel pour la recherche de symboles. Colloque International Francophone sur l'Ecrit et le Document. 79–84.
|
|
|
Salvatore Tabbone, Oriol Ramos Terrades and S. Barrat. 2008. Histogram of Radon transform. A useful descriptor for shape retrieval. 19th International Conference on Pattern Recognition. 1–4.
|
|
|
V.C. Kieu, Alicia Fornes, M. Visani, N. Journet and Anjan Dutta. 2013. The ICDAR/GREC 2013 Music Scores Competition on Staff Removal. 10th IAPR International Workshop on Graphics Recognition.
Abstract: The first competition on music scores, organized at ICDAR and GREC in 2011, awoke the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we propose a staff removal competition in which we simulate old music scores. To this end, we have created a new set of images that contain noise and 3D distortions. This paper describes the distortion methods, the metrics, the participants' methods, and the results obtained.
Keywords: Competition; Music scores; Staff Removal
|
|
|
M. Visani, V.C. Kieu, Alicia Fornes and N. Journet. 2013. The ICDAR 2013 Music Scores Competition: Staff Removal. 12th International Conference on Document Analysis and Recognition. 1439–1443.
Abstract: The first competition on music scores, organized at ICDAR in 2011, awoke the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real-case scenario: old music scores. For this purpose, we have generated a new set of images using two kinds of degradation: local noise and 3D distortions. This paper describes the dataset, the distortion methods, the evaluation metrics, the participants' methods, and the results obtained.
|
|