Author S. Chanda; Umapada Pal; Oriol Ramos Terrades
Title Word-Wise Thai and Roman Script Identification
Type Journal
Year 2009
Publication ACM Transactions on Asian Language Information Processing
Abbreviated Journal TALIP
Volume 8
Issue 3
Pages 1-21
Abstract In some Thai documents, a single text line of a printed document page may contain words of both Thai and Roman scripts. For the Optical Character Recognition (OCR) of such a document page it is better to identify, at first, Thai and Roman script portions and then to use individual OCR systems of the respective scripts on these identified portions. In this article, an SVM-based method is proposed for identification of word-wise printed Roman and Thai scripts from a single line of a document page. Here, at first, the document is segmented into lines and then lines are segmented into character groups (words). In the proposed scheme, we identify the script of a character group combining different character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and water reservoir concept, etc. Based on the experiment on 10,000 data (words) we obtained 99.62% script identification accuracy from the proposed scheme.
Corporate Author Thesis
Publisher Place of Publication Editor
Language Summary Language Original Title
Series Editor Series Title Abbreviated Series Title
Series Volume Series Issue Edition
ISSN 1530-0226 ISBN Medium
Area Expedition Conference
Notes DAG
Approved no
Call Number Admin @ si @ CPR2009f
Serial 1869
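Illustration (not part of the record). The abstract says the page is first segmented into lines and the lines into character groups (words), but it does not say how. The sketch below assumes a common approach, projection profiles on a binary image, which may differ from the authors' method. The names `runs`, `segment_lines`, `segment_words` and the `min_gap` threshold are hypothetical, and the SVM script classifier itself is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile > 0.

    Ranges separated by fewer than `min_gap` empty positions are merged.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, i))       # the run ends before index i
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # gap too small: same group
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the horizontal projection profile."""
    row_profile = [sum(row) for row in img]
    return runs(row_profile)


def segment_words(img: Image, line: Tuple[int, int],
                  min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one text line into character groups using the vertical profile.

    `min_gap` is an assumed threshold: narrower gaps are treated as
    inter-character spacing, wider ones as word boundaries.
    """
    top, bottom = line
    width = len(img[0]) if img else 0
    col_profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(width)]
    return runs(col_profile, min_gap)
```

Each column range returned by `segment_words` would then be cropped and passed to a feature extractor and script classifier, which in the paper is SVM-based.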