|
Weijia Wu and 7 others. 2023. ICDAR 2023 Competition on Video Text Reading for Dense and Small Text. 17th International Conference on Document Analysis and Recognition. 405–419. (LNCS.)
Abstract: Recently, video text detection, tracking and recognition in natural scenes have become very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and a single scenario, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, named DSText, which focuses on the challenge of reading dense and small text in videos across various scenarios. Compared with previous datasets, the proposed dataset mainly includes three new challenges: 1) Dense video texts, a new challenge for video text spotters. 2) A high proportion of small texts. 3) Various new scenarios, e.g., ‘Game’, ‘Sports’, etc. The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2)). During the competition period (opened on 15th February, 2023 and closed on 20th March, 2023), a total of 24 teams participated in the three proposed tasks with around 30 valid submissions, respectively. In this article, we describe detailed statistical information of the dataset, tasks, evaluation protocols and the results summaries of the ICDAR 2023 on DSText competition. Moreover, we hope the benchmark will promote video text research in the community.
Keywords: Video Text Spotting; Small Text; Text Tracking; Dense Text
|
|
|
W. Liu and Josep Llados. 2006. Graphics Recognition. Ten Years Review and Future Perspectives. (LNCS.)
|
|
|
Volkmar Frinken, Markus Baumgartner, Andreas Fischer and Horst Bunke. 2012. Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting. 13th International Conference on Frontiers in Handwriting Recognition. 49–54.
Abstract: State-of-the-art handwriting recognition systems are learning-based systems that require large sets of training data. The creation of training data, and consequently the creation of a well-performing recognition system, therefore requires a substantial amount of human work. This can be reduced with semi-supervised learning, which uses unlabeled text lines for training as well. Current approaches estimate the correct transcription of the unlabeled data via handwriting recognition, which is not only extremely demanding in terms of computational cost but also requires a good model of the target language. In this paper, we propose a different approach that makes use of keyword spotting, which is significantly faster and does not need any language model. In a set of experiments we demonstrate its superiority over existing approaches.
|
|
|
Volkmar Frinken, Francisco Zamora, Salvador España, Maria Jose Castro, Andreas Fischer and Horst Bunke. 2012. Long-Short Term Memory Neural Networks Language Modeling for Handwriting Recognition. 21st International Conference on Pattern Recognition. 701–704.
Abstract: Unconstrained handwritten text recognition systems maximize the combination of two separate probability scores. The first one is the observation probability that indicates how well the returned word sequence matches the input image. The second score is the probability that reflects how likely a word sequence is according to a language model. Current state-of-the-art recognition systems use statistical language models in the form of bigram word probabilities. This paper proposes to model the target language by means of a recurrent neural network with long-short term memory cells. Because the network is recurrent, the considered context is not limited to a fixed size, especially as the memory cells are designed to deal with long-term dependencies. In a set of experiments conducted on the IAM off-line database we show the superiority of the proposed language model over statistical n-gram models.
|
|
|
Volkmar Frinken, Andreas Fischer, Markus Baumgartner and Horst Bunke. 2014. Keyword spotting for self-training of BLSTM NN based handwriting recognition systems. PR, 47(3), 1073–1082.
Abstract: The automatic transcription of unconstrained continuous handwritten text requires well-trained recognition systems. The semi-supervised paradigm introduces the concept of using not only labeled data but also unlabeled data in the learning process. Unlabeled data can be gathered at little or no cost. Hence it has the potential to reduce the need for labeling training data, a tedious and costly process. Given a weak initial recognizer trained on labeled data, self-training can be used to recognize unlabeled data and add words that were recognized with high confidence to the training set for re-training. This process is not trivial and requires great care in selecting the elements that are to be added to the training set. In this paper, we propose to use a bidirectional long short-term memory neural network handwriting recognition system for keyword spotting in order to select new elements. A set of experiments shows the high potential of self-training for bootstrapping handwriting recognition systems, both for modern and historical handwriting, and demonstrates the benefits of using keyword spotting over previously published self-training schemes.
Keywords: Document retrieval; Keyword spotting; Handwriting recognition; Neural networks; Semi-supervised learning
|
|
|
Volkmar Frinken, Andreas Fischer, Horst Bunke and Alicia Fornes. 2011. Co-training for Handwritten Word Recognition. 11th International Conference on Document Analysis and Recognition. 314–318.
Abstract: To cope with the tremendous variations of writing styles encountered between different individuals, unconstrained automatic handwriting recognition systems need to be trained on large sets of labeled data. Traditionally, the training data has to be labeled manually, which is a laborious and costly process. Semi-supervised learning techniques offer methods to utilize unlabeled data, which can be obtained cheaply in large amounts, in order to reduce the need for labeled data. In this paper, we propose the use of Co-Training for improving the recognition accuracy of two weakly trained handwriting recognition systems. The first one is based on Recurrent Neural Networks while the second one is based on Hidden Markov Models. On the IAM off-line handwriting database we demonstrate that a significant increase in recognition accuracy can be achieved with Co-Training for single word recognition.
|
|
|
Volkmar Frinken, Andreas Fischer and Carlos David Martinez Hinarejos. 2013. Handwriting Recognition in Historical Documents using Very Large Vocabularies. 2nd International Workshop on Historical Document Imaging and Processing. 67–72.
Abstract: Language models are used in automatic transcription systems to resolve ambiguities. This is done by limiting the vocabulary of words that can be recognized as well as estimating the n-gram probability of the words in the given text. In the context of historical documents, non-unified spelling and the limited amount of written text pose a substantial problem for the selection of the recognizable vocabulary as well as the computation of the word probabilities. In this paper we propose, for the transcription of historical Spanish text, to keep the corpus for the n-gram limited to a sample of the target text, but to expand the vocabulary with words gathered from external resources. We analyze the performance of such a transcription system with different sizes of external vocabularies and demonstrate the applicability of using up to 300 thousand external words and the resulting significant increase in recognition accuracy.
|
|
|
Volkmar Frinken, Alicia Fornes, Josep Llados and Jean-Marc Ogier. 2012. Bidirectional Language Model for Handwriting Recognition. Structural, Syntactic, and Statistical Pattern Recognition, Joint IAPR International Workshop. Springer Berlin Heidelberg, 611–619. (LNCS.)
Abstract: In order to improve the results of automatically recognized handwritten text, information about the language is commonly included in the recognition process. A common approach is to represent a text line as a sequence. It is processed in one direction and the language information via n-grams is directly included in the decoding. This approach, however, only uses context on one side to estimate a word’s probability. Therefore, we propose a bidirectional recognition in this paper, using distinct forward and backward language models. By combining decoding hypotheses from both directions, we achieve a significant increase in recognition accuracy for the off-line writer-independent handwriting recognition task. Both language models are of the same type and can be estimated on the same corpus. Hence, the increase in recognition accuracy comes without any additional need for training data or language modeling complexity.
|
|
|
Veronica Romero, Emilio Granell, Alicia Fornes, Enrique Vidal and Joan Andreu Sanchez. 2019. Information Extraction in Handwritten Marriage Licenses Books. 5th International Workshop on Historical Document Imaging and Processing. 66–71.
Abstract: Handwritten marriage licenses books are characterized by a simple structure of the text in the records with an evolutionary vocabulary, mainly composed of proper names that change over time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. Previous works have shown that the use of category-based language models and a Grammatical Inference technique known as MGGI can improve the accuracy of these tasks. However, the application of the MGGI algorithm requires a priori knowledge to label the words of the training strings, which is not always easy to obtain. In this paper we study how to automatically obtain the information required by the MGGI algorithm using a technique based on Confusion Networks. Using the resulting language model, full handwritten text recognition and information extraction experiments have been carried out with results supporting the proposed approach.
|
|
|
Veronica Romero and 7 others. 2013. The ESPOSALLES database: An ancient marriage license corpus for off-line handwriting recognition. PR, 46(6), 1658–1669.
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demography studies and genealogical research. Automatic processing of historical documents, however, has mostly been focused on single works of literature and less on social records, which tend to have a distinct layout, structure, and vocabulary. Such information is usually collected by expert demographers who devote a lot of time to manually transcribing it. This paper presents a new database, compiled from a marriage license books collection, to support research in automatic handwriting recognition for historical documents containing social records. Marriage license books are documents that were used for centuries by ecclesiastical institutions to register marriage licenses. Books from this collection are handwritten and span nearly half a millennium until the beginning of the 20th century. In addition, a study is presented on the capability of state-of-the-art handwritten text recognition systems when applied to the presented database. Baseline results are reported for reference in future studies.
|
|