|
Francisco Alvaro, Francisco Cruz, Joan Andreu Sanchez, Oriol Ramos Terrades and Jose Miguel Benedi. 2015. Structure Detection and Segmentation of Documents Using 2D Stochastic Context-Free Grammars. NEUCOM, 150(A), 147–154.
Abstract: In this paper we define a bidimensional extension of Stochastic Context-Free Grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for Probabilistic Graphical Models and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, grammars also provide the document structure along with its segmentation.
Keywords: document image analysis; stochastic context-free grammars; text classification features
|
|
|
Lluis Gomez and 6 others. 2021. Multimodal grid features and cell pointers for scene text visual question answering. PRL, 150, 242–249.
Abstract: This paper presents a new model for the task of scene text visual question answering. In this task, questions about a given image can only be answered by reading and understanding scene text. Current state-of-the-art models for this task make use of a dual attention mechanism in which one attention module attends to visual features while the other attends to textual features. A possible issue with this is that it makes it difficult for the model to reason jointly about both modalities. To fix this problem we propose a new model that is based on a single attention mechanism that attends to multi-modal features conditioned on the question. The output weights of this attention module over a grid of multi-modal spatial features are interpreted as the probability that a certain spatial location of the image contains the answer text to the given question. Our experiments demonstrate competitive performance on two standard datasets with a model that is faster than previous methods at inference time. Furthermore, we also provide a novel analysis of the ST-VQA dataset based on a human performance study. Supplementary material, code, and data are made available through this link.
|
|
|
Anjan Dutta. 2010. Symbol Spotting in Graphical Documents by Serialized Subgraph Matching. (Master's thesis.)
|
|
|
Mohamed Ali Souibgui, Alicia Fornes, Yousri Kessentini and Beata Megyesi. 2022. Few shots are all you need: A progressive learning approach for low resource handwritten text recognition. PRL, 160, 43–49.
Abstract: Handwritten text recognition in low resource scenarios, such as manuscripts with rare alphabets, is a challenging problem. In this paper, we propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation process by requiring only a few images of each alphabet symbol. The method consists of detecting all the symbols of a given alphabet in a text-line image and decoding the obtained similarity scores to the final sequence of transcribed symbols. Our model is first pretrained on synthetic line images generated from an alphabet, which could differ from the alphabet of the target domain. A second training step is then applied to reduce the gap between the source and the target data. Since this retraining would require annotation of thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach that automatically assigns pseudo-labels to the unlabeled data. The evaluation on different datasets shows that our model can lead to competitive results with a significant reduction in human effort. The code will be publicly available in the following repository: https://github.com/dali92002/HTRbyMatching
|
|
|
David Fernandez. 2010. Handwritten Word Spotting in Old Manuscript Images using Shape Descriptors. (Master's thesis.)
|
|
|
Lluis Gomez. 2012. Perceptual Organization for Text Extraction in Natural Scenes. (Master's thesis.)
|
|
|
Nuria Cirera. 2012. Recognition of Handwritten Historical Documents. (Master's thesis.)
|
|
|
Josep Llados, J. Lopez-Krahe and D. Archambault. 2007. Special Issue on Information Technologies for Visually Impaired People. Guest Editors.
|
|
|
Sounak Dey and 6 others. 2017. Script independent approach for multi-oriented text detection in scene image. NEUCOM, 242, 96–112.
Abstract: Developing a text detection method which is invariant to scripts in natural scene images is a challenging task due to the different geometrical structures of various scripts. Besides, the multiple orientations of text lines in natural scene images make the problem more challenging. This paper proposes to explore the ring radius transform (RRT) for text detection in multi-oriented and multi-script environments. The method finds component regions based on the convex hull to generate radius matrices using RRT. RRT provides low radius values for the pixels that are near edges, constant radius values for the pixels that represent stroke width, and high radius values for the pixels that represent holes created in the background and convex hull because of the regular structures of text components. We apply k-means clustering on the radius matrices to group such spatially coherent regions into individual clusters. Then the proposed method studies the radius values of such cluster components that are close to the centroid and far from the centroid to detect text components. Furthermore, we have developed a Bangla dataset (named the ISI-UM dataset) and propose a semi-automatic system for generating its ground truth for text detection of arbitrary orientations, which can be used by researchers for text detection and recognition in the future. The ground truth will be released to the public. Experimental results on our ISI-UM data and other standard datasets, namely, ICDAR 2013 scene, SVT and MSRA data, show that the proposed method outperforms the existing methods in terms of multi-lingual and multi-oriented text detection ability.
|
|
|
Thanh Ha Do, Salvatore Tabbone and Oriol Ramos Terrades. 2016. Spotting Symbol over Graphical Documents Via Sparsity in Visual Vocabulary. Recent Trends in Image Processing and Pattern Recognition.
|
|