2017 |
|
Suman Ghosh and Ernest Valveny. 2017. R-PHOC: Segmentation-Free Word Spotting using CNN. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1707.01294
This paper proposes a region based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bound- ing boxes and embeds all bounding boxes into an embedding space, where word spotting can be casted as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of PHOC embedding as it has previously achieved significant success in segmentation- based word spotting. Word candidates are generated using a simple procedure based on grouping connected components using some spatial constraints. Experiments show that R-PHOC which operates on images directly can improve the current state-of- the-art in the standard GW dataset and performs as good as PHOCNET in some cases designed for segmentation based word spotting.
Keywords: Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search
|
|
|
Suman Ghosh and Ernest Valveny. 2017. Visual attention models for scene text recognition. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1706.01487
In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on a LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors are derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal and Joan Andreu Sanchez. 2017. Information Extraction in Handwritten Marriage Licenses Books Using the MGGI Methodology. In L.A. Alexandre, J.Salvador Sanchez and Joao M. F. Rodriguez, eds. 8th Iberian Conference on Pattern Recognition and Image Analysis.287–294. (LNCS.)
Abstract: Historical records of daily activities provide intriguing insights into the life of our ancestors, useful for demographic and genealogical research. For example, marriage license books have been used for centuries by ecclesiastical and secular institutions to register marriages. These books follow a simple structure of the text in the records with a evolutionary vocabulary, mainly composed of proper names that change along the time. This distinct vocabulary makes automatic transcription and semantic information extraction difficult tasks. In previous works we studied the use of category-based language models and how a Grammatical Inference technique known as MGGI could improve the accuracy of these tasks. In this work we analyze the main causes of the semantic errors observed in previous results and apply a better implementation of the MGGI technique to solve these problems. Using the resulting language model, transcription and information extraction experiments have been carried out, and the results support our proposed approach.
Keywords: Handwritten Text Recognition; Information extraction; Language modeling; MGGI; Categories-based language model
|
|
2016 |
|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados and Cristina Cañero. 2016. Banknote counterfeit detection through background texture printing analysis. 12th IAPR Workshop on Document Analysis Systems.
Abstract: This paper is focused on the detection of counterfeit photocopy banknotes. The main difficulty is to work on a real industrial scenario without any constraint about the acquisition device and with a single image. The main contributions of this paper are twofold: first the adaptation and performance evaluation of existing approaches to classify the genuine and photocopy banknotes using background texture printing analysis, which have not been applied into this context before. Second, a new dataset of Euro banknotes images acquired with several cameras under different luminance conditions to evaluate these methods. Experiments on the proposed algorithms show that mixing SIFT features and sparse coding dictionaries achieves quasi perfect classification using a linear SVM with the created dataset. Approaches using dictionaries to cover all possible texture variations have demonstrated to be robust and outperform the state-of-the-art methods using the proposed benchmark.
|
|
|
Alicia Fornes, Josep Llados, Oriol Ramos Terrades and Marçal Rusiñol. 2016. La Visió per Computador com a Eina per a la Interpretació Automàtica de Fonts Documentals.
|
|
|
Anders Hast and Alicia Fornes. 2016. A Segmentation-free Handwritten Word Spotting Approach by Relaxed Feature Matching. 12th IAPR Workshop on Document Analysis Systems.150–155.
Abstract: The automatic recognition of historical handwritten documents is still considered challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of the query word in a document collection, becoming a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired on feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words have different shape, there is no exact transformation to be obtained. However, the sufficient degree of relaxation is achieved by using a Fourier based descriptor and an alternative approach to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
|
|
|
Anjan Dutta, Umapada Pal and Josep Llados. 2016. Compact Correlated Features for Writer Independent Signature Verification. 23rd International Conference on Pattern Recognition.
Abstract: This paper considers the offline signature verification problem which is considered to be an important research line in the field of pattern recognition. In this work we propose hybrid features that consider the local features and their global statistics in the signature image. This has been done by creating a vocabulary of histogram of oriented gradients (HOGs). We impose weights on these local features based on the height information of water reservoirs obtained from the signature. Spatial information between local features are thought to play a vital role in considering the geometry of the signatures which distinguishes the originals from the forged ones. Nevertheless, learning a condensed set of higher order neighbouring features based on visual words, e.g., doublets and triplets, continues to be a challenging problem as possible combinations of visual words grow exponentially. To avoid this explosion of size, we create a code of local pairwise features which are represented as joint descriptors. Local features are paired based on the edges of a graph representation built upon the Delaunay triangulation. We reveal the advantage of combining both type of visual codebooks (order one and pairwise) for signature verification task. This is validated through an encouraging result on two benchmark datasets viz. CEDAR and GPDS300.
|
|
|
Arnau Baro, Pau Riba and Alicia Fornes. 2016. Towards the recognition of compound music notes in handwritten music scores. 15th international conference on Frontiers in Handwriting Recognition.
Abstract: The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising.
|
|
|
Carlos David Martinez Hinarejos and 10 others. 2016. Context, multimodality, and user collaboration in handwritten text processing: the CoMUN-HaT project. 3rd IberSPEECH.
Abstract: Processing of handwritten documents is a task that is of wide interest for many
purposes, such as those related to preserve cultural heritage. Handwritten text recognition techniques have been successfully applied during the last decade to obtain transcriptions of handwritten documents, and keyword spotting techniques have been applied for searching specific terms in image collections of handwritten documents. However, results on transcription and indexing are far from perfect. In this framework, the use of new data sources arises as a new paradigm that will allow for a better transcription and indexing of handwritten documents. Three main different data sources could be considered: context of the document (style, writer, historical time, topics,. . . ), multimodal data (representations of the document in a different modality, such as the speech signal of the dictation of the text), and user feedback (corrections, amendments,. . . ). The CoMUN-HaT project aims at the integration of these different data sources into the transcription and indexing task for handwritten documents: the use of context derived from the analysis of the documents, how multimodality can aid the recognition process to obtain more accurate transcriptions (including transcription in a modern version of the language), and integration into a userin-the-loop assisted text transcription framework. This will be reflected in the construction of a transcription and indexing platform that can be used by both professional and nonprofessional users, contributing to crowd-sourcing activities to preserve cultural heritage and to obtain an accessible version of the involved corpus.
|
|
|
Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas and Andrew Bagdanov. 2016. Improving Text Proposals for Scene Images with Fully Convolutional Networks. 23rd International Conference on Pattern Recognition Workshops.
Abstract: Text Proposals have emerged as a class-dependent version of object proposals – efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text
recognition. In this paper we propose an improvement over the original Text Proposals algorithm of [1], combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.
|
|