|
Juan Ignacio Toledo, Sebastian Sudholt, Alicia Fornes, Jordi Cucurull, A. Fink and Josep Llados. 2016. Handwritten Word Image Categorization with Convolutional Neural Networks and Spatial Pyramid Pooling. Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer International Publishing, 543–552. (LNCS.)
Abstract: The extraction of relevant information from historical document collections is one of the key steps in order to make these documents available for access and searches. The usual approach combines transcription and grammars in order to extract semantically meaningful entities. In this paper, we describe a new method to obtain word categories directly from non-preprocessed handwritten word images. The method can be used to directly extract information, being an alternative to the transcription. Thus it can be used as a first step in any kind of syntactical analysis. The approach is based on Convolutional Neural Networks with a Spatial Pyramid Pooling layer to deal with the different shapes of the input images. We performed the experiments on a historical marriage record dataset, obtaining promising results.
Keywords: Document image analysis; Word image categorization; Convolutional neural networks; Named entity detection
|
|
|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados and Cristina Cañero. 2016. Banknote counterfeit detection through background texture printing analysis. 12th IAPR Workshop on Document Analysis Systems.
Abstract: This paper is focused on the detection of counterfeit photocopy banknotes. The main difficulty is to work on a real industrial scenario without any constraint about the acquisition device and with a single image. The main contributions of this paper are twofold: first the adaptation and performance evaluation of existing approaches to classify the genuine and photocopy banknotes using background texture printing analysis, which have not been applied into this context before. Second, a new dataset of Euro banknotes images acquired with several cameras under different luminance conditions to evaluate these methods. Experiments on the proposed algorithms show that mixing SIFT features and sparse coding dictionaries achieves quasi perfect classification using a linear SVM with the created dataset. Approaches using dictionaries to cover all possible texture variations have demonstrated to be robust and outperform the state-of-the-art methods using the proposed benchmark.
|
|
|
Lluis Gomez, Y. Patel, Marçal Rusiñol, C.V. Jawahar and Dimosthenis Karatzas. 2017. Self‐supervised learning of visual features through embedding images into text topic spaces. 30th IEEE Conference on Computer Vision and Pattern Recognition.
Abstract: End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches.
|
|
|
Arnau Baro, Pau Riba and Alicia Fornes. 2016. Towards the recognition of compound music notes in handwritten music scores. 15th international conference on Frontiers in Handwriting Recognition.
Abstract: The recognition of handwritten music scores still remains an open problem. The existing approaches can only deal with very simple handwritten scores mainly because of the variability in the handwriting style and the variability in the composition of groups of music notes (i.e. compound music notes). In this work we focus on this second problem and propose a method based on perceptual grouping for the recognition of compound music notes. Our method has been tested using several handwritten music scores of the CVC-MUSCIMA database and compared with a commercial Optical Music Recognition (OMR) software. Given that our method is learning-free, the obtained results are promising.
|
|
|
Oriol Vicente, Alicia Fornes and Ramon Valdes. 2016. The Digital Humanities Network of the UABCie: a smart structure of research and social transference for the digital humanities. Digital Humanities Centres: Experiences and Perspectives.
|
|
|
Veronica Romero, Alicia Fornes, Enrique Vidal and Joan Andreu Sanchez. 2016. Using the MGGI Methodology for Category-based Language Modeling in Handwritten Marriage Licenses Books. 15th international conference on Frontiers in Handwriting Recognition.
Abstract: Handwritten marriage licenses books have been used for centuries by ecclesiastical and secular institutions to register marriages. The information contained in these historical documents is useful for demography studies and
genealogical research, among others. Despite the generally simple structure of the text in these documents, automatic transcription and semantic information extraction is difficult due to the distinct and evolutionary vocabulary, which is composed mainly of proper names that change along the time. In previous
works we studied the use of category-based language models to both improve the automatic transcription accuracy and make easier the extraction of semantic information. Here we analyze the main causes of the semantic errors observed in previous results and apply a Grammatical Inference technique known as MGGI to improve the semantic accuracy of the language model obtained. Using this language model, full handwritten text recognition experiments have been carried out, with results supporting the interest of the proposed approach.
|
|
|
Hana Jarraya, Oriol Ramos Terrades and Josep Llados. 2017. Graph Embedding through Probabilistic Graphical Model applied to Symbolic Graphs. 8th Iberian Conference on Pattern Recognition and Image Analysis.
Abstract: We propose a new Graph Embedding (GEM) method that takes advantages of structural pattern representation. It models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM presented by a vector. This vector is a signature of AG in a lower dimensional vectorial space. We apply Structured Support Vector Machines (SSVM) to process classification task. As first tentative, results on the GREC dataset are encouraging enough to go further on this direction.
Keywords: Attributed Graph; Probabilistic Graphical Model; Graph Embedding; Structured Support Vector Machines
|
|
|
Lasse Martensson, Anders Hast and Alicia Fornes. 2017. Word Spotting as a Tool for Scribal Attribution. 2nd Conference of the association of Digital Humanities in the Nordic Countries.87–89.
|
|
|
Hana Jarraya, Oriol Ramos Terrades and Josep Llados. 2017. Learning structural loss parameters on graph embedding applied on symbolic graphs. 12th IAPR International Workshop on Graphics Recognition.
Abstract: We propose an amelioration of proposed Graph Embedding (GEM) method in previous work that takes advantages of structural pattern representation and the structured distortion. it models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM presented by a vector, as new signature of AG in a lower dimensional vectorial space. We focus to adapt the structured learning algorithm via 1_slack formulation with a suitable risk function, called Graph Edit Distance (GED). It defines the dissimilarity of the ground truth and predicted graph labels. It determines by the error tolerant graph matching using bipartite graph matching algorithm. We apply Structured Support Vector Machines (SSVM) to process classification task. During our experiments, we got our results on the GREC dataset.
|
|
|
J. Chazalon and 9 others. 2017. SmartDoc 2017 Video Capture: Mobile Document Acquisition in Video Mode. 1st International Workshop on Open Services and Tools for Document Analysis.
Abstract: As mobile document acquisition using smartphones is getting more and more common, along with the continuous improvement of mobile devices (both in terms of computing power and image quality), we can wonder to which extent mobile phones can replace desktop scanners. Modern applications can cope with perspective distortion and normalize the contrast of a document page captured with a smartphone, and in some cases like bottle labels or posters, smartphones even have the advantage of allowing the acquisition of non-flat or large documents. However, several cases remain hard to handle, such as reflective documents (identity cards, badges, glossy magazine cover, etc.) or large documents for which some regions require an important amount of detail. This paper introduces the SmartDoc 2017 benchmark (named “SmartDoc Video Capture”), which aims at
assessing whether capturing documents using the video mode of a smartphone could solve those issues. The task under evaluation is both a stitching and a reconstruction problem, as the user can move the device over different parts of the document to capture details or try to erase highlights. The material released consists of a dataset, an evaluation method and the associated tool, a sample method, and the tools required to extend the dataset. All the components are released publicly under very permissive licenses, and we particularly cared about maximizing the ease of
understanding, usage and improvement.
|
|