|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Josep Llados, R.Jain, & D.Doermann. (2015). Novel Line Verification for Multiple Instance Focused Retrieval in Document Collections. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 481–485).
|
|
|
J. Chazalon, Marçal Rusiñol, & Jean-Marc Ogier. (2015). Improving Document Matching Performance by Local Descriptor Filtering. In 6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015 (pp. 1216–1220).
Abstract: In this paper we propose an effective method aimed at reducing the amount of local descriptors to be indexed in a document matching framework. In an off-line training stage, the matching between the model document and incoming images is computed retaining the local descriptors from the model that steadily produce good matches. We have evaluated this approach by using the ICDAR2015 SmartDOC dataset containing near 25 000 images from documents to be captured by a mobile device. We have tested the performance of this filtering step by using
ORB and SIFT local detectors and descriptors. The results show an important gain both in quality of the final matching as well as in time and space requirements.
|
|
|
Jean-Christophe Burie, J. Chazalon, M. Coustaty, S. Eskenazi, Muhammad Muzzamil Luqman, M. Mehri, et al. (2015). ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc). In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1161–1165).
Abstract: Smartphones are enabling new ways of capture,
hence arises the need for seamless and reliable acquisition and
digitization of documents, in order to convert them to editable,
searchable and a more human-readable format. Current stateof-the-art
works lack databases and baseline benchmarks for
digitizing mobile captured documents. We have organized a
competition for mobile document capture and OCR in order to
address this issue. The competition is structured into two independent
challenges: smartphone document capture, and smartphone
OCR. This report describes the datasets for both challenges
along with their ground truth, details the performance evaluation
protocols which we used, and presents the final results of the
participating methods. In total, we received 13 submissions: 8
for challenge-I, and 5 for challenge-2.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2015). Object Proposals for Text Extraction in the Wild. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 206–210).
Abstract: Object Proposals is a recent computer vision technique receiving increasing interest from the research community. Its main objective is to generate a relatively small set of bounding box proposals that are most likely to contain objects of interest. The use of Object Proposals techniques in the scene text understanding field is innovative. Motivated by the success of powerful while expensive techniques to recognize words in a holistic way, Object Proposals techniques emerge as an alternative to the traditional text detectors. In this paper we study to what extent the existing generic Object Proposals methods may be useful for scene text understanding. Also, we propose a new Object Proposals algorithm that is specifically designed for text and compare it with other generic methods in the state of the art. Experiments show that our proposal is superior in its ability of producing good quality word proposals in an efficient way. The source code of our method is made publicly available
|
|
|
Dimosthenis Karatzas, Lluis Gomez, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, et al. (2015). ICDAR 2015 Competition on Robust Reading. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 1156–1160).
|
|
|
Pau Riba, Josep Llados, & Alicia Fornes. (2015). Handwritten Word Spotting by Inexact Matching of Grapheme Graphs. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 781–785).
Abstract: This paper presents a graph-based word spotting for handwritten documents. Contrary to most word spotting techniques, which use statistical representations, we propose a structural representation suitable to be robust to the inherent deformations of handwriting. Attributed graphs are constructed using a part-based approach. Graphemes extracted from shape convexities are used as stable units of handwriting, and are associated to graph nodes. Then, spatial relations between them determine graph edges. Spotting is defined in terms of an error-tolerant graph matching using bipartite-graph matching algorithm. To make the method usable in large datasets, a graph indexing approach that makes use of binary embeddings is used as preprocessing. Historical documents are used as experimental framework. The approach is comparable to statistical ones in terms of time and memory requirements, especially when dealing with large document collections.
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, Josep Llados, David Fernandez, & Cristina Cañero. (2015). Use case visual Bag-of-Words techniques for camera based identity document classification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 721–725).
Abstract: Nowadays, automatic identity document recognition, including passport and driving license recognition, is at the core of many applications within the administrative and service sectors, such as police, hospitality, car renting, etc. In former years, the document information was manually extracted whereas today this data is recognized automatically from images obtained by flat-bed scanners. Yet, since these scanners tend to be expensive and voluminous, companies in the sector have recently turned their attention to cheaper, small and yet computationally powerful scanners: the mobile devices. The document identity recognition from mobile images enclose several new difficulties w.r.t traditional scanned images, such as the loss of a controlled background, perspective, blurring, etc. In this paper we present a real application for identity document classification of images taken from mobile devices. This classification process is of extreme importance since a prior knowledge of the document type and origin strongly facilitates the subsequent information extraction. The proposed method is based on a traditional Bagof-Words in which we have taken into consideration several key aspects to enhance recognition rate. The method performance has been studied on three datasets containing more than 2000 images from 129 different document classes.
|
|
|
Lluis Pere de las Heras, Oriol Ramos Terrades, & Josep Llados. (2015). Attributed Graph Grammar for floor plan analysis. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 726–730).
Abstract: In this paper, we propose the use of an Attributed Graph Grammar as unique framework to model and recognize the structure of floor plans. This grammar represents a building as a hierarchical composition of structurally and semantically related elements, where common representations are learned stochastically from annotated data. Given an input image, the parsing consists on constructing that graph representation that better agrees with the probabilistic model defined by the grammar. The proposed method provides several advantages with respect to the traditional floor plan analysis techniques. It uses an unsupervised statistical approach for detecting walls that adapts to different graphical notations and relaxes strong structural assumptions such are straightness and orthogonality. Moreover, the independence between the knowledge model and the parsing implementation allows the method to learn automatically different building configurations and thus, to cope the existing variability. These advantages are clearly demonstrated by comparing it with the most recent floor plan interpretation techniques on 4 datasets of real floor plans with different notations.
|
|
|
Hongxing Gao. (2015). Focused Structural Document Image Retrieval in Digital Mailroom Applications (Josep Llados, Dimosthenis Karatzas, & Marçal Rusiñol, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios such as searching for full page matches or retrieving the counterparts for specific document areas, focusing on their structural similarity or letting their visual resemblance to play a dominant role. Based on the spatial indexing technique, we propose to search for matches of local key-region pairs carrying both structural and visual information from the collection while a scheme allowing to adjust the relative contribution of structural and visual similarity is presented.
Based on the fact that the structure of documents is tightly linked with the distance among their elements, we firstly introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results comparing with state-of-the-art methods while much less amount of key-regions are employed.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram where inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structure and visual information, from the collection. We introduce the spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We firstly test the proposed framework in a full page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as a specific interesting partial areas of the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that, the proposed generic framework obtains nearly perfect precision on both types of focused queries while it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
Besides, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups, then line verification is performed within each group while more precise bounding boxes are computed. We demonstrate that, comparing with the standard approach (based on RANSAC), the line verification proposed generally achieves much higher recall with slight loss on precision on specific queries.
|
|
|
Olivier Lefebvre, Pau Riba, Charles Fournier, Alicia Fornes, Josep Llados, Rejean Plamondon, et al. (2015). Monitoring neuromotricity on-line: a cloud computing approach. In 17th Conference of the International Graphonomics Society IGS2015.
Abstract: The goal of our experiment is to develop a useful and accessible tool that can be used to evaluate a patient's health by analyzing handwritten strokes. We use a cloud computing approach to analyze stroke data sampled on a commercial tablet working on the Android platform and a distant server to perform complex calculations using the Delta and Sigma lognormal algorithms. A Google Drive account is used to store the data and to ease the development of the project. The communication between the tablet, the cloud and the server is encrypted to ensure biomedical information confidentiality. Highly parameterized biomedical tests are implemented on the tablet as well as a free drawing test to evaluate the validity of the data acquired by the first test compared to the second one. A blurred shape model descriptor pattern recognition algorithm is used to classify the data obtained by the free drawing test. The functions presented in this paper are still currently under development and other improvements are needed before launching the application in the public domain.
|
|
|
Youssef El Rhabi, Simon Loic, & Brun Luc. (2015). Estimation de la pose d’une caméra à partir d’un flux vidéo en s’approchant du temps réel. In 15ème édition d'ORASIS, journées francophones des jeunes chercheurs en vision par ordinateur ORASIS2015.
Abstract: Finding a way to estimate quickly and robustly the pose of an image is essential in augmented reality. Here we will discuss the approach we chose in order to get closer to real time by using SIFT points [4]. We propose a method based on filtering both SIFT points and images on which to focus on. Hence we will focus on relevant data.
Keywords: Augmented Reality; SFM; SLAM; real time pose computation; 2D/3D registration
|
|
|
Anguelos Nicolaou, Andrew Bagdanov, Marcus Liwicki, & Dimosthenis Karatzas. (2015). Sparse Radial Sampling LBP for Writer Identification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 716–720).
Abstract: In this paper we present the use of Sparse Radial Sampling Local Binary Patterns, a variant of Local Binary Patterns (LBP) for text-as-texture classification. By adapting and extending the standard LBP operator to the particularities of text we get a generic text-as-texture classification scheme and apply it to writer identification. In experiments on CVL and ICDAR 2013 datasets, the proposed feature-set demonstrates State-Of-the-Art (SOA) performance. Among the SOA, the proposed method is the only one that is based on dense extraction of a single local feature descriptor. This makes it fast and applicable at the earliest stages in a DIA pipeline without the need for segmentation, binarization, or extraction of multiple features.
|
|
|
Suman Ghosh, Lluis Gomez, Dimosthenis Karatzas, & Ernest Valveny. (2015). Efficient indexing for Query By String text retrieval. In 6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015 (pp. 1236–1240).
Abstract: This paper deals with Query By String word spotting in scene images. A hierarchical text segmentation algorithm based on text specific selective search is used to find text regions. These regions are indexed per character n-grams present in the text region. An attribute representation based on Pyramidal Histogram of Characters (PHOC) is used to compare text regions with the query text. For generation of the index a similar attribute space based Pyramidal Histogram of character n-grams is used. These attribute models are learned using linear SVMs over the Fisher Vector [1] representation of the images along with the PHOC labels of the corresponding strings.
|
|
|
J.Kuhn, A.Nussbaumer, J.Pirker, Dimosthenis Karatzas, A. Pagani, O.Conlan, et al. (2015). Advancing Physics Learning Through Traversing a Multi-Modal Experimentation Space. In Workshop Proceedings on the 11th International Conference on Intelligent Environments (Vol. 19, pp. 373–380).
Abstract: Translating conceptual knowledge into real world experiences presents a significant educational challenge. This position paper presents an approach that supports learners in moving seamlessly between conceptual learning and their application in the real world by bringing physical and virtual experiments into everyday settings. Learners are empowered in conducting these situated experiments in a variety of physical settings by leveraging state of the art mobile, augmented reality, and virtual reality technology. A blend of mobile-based multi-sensory physical experiments, augmented reality and enabling virtual environments can allow learners to bridge their conceptual learning with tangible experiences in a completely novel manner. This approach focuses on the learner by applying self-regulated personalised learning techniques, underpinned by innovative pedagogical approaches and adaptation techniques, to ensure that the needs and preferences of each learner are catered for individually.
|
|
|
Suman Ghosh, & Ernest Valveny. (2015). Query by String word spotting based on character bi-gram indexing. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 881–885).
Abstract: In this paper we propose a segmentation-free query by string word spotting method. Both the documents and query strings are encoded using a recently proposed word representa- tion that projects images and strings into a common atribute space based on a pyramidal histogram of characters(PHOC). These attribute models are learned using linear SVMs over the Fisher Vector representation of the images along with the PHOC labels of the corresponding strings. In order to search through the whole page, document regions are indexed per character bi- gram using a similar attribute representation. On top of that, we propose an integral image representation of the document using a simplified version of the attribute model for efficient computation. Finally we introduce a re-ranking step in order to boost retrieval performance. We show state-of-the-art results for segmentation-free query by string word spotting in single-writer and multi-writer standard datasets
|
|