|
J. Chazalon and 9 others. 2017. SmartDoc 2017 Video Capture: Mobile Document Acquisition in Video Mode. 1st International Workshop on Open Services and Tools for Document Analysis.
Abstract: As mobile document acquisition using smartphones is getting more and more common, along with the continuous improvement of mobile devices (both in terms of computing power and image quality), we can wonder to what extent mobile phones can replace desktop scanners. Modern applications can cope with perspective distortion and normalize the contrast of a document page captured with a smartphone, and in some cases like bottle labels or posters, smartphones even have the advantage of allowing the acquisition of non-flat or large documents. However, several cases remain hard to handle, such as reflective documents (identity cards, badges, glossy magazine covers, etc.) or large documents for which some regions require a high level of detail. This paper introduces the SmartDoc 2017 benchmark (named “SmartDoc Video Capture”), which aims at assessing whether capturing documents using the video mode of a smartphone could solve those issues. The task under evaluation is both a stitching and a reconstruction problem, as the user can move the device over different parts of the document to capture details or try to erase highlights. The material released consists of a dataset, an evaluation method and the associated tool, a sample method, and the tools required to extend the dataset. All the components are released publicly under very permissive licenses, and we particularly cared about maximizing the ease of understanding, usage and improvement.
|
|
|
J. Chazalon, Marçal Rusiñol, Jean-Marc Ogier and Josep Llados. 2015. A Semi-Automatic Groundtruthing Tool for Mobile-Captured Document Segmentation. 13th International Conference on Document Analysis and Recognition ICDAR2015.621–625.
Abstract: This paper presents a novel way to generate groundtruth data for the evaluation of mobile document capture systems, focusing on the first stage of the image processing pipeline involved: document object detection and segmentation in low-quality preview frames. We introduce and describe a simple, robust and fast technique based on color markers which enables a semi-automated annotation of page corners. We also detail a technique for marker removal. Methods and tools presented in the paper were successfully used to annotate, in a few hours, 24889 frames in 150 video files for the smartDOC competition at ICDAR 2015.
|
|
|
J. Chazalon, Marçal Rusiñol and Jean-Marc Ogier. 2015. Improving Document Matching Performance by Local Descriptor Filtering. 6th IAPR International Workshop on Camera Based Document Analysis and Recognition CBDAR2015.1216–1220.
Abstract: In this paper we propose an effective method aimed at reducing the number of local descriptors to be indexed in a document matching framework. In an off-line training stage, the matching between the model document and incoming images is computed, retaining the local descriptors from the model that steadily produce good matches. We have evaluated this approach by using the ICDAR2015 SmartDOC dataset containing nearly 25 000 images from documents to be captured by a mobile device. We have tested the performance of this filtering step by using ORB and SIFT local detectors and descriptors. The results show a significant gain both in the quality of the final matching and in time and space requirements.
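The filtering idea in this abstract can be illustrated with a minimal sketch. It assumes that the training stage has already recorded, for each training image, which model descriptors produced a good match. The function name `filter_model_descriptors`, the `min_rate` threshold and the input format are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter


def filter_model_descriptors(inlier_sets, num_descriptors, min_rate=0.5):
    """Keep model descriptors that matched well in enough training images.

    inlier_sets: one collection per training image, holding the indices of
        the model descriptors that produced a good match in that image
        (assumed input format).
    num_descriptors: total number of descriptors in the model document.
    min_rate: assumed threshold, the fraction of training images in which
        a descriptor must match to be retained.
    """
    if not inlier_sets:
        # Without training evidence, nothing can be filtered out.
        return list(range(num_descriptors))
    counts = Counter()
    for inliers in inlier_sets:
        counts.update(set(inliers))  # count each descriptor once per image
    needed = min_rate * len(inlier_sets)
    return [i for i in range(num_descriptors) if counts[i] >= needed]


# Toy data: 5 model descriptors observed over 4 training images.
training = [{0, 2, 3}, {0, 3}, {0, 1, 3}, {3, 4}]
kept = filter_model_descriptors(training, num_descriptors=5, min_rate=0.5)
print(kept)
```

Only the retained indices would then be indexed, which is where the reduction in time and space requirements would come from.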
|
|
|
Ilke Demir, Dena Bazazian, Adriana Romero, Viktoriia Sharmanska and Lyne P. Tchapmi. 2018. WiCV 2018: The Fourth Women In Computer Vision Workshop. 4th Women in Computer Vision Workshop.1941–19412.
Abstract: We present WiCV 2018 – Women in Computer Vision Workshop to increase the visibility and inclusion of women researchers in the computer vision field, organized in conjunction with CVPR 2018. Computer vision and machine learning have made incredible progress over the past years, yet the number of female researchers is still low both in academia and industry. WiCV is organized to raise visibility of female researchers, to increase the collaboration, and to provide mentorship and give opportunities to female-identifying junior researchers in the field. In its fourth year, we are proud to present the changes and improvements over the past years, a summary of statistics for presenters and attendees, followed by expectations from future generations.
Keywords: Conferences; Computer vision; Industries; Object recognition; Engineering profession; Collaboration; Machine learning
|
|
|
Hongxing Gao and 6 others. 2013. Key-region detection for document images - applications to administrative document retrieval. 12th International Conference on Document Analysis and Recognition.230–234.
Abstract: In this paper we argue that a key-region detector designed to take into account the special characteristics of document images can result in the detection of fewer and more meaningful key-regions. We propose a fast key-region detector able to capture aspects of the structural information of the document, and demonstrate its efficiency by comparing against standard detectors in an administrative document retrieval scenario. We show that using the proposed detector results in a smaller number of detected key-regions and higher performance without any drop in speed compared to standard state-of-the-art detectors.
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Josep Llados, R. Jain and D. Doermann. 2015. Novel Line Verification for Multiple Instance Focused Retrieval in Document Collections. 13th International Conference on Document Analysis and Recognition ICDAR2015.481–485.
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2014. Fast Structural Matching for Document Image Retrieval through Spatial Databases. Document Recognition and Retrieval XXI.
Abstract: The structure of document images plays a significant role in document analysis; thus considerable efforts have been made towards extracting and understanding document structure, usually in the form of layout analysis approaches. In this paper, we first employ Distance Transform based MSER (DTMSER) to efficiently extract stable document structural elements in terms of a dendrogram of key-regions. Then a fast structural matching method is proposed to query the structure of the document (dendrogram) based on a spatial database which facilitates the formulation of advanced spatial queries. The experiments demonstrate a significant improvement in a document retrieval scenario when compared to the use of typical Bag of Words (BoW) and pyramidal BoW descriptors.
Keywords: Document image retrieval; distance transform; MSER; spatial database
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas and Josep Llados. 2014. Embedding Document Structure to Bag-of-Words through Pair-wise Stable Key-regions. 22nd International Conference on Pattern Recognition.2903–2908.
Abstract: Since the document structure carries valuable discriminative information, considerable effort has been made to extract and understand document structure, with layout analysis approaches being the most commonly used. In this paper, Distance Transform based MSER (DTMSER) is employed to efficiently extract the document structure as a dendrogram of key-regions which roughly correspond to structural elements such as characters, words and paragraphs. Inspired by the Bag of Words (BoW) framework, we propose an efficient method for structural document matching by representing the document image as a histogram of key-region pairs encoding structural relationships. Applied to the scenario of document image retrieval, experimental results demonstrate a remarkable improvement when comparing the proposed method with typical BoW and pyramidal BoW methods.
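The histogram of key-region pairs mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the dendrogram is given as a child-to-parent map, that each key-region has already been assigned a visual word, and that a pair is a parent-child edge of the dendrogram. The function name `pairwise_bow`, the input format and the bin encoding are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pairwise_bow(parent_of, word_of, vocab_size):
    """Build a normalized histogram over (parent word, child word) pairs.

    parent_of: maps each key-region id to its parent id in the dendrogram,
        or None for the root (assumed input format).
    word_of: maps each key-region id to a visual word index in
        range(vocab_size) (assumed to be computed elsewhere).
    Returns a sparse histogram {bin index: relative frequency}.
    """
    hist = Counter()
    for child, parent in parent_of.items():
        if parent is None:
            continue  # the root has no inclusion edge above it
        # Encode the ordered pair of words as a single bin index.
        hist[word_of[parent] * vocab_size + word_of[child]] += 1
    total = sum(hist.values())
    return {b: c / total for b, c in hist.items()} if total else {}


# Toy dendrogram: region 0 (paragraph) contains regions 1 and 2 (words),
# which contain regions 3, 4 and 5 (characters).
parent_of = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
word_of = {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
hist = pairwise_bow(parent_of, word_of, vocab_size=3)
print(hist)
```

Two documents could then be compared with any histogram distance, as in a conventional BoW pipeline, with the difference that each bin counts an inclusion relation rather than a single region.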
|
|
|
Hongxing Gao, Marçal Rusiñol, Dimosthenis Karatzas, Apostolos Antonacopoulos and Josep Llados. 2013. An interactive appearance-based document retrieval system for historical newspapers. Proceedings of the International Conference on Computer Vision Theory and Applications.84–87.
Abstract: In this paper we present a retrieval-based application aimed at assisting a user to semi-automatically segment an incoming flow of historical newspaper images by automatically detecting a particular type of page based on its appearance. A visual descriptor is used to assess page similarity while a relevance feedback process allows the results to be refined iteratively. The application is tested on a large dataset of digitised historic newspapers.
|
|
|
Hongxing Gao. 2015. Focused Structural Document Image Retrieval in Digital Mailroom Applications. (Ph.D. thesis, Ediciones Graficas Rey.)
Abstract: In this work, we develop a generic framework that is able to handle the document retrieval problem in various scenarios such as searching for full page matches or retrieving the counterparts for specific document areas, focusing on their structural similarity or letting their visual resemblance play a dominant role. Based on the spatial indexing technique, we propose to search the collection for matches of local key-region pairs carrying both structural and visual information, and we present a scheme for adjusting the relative contribution of structural and visual similarity.
Based on the fact that the structure of documents is tightly linked with the distance among their elements, we first introduce an efficient detector named Distance Transform based Maximally Stable Extremal Regions (DTMSER). We illustrate that this detector is able to efficiently extract the structure of a document image as a dendrogram (hierarchical tree) of multi-scale key-regions that roughly correspond to letters, words and paragraphs. We demonstrate that, even without benefiting from the structure information, the key-regions extracted by the DTMSER algorithm achieve better results than state-of-the-art methods while employing far fewer key-regions.
We subsequently propose a pair-wise Bag of Words (BoW) framework to efficiently embed the explicit structure extracted by the DTMSER algorithm. We represent each document as a list of key-region pairs that correspond to the edges in the dendrogram, where the inclusion relationship is encoded. By employing those structural key-region pairs as the pooling elements for generating the histogram of features, the proposed method is able to encode the explicit inclusion relations into a BoW representation. The experimental results illustrate that the pair-wise BoW, powered by the embedded structural information, achieves a remarkable improvement over the conventional BoW and spatial pyramidal BoW methods.
To handle various retrieval scenarios in one framework, we propose to directly query a series of key-region pairs, carrying both structure and visual information, from the collection. We introduce spatial indexing techniques to the document retrieval community to speed up the structural relationship computation for key-region pairs. We first test the proposed framework in a full page retrieval scenario where structurally similar matches are expected. In this case, the pair-wise querying method achieves a notable improvement over the BoW and spatial pyramidal BoW frameworks. Furthermore, we illustrate that the proposed method is also able to handle focused retrieval situations where the queries are defined as specific partial areas of interest in the images. We examine our method on two types of focused queries: structure-focused and exact queries. The experimental results show that the proposed generic framework obtains nearly perfect precision on both types of focused queries while it is the first framework able to tackle structure-focused queries, setting a new state of the art in the field.
In addition, we introduce a line verification method to check the spatial consistency among the matched key-region pairs. We propose a computationally efficient version of line verification through a two-step implementation. We first compute tentative localizations of the query and subsequently employ them to divide the matched key-region pairs into several groups; line verification is then performed within each group while more precise bounding boxes are computed. We demonstrate that, compared with the standard approach (based on RANSAC), the proposed line verification generally achieves much higher recall with a slight loss of precision on specific queries.
|
|