Marçal Rusiñol, J. Chazalon, Jean-Marc Ogier, & Josep Llados. (2015). A Comparative Study of Local Detectors and Descriptors for Mobile Document Classification. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 596–600).
Abstract: In this paper we conduct a comparative study of local key-point detectors and local descriptors for the specific task of mobile document classification. A classification architecture based on direct matching of local descriptors is used as a baseline for the comparative study. A set of four different key-point detectors and four different local descriptors are tested in all the possible combinations. The experiments are conducted on a database consisting of 30 model documents acquired on 6 different backgrounds, totaling more than 36,000 test images.
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2014). Combining Focus Measure Operators to Predict OCR Accuracy in Mobile-Captured Document Images. In 11th IAPR International Workshop on Document Analysis and Systems (pp. 181–185).
Abstract: Mobile document image acquisition is a new trend raising serious issues in business document processing workflows. Such a digitization procedure is unreliable and introduces many distortions, which must be detected as soon as possible, on the mobile device, to avoid paying data transmission fees and losing information when a temporarily available document cannot be re-captured later. In this context, out-of-focus blur is a major issue: users have no direct control over it, and it seriously degrades OCR recognition. In this paper, we concentrate on the estimation of focus quality, to ensure sufficient legibility of a document image for OCR processing. We propose two contributions to improve OCR accuracy prediction for mobile-captured document images. First, we present 24 focus measures, never tested on document images, which are fast to compute and require no training. Second, we show that a combination of those measures enables state-of-the-art performance regarding the correlation with OCR accuracy. The resulting approach is fast, robust, and easy to implement on a mobile device. Experiments are performed on a public dataset, and precise details about image processing are given.
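As an illustration of what a training-free focus measure looks like, the sketch below computes the variance of the Laplacian, a widely used sharpness score. This is an assumption for illustration only: the abstract does not list its 24 measures, so this operator may or may not be one of them, and the function name and toy images are hypothetical.

```python
from statistics import pvariance


def variance_of_laplacian(image):
    """Return the variance of the 4-neighbour Laplacian over interior pixels.

    `image` is a list of equal-length rows of grayscale intensities.
    Higher values indicate stronger edges, i.e. a sharper image.
    """
    height, width = len(image), len(image[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (image[y - 1][x] + image[y + 1][x]
                         + image[y][x - 1] + image[y][x + 1]
                         - 4 * image[y][x])
            responses.append(laplacian)
    return pvariance(responses)


# Toy images (hypothetical): a hard vertical edge, and the same edge
# smeared into a linear ramp, which stands in for out-of-focus blur.
sharp = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
blurred = [[0, 51, 102, 153, 204, 255] for _ in range(6)]

sharp_score = variance_of_laplacian(sharp)
blurred_score = variance_of_laplacian(blurred)
```

The sharp edge yields a large score while the linear ramp yields zero, since the Laplacian of a linear intensity profile vanishes. A real system would compute such scores on camera frames and, as the abstract describes, combine several measures rather than rely on one.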
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2014). Normalisation et validation d'images de documents capturées en mobilité. In Colloque International Francophone sur l'Écrit et le Document (pp. 109–124).
Abstract: Mobile document image acquisition introduces many distortions which must be corrected or detected on the device, before the document becomes unavailable or data transmission fees are incurred. In this paper, we propose a system to correct perspective and illumination issues, and to estimate the sharpness of the image for OCR recognition. The correction step relies on fast and accurate border detection followed by illumination normalization. Its evaluation on a private dataset shows a clear improvement in OCR accuracy. The quality assessment step relies on a combination of focus measures. Its evaluation on a public dataset shows that this simple method compares well to state-of-the-art learning-based methods, which cannot be embedded on a mobile device, and outperforms metric-based methods.
Keywords: mobile document image acquisition; perspective correction; illumination correction; quality assessment; focus measure; OCR accuracy prediction
|
Marçal Rusiñol, J. Chazalon, & Jean-Marc Ogier. (2016). Filtrage de descripteurs locaux pour l'amélioration de la détection de documents. In Colloque International Francophone sur l'Écrit et le Document.
Abstract: In this paper we propose an effective method aimed at reducing the number of local descriptors to be indexed in a document matching framework. In an off-line training stage, the matching between the model document and incoming images is computed, retaining the local descriptors from the model that steadily produce good matches. We have evaluated this approach using the ICDAR2015 SmartDOC dataset, containing nearly 25,000 images of documents to be captured by a mobile device. We have tested the performance of this filtering step using ORB and SIFT local detectors and descriptors. The results show an important gain both in the quality of the final matching and in time and space requirements.
Keywords: Local descriptors; mobile capture; document matching; keypoint selection
|
Marçal Rusiñol, Farshad Nourbakhsh, Dimosthenis Karatzas, Ernest Valveny, & Josep Llados. (2010). Perceptual Image Retrieval by Adding Color Information to the Shape Context Descriptor. In 20th International Conference on Pattern Recognition (pp. 1594–1597).
Abstract: In this paper we present a method for the retrieval of images in terms of perceptual similarity. Local color information is added to the shape context descriptor in order to obtain an object description integrating both shape and color as visual cues. We use a color naming algorithm in order to represent the color information from a perceptual point of view. The proposed method has been tested in two different applications, an object retrieval scenario based on color sketch queries and a color trademark retrieval problem. Experimental results show that the addition of the color information significantly outperforms the sole use of the shape context descriptor.
|
Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2013). Spotting Graphical Symbols in Camera-Acquired Documents in Real Time. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: In this paper we present a system devoted to spotting graphical symbols in camera-acquired document images. The system is based on the extraction and further matching of ORB compact local features computed over interest key-points. Then, the FLANN indexing framework, based on approximate nearest neighbor search, allows local descriptors to be matched efficiently between the captured scene and the graphical models. Finally, the RANSAC algorithm is used to compute the homography between the spotted symbol and its appearance in the document image. The proposed approach is efficient and is able to work in real time.
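The descriptor matching step can be sketched in a few lines. ORB descriptors are binary strings compared with the Hamming distance; the sketch below substitutes an exhaustive nearest-neighbor search plus a ratio test for the FLANN approximate search named in the abstract, and omits the RANSAC homography step entirely. The function names, the ratio threshold, and the 8-bit toy descriptors (real ORB descriptors are 256 bits) are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, model, ratio=0.75):
    """Return (query_index, model_index, distance) for unambiguous matches.

    A match is kept only when the best distance is clearly smaller than the
    second-best one (ratio test), which filters out ambiguous descriptors.
    """
    matches = []
    for query_index, descriptor in enumerate(query):
        ranked = sorted(
            (hamming(descriptor, candidate), model_index)
            for model_index, candidate in enumerate(model)
        )
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, best_index), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((query_index, best_index, best))
    return matches


# Hypothetical 8-bit descriptors for a graphical model and a captured scene.
model = [0b11110000, 0b00001111, 0b10101010]
scene = [0b11110001, 0b01100110]

matches = match_descriptors(scene, model)
```

Here the first scene descriptor differs from the first model descriptor by a single bit and is accepted, while the second is equally far from every model descriptor and is rejected as ambiguous. An approximate index such as FLANN replaces the exhaustive loop when the number of model descriptors makes brute-force search too slow for real-time use.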
|
Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2014). Spotting Graphical Symbols in Camera-Acquired Documents in Real Time. In Bart Lamiroy, & Jean-Marc Ogier (Eds.), Graphics Recognition. Current Trends and Challenges (Vol. 8746, pp. 3–10). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we present a system devoted to spotting graphical symbols in camera-acquired document images. The system is based on the extraction and further matching of ORB compact local features computed over interest key-points. Then, the FLANN indexing framework, based on approximate nearest neighbor search, allows local descriptors to be matched efficiently between the captured scene and the graphical models. Finally, the RANSAC algorithm is used to compute the homography between the spotted symbol and its appearance in the document image. The proposed approach is efficient and is able to work in real time.
|
Marçal Rusiñol, Dimosthenis Karatzas, & Josep Llados. (2015). Automatic Verification of Properly Signed Multi-page Document Images. In Proceedings of the Eleventh International Symposium on Visual Computing (Vol. 9475, pp. 327–336). LNCS.
Abstract: In this paper we present an industrial application for the automatic screening of incoming multi-page documents in a banking workflow, aimed at determining whether these documents are properly signed or not. The proposed method is divided into three main steps. First, individual pages are classified in order to identify the pages that should contain a signature. In a second step, we segment within those key pages the location where the signatures should appear. The last step checks whether the signatures are present or not. Our method is tested in a real large-scale environment and we report the results when checking two different types of real multi-page contracts, comprising more than 14,500 pages in total.
Keywords: Document Image; Manual Inspection; Signature Verification; Rejection Criterion; Document Flow
|
Marçal Rusiñol, Dimosthenis Karatzas, Andrew Bagdanov, & Josep Llados. (2012). Multipage Document Retrieval by Textual and Visual Representations. In 21st International Conference on Pattern Recognition (pp. 521–524).
Abstract: In this paper we present a multipage administrative document image retrieval system based on textual and visual representations of document pages. Individual pages are represented by textual or visual information using a bag-of-words framework. Different fusion strategies are evaluated which allow the system to perform multipage document retrieval on the basis of a single page retrieval system. Results are reported on a large dataset of document images sampled from a banking workflow.
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo, & Josep Llados. (2011). Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method. In 11th International Conference on Document Analysis and Recognition (pp. 63–67).
Abstract: In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo, & Josep Llados. (2015). Efficient segmentation-free keyword spotting in historical document collections. PR - Pattern Recognition, 48(2), 545–555.
Abstract: In this paper we present an efficient segmentation-free word spotting method, applied in the context of historical document collections, that follows the query-by-example paradigm. We use a patch-based framework where local patches are described by a bag-of-visual-words model powered by SIFT descriptors. By projecting the patch descriptors to a topic space with the latent semantic analysis technique and compressing the descriptors with the product quantization method, we are able to efficiently index the document information both in terms of memory and time. The proposed method is evaluated using four different collections of historical documents, achieving good performance in both handwritten and typewritten scenarios. The results outperform recent state-of-the-art keyword spotting approaches.
Keywords: Historical documents; Keyword spotting; Segmentation-free; Dense SIFT features; Latent semantic analysis; Product quantization
|
Marçal Rusiñol, David Aldavert, Ricardo Toledo, & Josep Llados. (2015). Towards Query-by-Speech Handwritten Keyword Spotting. In 13th International Conference on Document Analysis and Recognition ICDAR2015 (pp. 501–505).
Abstract: In this paper, we present a new querying paradigm for handwritten keyword spotting. We propose to represent handwritten word images both by visual and audio representations, enabling a query-by-speech keyword spotting system. The two representations are merged together and projected to a common sub-space in the training phase. Given a spoken query, this transform allows the retrieval of word instances that were only represented by the visual modality. In addition, the same method can be used backwards at no additional cost to produce a handwritten text-to-speech system. We present our first results on this new querying mechanism using synthetic voices over the George Washington dataset.
|
Marçal Rusiñol, David Aldavert, Dimosthenis Karatzas, Ricardo Toledo, & Josep Llados. (2011). Interactive Trademark Image Retrieval by Fusing Semantic and Visual Content. Advances in Information Retrieval. In P. Clough, C. Foley, C. Gurrin, G.J.F. Jones, W. Kraaij, H. Lee, et al. (Eds.), 33rd European Conference on Information Retrieval (Vol. 6611, pp. 314–325). LNCS. Berlin: Springer.
Abstract: In this paper we propose an efficient query-by-example retrieval system which is able to retrieve trademark images by similarity from patent and trademark offices' digital libraries. Logo images are described by both their semantic content, by means of the Vienna codes, and their visual content, by using shape and color as visual cues. The trademark descriptors are then indexed by a locality-sensitive hashing data structure aiming to perform approximate k-NN search in high-dimensional spaces in sub-linear time. The resulting ranked lists are combined by using the Condorcet method, and a relevance feedback step helps to iteratively revise the query and refine the obtained results. The experiments demonstrate the effectiveness and efficiency of this system on a realistic and large dataset.
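The rank fusion step can be illustrated with a small sketch. The abstract names the Condorcet method but not the exact variant, so the version below is an assumption: each pair of candidates is compared across all ranked lists, the candidate preferred by the majority wins the pair, and candidates are ordered by their number of pairwise wins (Copeland-style counting). The function name, the tie-breaking by identifier, and the toy ranked lists are hypothetical.

```python
from itertools import combinations


def condorcet_fuse(rankings):
    """Fuse ranked lists by counting pairwise majority wins.

    `rankings` is a list of lists, each ordered from most to least relevant.
    An item missing from a list is treated as ranked after every listed item.
    """
    items = sorted({item for ranking in rankings for item in ranking})
    positions = [{item: rank for rank, item in enumerate(ranking)}
                 for ranking in rankings]
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        a_preferred = sum(1 for p in positions
                          if p.get(a, len(p)) < p.get(b, len(p)))
        b_preferred = sum(1 for p in positions
                          if p.get(b, len(p)) < p.get(a, len(p)))
        if a_preferred > b_preferred:
            wins[a] += 1
        elif b_preferred > a_preferred:
            wins[b] += 1
    # Most pairwise wins first; ties broken by identifier for determinism.
    return sorted(items, key=lambda item: (-wins[item], item))


# Hypothetical ranked lists returned by three separate cues.
by_semantics = ["logo_a", "logo_b", "logo_c"]
by_shape = ["logo_b", "logo_a", "logo_c"]
by_color = ["logo_a", "logo_c", "logo_b"]

fused = condorcet_fuse([by_semantics, by_shape, by_color])
```

In this example `logo_a` beats both other candidates in a majority of lists and `logo_b` beats `logo_c`, so the fused order is `logo_a`, `logo_b`, `logo_c`. Because only relative positions are used, the fusion does not require the similarity scores of the different cues to be comparable.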
|
Marçal Rusiñol, Agnes Borras, & Josep Llados. (2010). Relational Indexing of Vectorial Primitives for Symbol Spotting in Line-Drawing Images. PRL - Pattern Recognition Letters, 31(3), 188–201.
Abstract: This paper presents a symbol spotting approach for indexing by content a database of line-drawing images. As line-drawings are digital-born documents designed with vector-graphics software, instead of using a pixel-based approach, we present a spotting method based on vector primitives. Graphical symbols are represented by a set of vectorial primitives which are described by an off-the-shelf shape descriptor. A relational indexing strategy aims to retrieve symbol locations in the target documents by using a combined numerical-relational description of 2D structures. The zones which are likely to contain the queried symbol are validated by a Hough-like voting scheme. In addition, a performance evaluation framework for symbol spotting in graphical documents is proposed. The presented methodology has been evaluated on a benchmarking set of architectural documents, achieving good performance results.
Keywords: Document image analysis and recognition; Graphics recognition; Symbol spotting; Vectorial representations; Line-drawings
|
Marçal Rusiñol. (2006). A Model of Vectorial Signatures in Terms of Expressive Sub-Shapes: Symbol Indexation in Technical Documents.
|