|
Sounak Dey, Anjan Dutta, Juan Ignacio Toledo, Suman Ghosh, Josep Llados and Umapada Pal. 2018. SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification.
Abstract: Offline signature verification is one of the most challenging tasks in biometrics and document forensics. Unlike other verification problems, it needs to model minute but critical details between genuine and forged signatures, because a skilled falsification might often resembles the real signature with small deformation. This verification task is even harder in writer independent scenarios which is undeniably fiscal for realistic cases. In this paper, we model an offline writer independent signature verification task with a convolutional Siamese network. Siamese networks are twin networks with shared weights, which can be trained to learn a feature space where similar observations are placed in proximity. This is achieved by exposing the network to a pair of similar and dissimilar observations and minimizing the Euclidean distance between similar pairs while simultaneously maximizing it between dissimilar pairs. Experiments conducted on cross-domain datasets emphasize the capability of our network to model forgery in different languages (scripts) and handwriting styles. Moreover, our designed Siamese network, named SigNet, exceeds the state-of-the-art results on most of the benchmark signature datasets, which paves the way for further research in this direction.
|
|
|
Albert Berenguel, Oriol Ramos Terrades, Josep Llados and Cristina Cañero. 2017. e-Counterfeit: a mobile-server platform for document counterfeit detection. 14th IAPR International Conference on Document Analysis and Recognition.
Abstract: This paper presents a novel application to detect counterfeit identity documents forged by a scan-printing operation. Texture analysis approaches are proposed to extract validation features from security background that is usually printed in documents as IDs or banknotes. The main contribution of this work is the end-to-end mobile-server architecture, which provides a service for non-expert users and therefore can be used in several scenarios. The system also provides a crowdsourcing mode so labeled images can be gathered, generating databases for incremental training of the algorithms.
|
|
|
Anjan Dutta, Josep Llados, Horst Bunke and Umapada Pal. 2018. Product graph-based higher order contextual similarities for inexact subgraph matching. PR, 76, 596–611.
Abstract: Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched. Pairwise measurements usually consider local attributes but disregard contextual information involved in graph structures. We address this issue by proposing contextual similarities between pairs of nodes. This is done by considering the tensor product graph (TPG) of two graphs to be matched, where each node is an ordered pair of nodes of the operand graphs. Contextual similarities between a pair of nodes are computed by accumulating weighted walks (normalized pairwise similarities) terminating at the corresponding paired node in TPG. Once the contextual similarities are obtained, we formulate subgraph matching as a node and edge selection problem in TPG. We use contextual similarities to construct an objective function and optimize it with a linear programming approach. Since random walk formulation through TPG takes into account higher order information, it is not a surprise that we obtain more reliable similarities and better discrimination among the nodes and edges. Experimental results shown on synthetic as well as real benchmarks illustrate that higher order contextual similarities increase discriminating power and allow one to find approximate solutions to the subgraph matching problem.
|
|
|
Suman Ghosh and Ernest Valveny. 2017. Visual attention models for scene text recognition. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1706.01487
In this paper we propose an approach to lexicon-free recognition of text in scene images. Our approach relies on a LSTM-based soft visual attention model learned from convolutional features. A set of feature vectors are derived from an intermediate convolutional layer corresponding to different areas of the image. This permits encoding of spatial information into the image representation. In this way, the framework is able to learn how to selectively focus on different parts of the image. At every time step the recognizer emits one character using a weighted combination of the convolutional feature vectors according to the learned attention model. Training can be done end-to-end using only word level annotations. In addition, we show that modifying the beam search algorithm by integrating an explicit language model leads to significantly better recognition results. We validate the performance of our approach on standard SVT and ICDAR'03 scene text datasets, showing state-of-the-art performance in unconstrained text recognition.
|
|
|
Suman Ghosh and Ernest Valveny. 2017. R-PHOC: Segmentation-Free Word Spotting using CNN. 14th International Conference on Document Analysis and Recognition.
Abstract: arXiv:1707.01294
This paper proposes a region based convolutional neural network for segmentation-free word spotting. Our network takes as input an image and a set of word candidate bound- ing boxes and embeds all bounding boxes into an embedding space, where word spotting can be casted as a simple nearest neighbour search between the query representation and each of the candidate bounding boxes. We make use of PHOC embedding as it has previously achieved significant success in segmentation- based word spotting. Word candidates are generated using a simple procedure based on grouping connected components using some spatial constraints. Experiments show that R-PHOC which operates on images directly can improve the current state-of- the-art in the standard GW dataset and performs as good as PHOCNET in some cases designed for segmentation based word spotting.
Keywords: Convolutional neural network; Image segmentation; Artificial neural network; Nearest neighbor search
|
|
|
Masakazu Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, Lluis Gomez and Dimosthenis Karatzas. 2017. ICDAR2017 Robust Reading Challenge on Omnidirectional Video. 14th International Conference on Document Analysis and Recognition.
Abstract: Results of ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses Downtown Osaka Scene Text (DOST) Dataset that was captured in Osaka, Japan with an omnidirectional camera. Hence, it consists of sequential images (videos) of different view angles. Regarding the sequential images as videos (video mode), two tasks of localisation and end-to-end recognition are prepared. Regarding them as a set of still images (still image mode), three tasks of localisation, cropped word recognition and end-to-end recognition are prepared. As the dataset has been captured in Japan, the dataset contains Japanese text but also include text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using Japanese only ground truth (GT), using Latin only GT and using combined GTs of both. Finally, by the submission deadline, we have received two submissions in the text localisation task of the still image mode. We intend to continue the competition in the open mode. Expecting further submissions, in this report we provide baseline results in all the tasks in addition to the submissions from the community.
|
|
|
Raul Gomez and 7 others. 2017. ICDAR2017 Robust Reading Challenge on COCO-Text. 14th International Conference on Document Analysis and Recognition.
|
|
|
Hana Jarraya, Oriol Ramos Terrades and Josep Llados. 2017. Learning structural loss parameters on graph embedding applied on symbolic graphs. 12th IAPR International Workshop on Graphics Recognition.
Abstract: We propose an amelioration of proposed Graph Embedding (GEM) method in previous work that takes advantages of structural pattern representation and the structured distortion. it models an Attributed Graph (AG) as a Probabilistic Graphical Model (PGM). Then, it learns the parameters of this PGM presented by a vector, as new signature of AG in a lower dimensional vectorial space. We focus to adapt the structured learning algorithm via 1_slack formulation with a suitable risk function, called Graph Edit Distance (GED). It defines the dissimilarity of the ground truth and predicted graph labels. It determines by the error tolerant graph matching using bipartite graph matching algorithm. We apply Structured Support Vector Machines (SSVM) to process classification task. During our experiments, we got our results on the GREC dataset.
|
|
|
Dimosthenis Karatzas, Lluis Gomez and Marçal Rusiñol. 2017. The Robust Reading Competition Annotation and Evaluation Platform. 1st International Workshop on Open Services and Tools for Document Analysis.
Abstract: The ICDAR Robust Reading Competition (RRC), initiated in 2003 and re-established in 2011, has become the defacto evaluation standard for the international community. Concurrent with its second incarnation in 2011, a continuous effort started to develop an online framework to facilitate the hosting and management of competitions. This short paper briefly outlines the Robust Reading Competition Annotation and Evaluation Platform, the backbone of the Robust Reading Competition, comprising a collection of tools and processes that aim to simplify the management and annotation
of data, and to provide online and offline performance evaluation and analysis services
|
|
|
Katerine Diaz, Jesus Martinez del Rincon, Aura Hernandez-Sabate, Marçal Rusiñol and Francesc J. Ferri. 2018. Fast Kernel Generalized Discriminative Common Vectors for Feature Extraction. JMIV, 60(4), 512–524.
Abstract: This paper presents a supervised subspace learning method called Kernel Generalized Discriminative Common Vectors (KGDCV), as a novel extension of the known Discriminative Common Vectors method with Kernels. Our method combines the advantages of kernel methods to model complex data and solve nonlinear
problems with moderate computational complexity, with the better generalization properties of generalized approaches for large dimensional data. These attractive combination makes KGDCV specially suited for feature extraction and classification in computer vision, image processing and pattern recognition applications. Two different approaches to this generalization are proposed, a first one based on the kernel trick (KT) and a second one based on the nonlinear projection trick (NPT) for even higher efficiency. Both methodologies
have been validated on four different image datasets containing faces, objects and handwritten digits, and compared against well known non-linear state-of-art methods. Results show better discriminant properties than other generalized approaches both linear or kernel. In addition, the KGDCV-NPT approach presents a considerable computational gain, without compromising the accuracy of the model.
|
|