|
Beata Megyesi, Bernhard Esslinger, Alicia Fornes, Nils Kopal, Benedek Lang, George Lasry, et al. (2020). Decryption of historical manuscripts: the DECRYPT project. CRYPT - Cryptologia, 44(6), 545–559.
Abstract: Many historians and linguists are working individually and in an uncoordinated fashion on the identification and decryption of historical ciphers. This is a time-consuming process as they often work without access to automatic methods and processes that can accelerate the decipherment. At the same time, computer scientists and cryptologists are developing algorithms to decrypt various cipher types without having access to a large number of original ciphertexts. In this paper, we describe the DECRYPT project aiming at the creation of resources and tools for historical cryptology by bringing the expertise of various disciplines together for collecting data, exchanging methods for faster progress to transcribe, decrypt and contextualize historical encrypted manuscripts. We present our goals and work-in progress of a general approach for analyzing historical encrypted manuscripts using standardized methods and a new set of state-of-the-art tools. We release the data and tools as open-source hoping that all mentioned disciplines would benefit and contribute to the research infrastructure of historical cryptology.
Keywords: automatic decryption; cipher collection; historical cryptology; image transcription
|
|
|
Miquel Ferrer, Dimosthenis Karatzas, Ernest Valveny, I. Bardaji, & Horst Bunke. (2011). A Generic Framework for Median Graph Computation based on a Recursive Embedding Approach. CVIU - Computer Vision and Image Understanding, 115(7), 919–928.
Abstract: The median graph has been shown to be a good choice to obtain a represen- tative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is the generalization of this previ- ous method, proposing a generic recursive procedure that permits to recover the graph corresponding to a point in the vector space, introducing only the amount of approximation inherent to the use of graph matching algorithms. In order to evaluate the proposed method, we compare it with the set me- dian and with the other state-of-the-art embedding-based methods for the median graph computation. The experiments are carried out using four dif- ferent databases (one semi-artificial and three containing real-world data). Results show that with the proposed approach we can obtain better medi- ans, in terms of the sum of distances to the training graphs, than with the previous existing methods.
Keywords: Median Graph, Graph Embedding, Graph Matching, Structural Pattern Recognition
|
|
|
Albert Gordo, Florent Perronnin, & Ernest Valveny. (2013). Large-scale document image retrieval and classification with runlength histograms and binary embeddings. PR - Pattern Recognition, 46(7), 1898–1905.
Abstract: We present a new document image descriptor based on multi-scale runlength
histograms. This descriptor does not rely on layout analysis and can be
computed efficiently. We show how this descriptor can achieve state-of-theart
results on two very different public datasets in classification and retrieval
tasks. Moreover, we show how we can compress and binarize these descriptors
to make them suitable for large-scale applications. We can achieve state-ofthe-
art results in classification using binary descriptors of as few as 16 to 64
bits.
Keywords: visual document descriptor; compression; large-scale; retrieval; classification
|
|
|
Jose Antonio Rodriguez, Florent Perronnin, Gemma Sanchez, & Josep Llados. (2010). Unsupervised writer adaptation of whole-word HMMs with application to word-spotting. PRL - Pattern Recognition Letters, 31(8), 742–749.
Abstract: In this paper we propose a novel approach for writer adaptation in a handwritten word-spotting task. The method exploits the fact that the semi-continuous hidden Markov model separates the word model parameters into (i) a codebook of shapes and (ii) a set of word-specific parameters.
Our main contribution is to employ this property to derive writer-specific word models by statistically adapting an initial universal codebook to each document. This process is unsupervised and does not even require the appearance of the keyword(s) in the searched document. Experimental results show an increase in performance when this adaptation technique is applied. To the best of our knowledge, this is the first work dealing with adaptation for word-spotting. The preliminary version of this paper obtained an IBM Best Student Paper Award at the 19th International Conference on Pattern Recognition.
Keywords: Word-spotting; Handwriting recognition; Writer adaptation; Hidden Markov model; Document analysis
|
|
|
Palaiahnakote Shivakumara, Anjan Dutta, Trung Quy Phan, Chew Lim Tan, & Umapada Pal. (2011). A Novel Mutual Nearest Neighbor based Symmetry for Text Frame Classification in Video. PR - Pattern Recognition, 44(8), 1671–1683.
Abstract: In the field of multimedia retrieval in video, text frame classification is essential for text detection, event detection, event boundary detection, etc. We propose a new text frame classification method that introduces a combination of wavelet and median moment with k-means clustering to select probable text blocks among 16 equally sized blocks of a video frame. The same feature combination is used with a new Max–Min clustering at the pixel level to choose probable dominant text pixels in the selected probable text blocks. For the probable text pixels, a so-called mutual nearest neighbor based symmetry is explored with a four-quadrant formation centered at the centroid of the probable dominant text pixels to know whether a block is a true text block or not. If a frame produces at least one true text block then it is considered as a text frame otherwise it is a non-text frame. Experimental results on different text and non-text datasets including two public datasets and our own created data show that the proposed method gives promising results in terms of recall and precision at the block and frame levels. Further, we also show how existing text detection methods tend to misclassify non-text frames as text frames in term of recall and precision at both the block and frame levels.
|
|