|
Josep Llados and Gemma Sanchez. 2007. Indexing Historical Documents by Word Shape Signatures. 9th International Conference on Document Analysis and Recognition. 362–366.
|
|
|
Jose Antonio Rodriguez, Gemma Sanchez and Josep Llados. 2007. A Pen-based Interface for Real-time Document Edition. 9th International Conference on Document Analysis and Recognition. 939–944.
|
|
|
Oriol Ramos Terrades, Salvatore Tabbone and Ernest Valveny. 2007. A Review of Shape Descriptors for Document Analysis. 9th International Conference on Document Analysis and Recognition. 227–231.
|
|
|
S. Chanda, Oriol Ramos Terrades and Umapada Pal. 2007. SVM Based Scheme for Thai and English Script Identification. 9th International Conference on Document Analysis and Recognition. 551–555.
|
|
|
Antonio Clavelli and Dimosthenis Karatzas. 2009. Text Segmentation in Colour Posters from the Spanish Civil War Era. 10th International Conference on Document Analysis and Recognition. 181–185.
Abstract: The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existence of complex background graphics with repetitive patterns can make its localization and segmentation extremely difficult. Here, we propose a new method for extracting textual content from such colour images that makes no assumption as to the size of the characters, their orientation or colour, while it is tolerant to characters that do not follow a straight baseline. We evaluate this method on a collection of documents with historical connotations: the Posters from the Spanish Civil War.
|
|
|
Albert Gordo and Ernest Valveny. 2009. A rotation invariant page layout descriptor for document classification and retrieval. 10th International Conference on Document Analysis and Recognition. 481–485.
Abstract: Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on cyclic dynamic time warping which can be computed in O(n²). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
|
|
|
Marçal Rusiñol and Josep Llados. 2009. Logo Spotting by a Bag-of-words Approach for Document Categorization. 10th International Conference on Document Analysis and Recognition. 111–115.
Abstract: In this paper we present a method for document categorization which processes incoming document images such as invoices or receipts. The categorization of these document images is done in terms of the presence of a certain graphical logo detected without segmentation. The graphical logos are described by a set of local features and the categorization of the documents is performed by the use of a bag-of-words model. Spatial coherence rules are added to reinforce the correct category hypothesis, aiming also to spot the logo inside the document image. Experiments which demonstrate the effectiveness of this system on a large set of real data are presented.
|
|
|
Ricard Coll, Alicia Fornes and Josep Llados. 2009. Graphological Analysis of Handwritten Text Documents for Human Resources Recruitment. 10th International Conference on Document Analysis and Recognition. 1081–1085.
Abstract: The use of graphology in recruitment processes has become a popular tool in many human resources companies. This paper presents a model that links features from handwritten images to a number of personality characteristics used to measure applicant aptitudes for the job in a particular hiring scenario. In particular, we propose a model for measuring the active personality and leadership of the writer. Graphological features that define such a profile are measured in terms of document and script attributes like layout configuration, letter size, shape, slant and skew angle of lines, etc. After the extraction, the data is classified using a neural network. An experimental framework with real samples has been constructed to illustrate the performance of the approach.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez and Horst Bunke. 2009. On the use of textural features for writer identification in old handwritten music scores. 10th International Conference on Document Analysis and Recognition. 996–1000.
Abstract: Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First of all, the music sheet is preprocessed to obtain a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Every approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-occurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
|
|
|
Partha Pratim Roy, Umapada Pal and Josep Llados. 2009. Seal detection and recognition: An approach for document indexing. 10th International Conference on Document Analysis and Recognition. 101–105.
Abstract: Reliable indexing of documents having seal instances can be achieved by recognizing seal information. This paper presents a novel approach for detecting and classifying such multi-oriented seals in these documents. First, Hough Transform based methods are applied to extract the seal regions in documents. Next, isolated text characters within these regions are detected. Rotation and size invariant features and a support vector machine based classifier have been used to recognize these detected text characters. Next, for each pair of characters, we encode their relative spatial organization using their distance and angular position with respect to the centre of the seal, and enter this code into a hash table. Given an input seal, we recognize the individual text characters and compute the code for each pair of characters based on their relative spatial organization. The codes obtained from the input seal help to retrieve model hypotheses from the hash table. The seal model that receives the largest number of hypotheses is selected for the recognition of the input seal. The methodology is tested for indexing seals in a rotation and size invariant environment, and we obtained encouraging results.
|
|