|
Lluis Pere de las Heras, Ernest Valveny, & Gemma Sanchez. (2013). Unsupervised and Notation-Independent Wall Segmentation in Floor Plans Using a Combination of Statistical and Structural Strategies. In 10th IAPR International Workshop on Graphics Recognition.
|
|
|
Pau Riba, Josep Llados, Alicia Fornes, & Anjan Dutta. (2015). Large-scale Graph Indexing using Binary Embeddings of Node Contexts. In C.-L. Liu, B. Luo, W.G. Kropatsch, & J. Cheng (Eds.), 10th IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition (Vol. 9069, pp. 208–217). LNCS. Springer International Publishing.
Abstract: Graph-based representations are experiencing growing usage in visual recognition and retrieval due to their representational power compared with classical appearance-based representations in terms of feature vectors. Retrieving a query graph from a large dataset of graphs has the drawback of the high computational complexity required to compare the query and the target graphs. The most important property for large-scale retrieval is a search time complexity that is sub-linear in the number of database examples. In this paper we propose a fast indexation formalism for graph retrieval. A binary embedding is defined as hashing keys for graph nodes. Given a database of labeled graphs, graph nodes are complemented with vectors of attributes representing their local context. Hence, each attribute counts the length of a walk of order k originated in a vertex with label l. Each attribute vector is converted to a binary code applying a binary-valued hash function. Therefore, graph retrieval is formulated in terms of finding target graphs in the database whose nodes have a small Hamming distance from the query nodes, easily computed with bitwise logical operators. As an application example, we validate the performance of the proposed methods in a handwritten word spotting scenario in images of historical documents.
Keywords: Graph matching; Graph indexing; Application in document analysis; Word spotting; Binary embedding
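Note: the Hamming-distance comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each node's binary code is stored as a Python integer; the names `hamming`, `nearest_nodes`, and the toy `database` are hypothetical and not taken from the paper. The lookup here is a plain linear scan, so it shows only the bitwise distance computation, not the sub-linear indexing the paper proposes.

```python
def hamming(a: int, b: int) -> int:
    """Count the differing bits between two binary codes stored as ints."""
    # XOR leaves a 1 wherever the two codes disagree; count those bits.
    return bin(a ^ b).count("1")


def nearest_nodes(query_code, database, max_dist):
    """Return the keys of all database nodes within max_dist bits of the query."""
    return [key for key, code in database.items()
            if hamming(query_code, code) <= max_dist]


# Toy database (assumed layout): (graph id, node id) -> binary code of the node context.
database = {
    ("g1", 0): 0b101100,
    ("g1", 1): 0b111000,
    ("g2", 0): 0b000011,
}

matches = nearest_nodes(0b101000, database, max_dist=1)
```

Graphs whose nodes collect many such matches would then be candidate retrieval results.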
|
|
|
Nil Ballus, Bhalaji Nagarajan, & Petia Radeva. (2022). Opt-SSL: An Enhanced Self-Supervised Framework for Food Recognition. In 10th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 13256). LNCS.
Abstract: Self-supervised Learning has been showing upbeat performance in several computer vision tasks. The popular contrastive methods make use of a Siamese architecture with different loss functions. In this work, we go deeper into two very recent state of the art frameworks, namely, SimSiam and Barlow Twins. Inspired by them, we propose a new self-supervised learning method we call Opt-SSL that combines both image and feature contrasting. We validate the proposed method on the food recognition task, showing that our proposed framework enables the self-learning networks to learn better visual representations.
Keywords: Self-supervised; Contrastive learning; Food recognition
|
|
|
Xavier Baro, Sergio Escalera, Petia Radeva, & Jordi Vitria. (2009). Visual Content Layer for Scalable Recognition in Urban Image Databases, Internet Multimedia Search and Mining. In 10th IEEE International Conference on Multimedia and Expo (pp. 1616–1619).
Abstract: Rich online map interaction represents a useful tool to get multimedia information related to physical places. With this type of system, users can automatically compute the optimal route for a trip or look for entertainment places or hotels near their actual position. Standard maps are defined as a fusion of layers, where each one contains specific data such as height, streets, or a particular business location. In this paper we propose the construction of a visual content layer which describes the visual appearance of geographic locations in a city. We captured, by means of a Mobile Mapping system, a huge set of georeferenced images (> 500K) which cover the whole city of Barcelona. For each image, hundreds of region descriptions are computed off-line and described as a hash code. This allows an efficient and scalable way of accessing maps by visual content.
|
|
|
D. Jayagopi, Bogdan Raducanu, & D. Gatica-Perez. (2009). Characterizing conversational group dynamics using nonverbal behaviour. In 10th IEEE International Conference on Multimedia and Expo (pp. 370–373).
Abstract: This paper addresses the novel problem of characterizing conversational group dynamics. It is well documented in social psychology that depending on the objectives of a group, the dynamics are different. For example, a competitive meeting has a different objective from that of a collaborative meeting. We propose a method to characterize group dynamics based on the joint description of a group members' aggregated acoustical nonverbal behaviour to classify two meeting datasets (one being cooperative-type and the other being competitive-type). We use 4.5 hours of real behavioural multi-party data and show that our methodology can achieve a classification rate of up to 100%.
|
|
|
Pau Baiget, Joan Soto, Xavier Roca, & Jordi Gonzalez. (2007). Automatic Generation of Computer-Animated Sequences based on Human Behaviour Modelling. In 10th International Conference on Computer Graphics and Artificial Intelligence.
|
|
|
Antonio Clavelli, & Dimosthenis Karatzas. (2009). Text Segmentation in Colour Posters from the Spanish Civil War Era. In 10th International Conference on Document Analysis and Recognition (pp. 181–185).
Abstract: The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existence of complex background graphics with repetitive patterns can make its localization and segmentation extremely difficult. Here, we propose a new method for extracting textual content from such colour images that makes no assumption as to the size of the characters, their orientation or colour, while it is tolerant to characters that do not follow a straight baseline. We evaluate this method on a collection of documents with historical connotations: the Posters from the Spanish Civil War.
|
|
|
Albert Gordo, & Ernest Valveny. (2009). A rotation invariant page layout descriptor for document classification and retrieval. In 10th International Conference on Document Analysis and Recognition (pp. 481–485).
Abstract: Document classification usually requires structural features such as the physical layout to obtain good accuracy rates on complex documents. This paper introduces a descriptor of the layout and a distance measure based on the cyclic dynamic time warping which can be computed in O(n²). This descriptor is translation invariant and can be easily modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona archives database and compared against another common layout distance, the minimum weight edge cover (MWEC). The experiments show that these methods outperform the MWEC both in accuracy and speed, particularly on rotated documents.
|
|
|
Marçal Rusiñol, & Josep Llados. (2009). Logo Spotting by a Bag-of-words Approach for Document Categorization. In 10th International Conference on Document Analysis and Recognition (pp. 111–115).
Abstract: In this paper we present a method for document categorization which processes incoming document images such as invoices or receipts. The categorization of these document images is done in terms of the presence of a certain graphical logo detected without segmentation. The graphical logos are described by a set of local features and the categorization of the documents is performed by the use of a bag-of-words model. Spatial coherence rules are added to reinforce the correct category hypothesis, aiming also to spot the logo inside the document image. Experiments which demonstrate the effectiveness of this system on a large set of real data are presented.
|
|
|
Ricard Coll, Alicia Fornes, & Josep Llados. (2009). Graphological Analysis of Handwritten Text Documents for Human Resources Recruitment. In 10th International Conference on Document Analysis and Recognition (pp. 1081–1085).
Abstract: The use of graphology in recruitment processes has become a popular tool in many human resources companies. This paper presents a model that links features from handwritten images to a number of personality characteristics used to measure applicant aptitudes for the job in a particular hiring scenario. In particular we propose a model of measuring active personality and leadership of the writer. Graphological features that define such a profile are measured in terms of document and script attributes like layout configuration, letter size, shape, slant and skew angle of lines, etc. After the extraction, data is classified using a neural network. An experimental framework with real samples has been constructed to illustrate the performance of the approach.
|
|
|
Alicia Fornes, Josep Llados, Gemma Sanchez, & Horst Bunke. (2009). On the use of textural features for writer identification in old handwritten music scores. In 10th International Conference on Document Analysis and Recognition (pp. 996–1000).
Abstract: Writer identification consists in determining the writer of a piece of handwriting from a set of writers. In this paper we present a system for writer identification in old handwritten music scores which uses only music notation to determine the author. The steps of the proposed system are the following. First of all, the music sheet is preprocessed to obtain a music score without the staff lines. Afterwards, four different methods for generating texture images from music symbols are applied. Every approach uses a different spatial variation when combining the music symbols to generate the textures. Finally, Gabor filters and Grey-scale Co-occurrence matrices are used to obtain the features. The classification is performed using a k-NN classifier based on Euclidean distance. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving encouraging identification rates.
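Note: the final classification step named in the abstract above, k-NN with Euclidean distance, can be sketched as follows. This is a generic illustration, not the authors' implementation: the two-dimensional feature vectors, the writer labels, and the function name `knn_predict` are invented for the example, whereas the real system would use Gabor and co-occurrence features extracted from texture images.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Rank training samples by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, writer) pairs standing in for texture features.
train = [
    ((0.10, 0.20), "writer_A"),
    ((0.20, 0.10), "writer_A"),
    ((0.15, 0.15), "writer_A"),
    ((0.90, 0.80), "writer_B"),
    ((0.80, 0.90), "writer_B"),
]

predicted = knn_predict(train, (0.20, 0.20), k=3)
```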
|
|
|
Partha Pratim Roy, Umapada Pal, & Josep Llados. (2009). Seal detection and recognition: An approach for document indexing. In 10th International Conference on Document Analysis and Recognition (pp. 101–105).
Abstract: Reliable indexing of documents having seal instances can be achieved by recognizing seal information. This paper presents a novel approach for detecting and classifying such multi-oriented seals in these documents. First, Hough Transform based methods are applied to extract the seal regions in documents. Next, isolated text characters within these regions are detected. Rotation and size invariant features and a support vector machine based classifier have been used to recognize these detected text characters. Next, for each pair of characters, we encode their relative spatial organization using their distance and angular position with respect to the centre of the seal, and enter this code into a hash table. Given an input seal, we recognize the individual text characters and compute the code for each character pair based on the relative spatial organization. The code obtained from the input seal helps to retrieve model hypotheses from the hash table. The seal model that receives the most hypotheses is selected for the recognition of the input seal. The methodology was tested on indexing seals in a rotation- and size-invariant setting, and we obtained encouraging results.
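Note: the pairwise hashing and voting scheme described in the abstract above can be illustrated with a rough sketch. All specifics here are assumptions made for the example and are not from the paper: the code is built from a quantised ratio of the two characters' distances to the seal centre and the quantised angle between them, the bin counts are arbitrary, and the names `pair_code`, `seal_codes`, `recognise`, and `polar` are hypothetical. Characters are assumed to be already recognised and to lie away from the centre.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pair_code(a, b, centre, ratio_bins=4, angle_bins=8):
    """Rotation- and scale-tolerant code for two characters given as (label, x, y)."""
    cx, cy = centre
    da = math.hypot(a[1] - cx, a[2] - cy)
    db = math.hypot(b[1] - cx, b[2] - cy)
    ratio = min(da, db) / max(da, db)            # unchanged by uniform scaling
    ang = abs(math.atan2(a[2] - cy, a[1] - cx) - math.atan2(b[2] - cy, b[1] - cx))
    ang = min(ang, 2 * math.pi - ang)            # angle between the pair, in [0, pi]
    la, lb = sorted((a[0], b[0]))
    return (la, lb,
            min(int(ratio * ratio_bins), ratio_bins - 1),
            min(int(ang / math.pi * angle_bins), angle_bins - 1))


def seal_codes(chars, centre):
    """Codes for every pair of characters in one seal."""
    return [pair_code(a, b, centre) for a, b in combinations(chars, 2)]


def recognise(chars, centre, table):
    """Vote for the models stored under each pair code; return the best model or None."""
    votes = Counter(m for code in seal_codes(chars, centre) for m in table.get(code, ()))
    return votes.most_common(1)[0][0] if votes else None


def polar(label, r, deg, centre=(0.0, 0.0)):
    """Helper: place a character at radius r and angle deg around centre."""
    t = math.radians(deg)
    return (label, centre[0] + r * math.cos(t), centre[1] + r * math.sin(t))


# Two toy seal models with the same characters in different layouts.
models = {
    "seal_1": [polar("A", 10, 0), polar("B", 10, 100), polar("C", 6, 200)],
    "seal_2": [polar("A", 10, 0), polar("B", 10, 45), polar("C", 3, 80)],
}
table = defaultdict(set)
for name, chars in models.items():
    for code in seal_codes(chars, (0.0, 0.0)):
        table[code].add(name)

# Query: seal_1 rotated by 37 degrees, scaled by 2, and moved to centre (50, 50).
query = [polar("A", 20, 37, (50, 50)), polar("B", 20, 137, (50, 50)),
         polar("C", 12, 237, (50, 50))]
best = recognise(query, (50, 50), table)
```

Coarse quantisation makes the codes tolerant to small localisation errors, at the cost of more collisions between different models.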
|
|
|
Partha Pratim Roy, Umapada Pal, Josep Llados, & Mathieu Nicolas Delalandre. (2009). Multi-Oriented and Multi-Sized Touching Character Segmentation using Dynamic Programming. In 10th International Conference on Document Analysis and Recognition (pp. 11–15).
Abstract: In this paper, we present a scheme towards the segmentation of English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a big cavity region at the background portion. Using Convex Hull information, we use this background information to find some initial points to segment a touching string into possible primitive segments (a primitive segment consists of a single character or a part of a character). Next, these primitive segments are merged to get optimum segmentation and dynamic programming is applied using total likelihood of characters as the objective function. An SVM classifier is used to find the likelihood of a character. To consider multi-oriented touching strings, the features used in the SVM are invariant to character orientation. A circular ring and convex hull ring based approach has been used along with angular information of the contour pixels of the character to make the feature rotation invariant. From the experiment, we obtained encouraging results.
|
|
|
D. Perez, L. Tarazon, N. Serrano, F.M. Castro, Oriol Ramos Terrades, & A. Juan. (2009). The GERMANA Database. In 10th International Conference on Document Analysis and Recognition (pp. 301–305).
Abstract: A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages only contain nearly calligraphed text written on ruled sheets of well-separated lines. To our knowledge, it is the first publicly available database for handwriting research, mostly written in Spanish and comparable in size to standard databases. Due to its sequential book structure, it is also well-suited for realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
|
|
|
Gioacchino Vino, & Angel Sappa. (2013). Revisiting Harris Corner Detector Algorithm: a Gradual Thresholding Approach. In 10th International Conference on Image Analysis and Recognition (Vol. 7950, pp. 354–363). LNCS. Springer Berlin Heidelberg.
Abstract: This paper presents an adaptive thresholding approach intended to increase the number of detected corners, while reducing the number of those corresponding to noisy data. The proposed approach works by using the classical Harris corner detector algorithm and overcomes the difficulty in finding a general threshold that works well for all the images in a given data set by proposing a novel adaptive thresholding scheme. Initially, two thresholds are used to discern between strong corners and flat regions. Then, a region based criterion is used to discriminate between weak corners and noisy points in the midway interval. Experimental results show that the proposed approach has a better capability to reject false corners and, at the same time, to detect weak ones. Comparisons with the state of the art are provided showing the validity of the proposed approach.
|
|