Albert Gordo, Alicia Fornes, Ernest Valveny, & Josep Llados. (2010). A Bag of Notes Approach to Writer Identification in Old Handwritten Music Scores. In 9th IAPR International Workshop on Document Analysis Systems (pp. 247–254).
Abstract: Determining the authorship of a document, namely writer identification, can be an important source of information for document categorization. Contrary to text documents, the identification of the writer of graphical documents is still a challenge. In this paper we present a robust approach for writer identification in a particular kind of graphical documents, old music scores. This approach adapts the bag of visual terms method for coping with graphic documents. The identification is performed only using the graphical music notation. For this purpose, we generate a graphic vocabulary without recognizing any music symbols, and consequently, avoiding the difficulties in the recognition of hand-drawn symbols in old and degraded documents. The proposed method has been tested on a database of old music scores from the 17th to 19th centuries, achieving very high identification rates.
|
Albert Gordo, Jaume Gibert, Ernest Valveny, & Marçal Rusiñol. (2010). A Kernel-based Approach to Document Retrieval. In 9th IAPR International Workshop on Document Analysis Systems (pp. 377–384).
Abstract: In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents and the probability that a given document belongs to a certain class. The membership probability for a specific class is computed using Support Vector Machines in conjunction with a similarity-measure-based kernel applied to structural document representations. In the presented experiments, we use different document representations, both visual and structural, and we apply them to a database of historical documents. We show how our method based on similarity kernels outperforms the usual distance-based retrieval.
|
Antonio Clavelli, Dimosthenis Karatzas, & Josep Llados. (2010). A framework for the assessment of text extraction algorithms on complex colour images. In 9th IAPR International Workshop on Document Analysis Systems (pp. 19–26).
Abstract: The availability of open, ground-truthed datasets and clear performance metrics is a crucial factor in the development of an application domain. The domain of colour text image analysis (real scenes, Web and spam images, scanned colour documents) has traditionally suffered from a lack of a comprehensive performance evaluation framework. Such a framework is extremely difficult to specify, and corresponding pixel-level accurate information tedious to define. In this paper we discuss the challenges and technical issues associated with developing such a framework. Then, we describe a complete framework for the evaluation of text extraction methods at multiple levels, provide a detailed ground-truth specification and present a case study on how this framework can be used in a real-life situation.
|
Partha Pratim Roy, Umapada Pal, & Josep Llados. (2010). Query Driven Word Retrieval in Graphical Documents. In 9th IAPR International Workshop on Document Analysis Systems (pp. 191–198).
Abstract: In this paper, we present an approach towards the retrieval of words from graphical document images. In graphical documents, due to the presence of multi-oriented characters in a non-structured layout, word indexing is a challenging task. The proposed approach uses recognition results of individual components to form character pairs with the neighboring components. An indexing scheme is designed to store the spatial description of components and to access them efficiently. Given a query text word (ASCII/Unicode format), the character pairs present in it are searched for in the document. Next, the retrieved character pairs are linked sequentially to form a character string. Dynamic programming is applied to find different instances of query words, with a string edit distance used as the objective function to match the query word. Recognition of multi-scale and multi-oriented character components is done using a Support Vector Machine classifier. To handle multi-oriented character strings, the features used in the SVM are invariant to character orientation. Experimental results show that the method is efficient at locating a query word in multi-oriented text in graphical documents.
|
Farshad Nourbakhsh, Dimosthenis Karatzas, & Ernest Valveny. (2010). A polar-based logo representation based on topological and colour features. In 9th IAPR International Workshop on Document Analysis Systems (pp. 341–348).
Abstract: In this paper, we propose a novel rotation and scale invariant method for colour logo retrieval and classification, which involves performing a simple colour segmentation and subsequently describing each of the resultant colour components based on a set of topological and colour features. A polar representation is used to represent the logo and the subsequent logo matching is based on Cyclic Dynamic Time Warping (CDTW). We also show how combining information about the global distribution of the logo components and their local neighbourhood, using the Delaunay triangulation, improves the results. All experiments are performed on a dataset of 2500 instances of 100 colour logo images in different rotations and scales.
|
Sebastien Mace, Herve Locteau, Ernest Valveny, & Salvatore Tabbone. (2010). A system to detect rooms in architectural floor plan images. In 9th IAPR International Workshop on Document Analysis Systems (pp. 167–174).
Abstract: In this article, a system to detect rooms in architectural floor plan images is described. We first present a primitive extraction algorithm for line detection. It is based on an original coupling of the classical Hough transform with image vectorization in order to perform robust and efficient line detection. We show how the lines that satisfy some graphical arrangements are combined into walls. We also present the way we detect door hypotheses through the extraction of arcs. Walls and door hypotheses are then used by our room segmentation strategy, which consists of recursively decomposing the image until nearly convex regions are obtained. The notion of convexity is difficult to quantify, and the selection of separation lines between regions can also be rough. We take advantage of knowledge associated with architectural floor plans in order to obtain mostly rectangular rooms. Qualitative and quantitative evaluations performed on a corpus of real documents show promising results.
|
Sergio Escalera, Xavier Baro, Jordi Vitria, & Petia Radeva. (2009). Text Detection in Urban Scenes (video sample). In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 35–44).
Abstract: Text detection in urban scenes is a hard task due to the high variability of text appearance: different text fonts, changes in the point of view, or partial occlusion are just a few problems. Text detection is especially suited to georeferencing businesses, navigation, tourist assistance, or helping visually impaired people. In this paper, we propose a general methodology to deal with the problem of text detection in outdoor scenes. The method is based on learning spatial information of gradient based features and Census Transform images using a cascade of classifiers. The method is applied in the context of Mobile Mapping systems, where a mobile vehicle captures urban image sequences. Moreover, a cover data set is presented and tested with the new methodology. The results show high accuracy when detecting multi-linear text regions with high variability of appearance, while preserving a low false alarm rate compared to classical approaches.
|
Sergio Escalera, Oriol Pujol, Petia Radeva, & Jordi Vitria. (2009). Measuring Interest of Human Dyadic Interactions. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 45–54).
Abstract: In this paper, we argue that, using only behavioural motion information, we are able to predict the interest of observers when looking at face-to-face interactions. We propose a set of movement-related features from body, face, and mouth activity in order to define a set of higher level interaction features, such as stress, activity, speaking engagement, and corporal engagement. An Error-Correcting Output Codes framework with an Adaboost base classifier is used to learn to rank the perceived observer's interest in face-to-face interactions. The automatic system shows good correlation between the automatic categorization results and the manual ranking made by the observers. In particular, the learning system shows that stress features have a high predictive power for ranking interest of observers when looking at face-to-face interactions.
|
Xavier Baro, Sergio Escalera, Petia Radeva, & Jordi Vitria. (2009). Generic Object Recognition in Urban Image Databases. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 27–34).
Abstract: In this paper we propose the construction of a visual content layer which describes the visual appearance of geographic locations in a city. We captured, by means of a Mobile Mapping system, a huge set of georeferenced images (>500K) which cover the whole city of Barcelona. For each image, hundreds of region descriptions are computed off-line and described as a hash code. All this information is extracted without an object of reference, which allows searching for any type of object using its visual appearance. A new Visual Content layer is built over Google Maps, allowing the object recognition information to be organized and fused with other content, like satellite images, street maps, and business locations.
|
Francesco Ciompi, Oriol Pujol, Oriol Rodriguez-Leor, Angel Serrano, J. Mauri, & Petia Radeva. (2009). On in-vitro and in-vivo IVUS data fusion. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 147–156).
Abstract: The design and the validation of an automatic plaque characterization technique based on Intravascular Ultrasound (IVUS) usually requires a data ground-truth. The histological analysis of post-mortem coronary arteries is commonly assumed to be the state-of-the-art process for the extraction of a reliable data-set of atherosclerotic plaques. Unfortunately, the amount of data provided by this technique is usually small, due to the difficulties in collecting post-mortem cases and phenomena of tissue spoiling during histological analysis. In this paper we tackle the process of fusing in-vivo and in-vitro IVUS data, starting with the analysis of recently proposed approaches for the creation of an enhanced IVUS data-set; furthermore, we propose a new approach, named pLDS, based on semi-supervised learning with a data selection criterion. The enhanced data-set obtained by each one of the analyzed approaches is used to train a classifier for tissue characterization purposes. Finally, the discriminative power of each classifier is quantitatively assessed and compared by classifying a data-set of validated in-vitro IVUS data.
|
Pierluigi Casale, Oriol Pujol, Petia Radeva, & Jordi Vitria. (2009). A First Approach to Activity Recognition Using Topic Models. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 74–82).
Abstract: In this work, we present a first approach to activity pattern discovery by means of topic models. Using motion data collected with a wearable device we prototyped, TheBadge, we analyse raw accelerometer data using Latent Dirichlet Allocation (LDA), a particular instantiation of topic models. Results show that for particular values of the parameters necessary for applying LDA to a continuous dataset, good accuracies in activity classification can be achieved.
|
Arnau Ramisa, Shrihari Vasudevan, David Aldavert, Ricardo Toledo, & Ramon Lopez de Mantaras. (2009). Evaluation of the SIFT Object Recognition Method in Mobile Robots: Frontiers in Artificial Intelligence and Applications. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 9–18).
Abstract: General object recognition in mobile robots is of primary importance in order to enhance the representation of the environment that robots will use for their reasoning processes. Therefore, we contribute to reducing this gap by evaluating the SIFT Object Recognition method on a challenging dataset, focusing on issues relevant to mobile robotics. The method was found to be resistant to robotics working conditions, but this resistance was limited mainly to well-textured objects.
|
Eloi Puertas, Sergio Escalera, & Oriol Pujol. (2010). Classifying Objects at Different Sizes with Multi-Scale Stacked Sequential Learning. In J. Aguilar A. M. R. Alquezar (Ed.), 13th International Conference of the Catalan Association for Artificial Intelligence (Vol. 220, pp. 193–200).
Abstract: Sequential learning is the discipline of machine learning that deals with dependent data. In this paper, we use the Multi-scale Stacked Sequential Learning approach (MSSL) to solve the task of pixel-wise classification based on contextual information. The main contribution of this work is a shifting technique applied during the testing phase that makes it possible, thanks to template images, to classify objects at different sizes. The results show that the proposed method robustly classifies such objects, capturing their spatial relationships.
|
David Roche, Debora Gil, & Jesus Giraldo. (2011). An inference model for analyzing termination conditions of Evolutionary Algorithms. In 14th Congrès Català en Intel·ligencia Artificial (pp. 216–225).
Abstract: In real-world problems, it is mandatory to design a termination condition for Evolutionary Algorithms (EAs) ensuring stabilization close to the unknown optimum. Distribution-based quantities are good candidates as long as suitable parameters are used. A main limitation for application to real-world problems is that such parameters strongly depend on the topology of the objective function, as well as on the EA paradigm used.
We claim that the termination problem would be fully solved if we had a model measuring to what extent a distribution-based quantity asymptotically behaves like the solution accuracy. We present a regression-prediction model that relates any two given quantities and reports if they can be statistically swapped as termination conditions. Our framework is applied to two issues. First, exploring if the parameters involved in the computation of distribution-based quantities influence their asymptotic behavior. Second, to what extent existing distribution-based quantities can be asymptotically exchanged for the accuracy of the EA solution.
Keywords: Evolutionary Computation Convergence, Termination Conditions, Statistical Inference
|
Jorge Bernal, F. Javier Sanchez, & Fernando Vilariño. (2011). Depth of Valleys Accumulation Algorithm for Object Detection. In 14th Congrès Català en Intel·ligencia Artificial (Vol. 1, pp. 71–80).
Abstract: This work aims at detecting in which regions the objects in the image are by using information about the intensity of valleys, which appear to surround objects in images where the source of light is in the same line of direction as the camera. We present our depth of valleys accumulation method, which consists of two stages: first, the definition of the depth of valleys image, which combines the output of a ridges and valleys detector with the morphological gradient to measure how deep a point is inside a valley; and second, an algorithm that marks as interior to objects those points of the image that lie inside complete or incomplete boundaries in the depth of valleys image. To evaluate the performance of our method we have tested it on several application domains. Our results on object region identification are promising, especially in the field of polyp detection in colonoscopy videos, and we also show its applicability in different areas.
Keywords: Object Recognition, Object Region Identification, Image Analysis, Image Processing
|