David Masip, Agata Lapedriza, & Jordi Vitria. (2009). Boosted Online Learning for Face Recognition. TSMCB - IEEE Transactions on Systems, Man and Cybernetics part B, 39(2), 530–538.
Abstract: Face recognition applications commonly suffer from three main drawbacks: a reduced training set, information lying in high-dimensional subspaces, and the need to incorporate new people to recognize. In the recent literature, the extension of a face classifier in order to include new people in the model has been solved using online feature extraction techniques. The most successful approaches of those are the extensions of the principal component analysis or the linear discriminant analysis. In the current paper, a new online boosting algorithm is introduced: a face recognition method that extends a boosting-based classifier by adding new classes while avoiding the need of retraining the classifier each time a new person joins the system. The classifier is learned using the multitask learning principle where multiple verification tasks are trained together sharing the same feature space. The new classes are added taking advantage of the structure learned previously, being the addition of new classes not computationally demanding. The present proposal has been (experimentally) validated with two different facial data sets by comparing our approach with the current state-of-the-art techniques. The results show that the proposed online boosting algorithm fares better in terms of final accuracy. In addition, the global performance does not decrease drastically even when the number of classes of the base problem is multiplied by eight.
|
Arnau Ramisa, Shrihari Vasudevan, David Aldavert, Ricardo Toledo, & Ramon Lopez de Mantaras. (2009). Evaluation of the SIFT Object Recognition Method in Mobile Robots: Frontiers in Artificial Intelligence and Applications. In 12th International Conference of the Catalan Association for Artificial Intelligence (Vol. 202, pp. 9–18).
Abstract: General object recognition in mobile robots is of primary importance in order to enhance the representation of the environment that robots will use for their reasoning processes. Therefore, we contribute reduce this gap by evaluating the SIFT Object Recognition method in a challenging dataset, focusing on issues relevant to mobile robotics. Resistance of the method to the robotics working conditions was found, but it was limited mainly to well-textured objects.
|
Fahad Shahbaz Khan, Joost Van de Weijer, & Maria Vanrell. (2009). Top-Down Color Attention for Object Recognition. In 12th International Conference on Computer Vision (pp. 979–986).
Abstract: Generally the bag-of-words based image representation follows a bottom-up paradigm. The subsequent stages of the process: feature detection, feature description, vocabulary construction and image representation are performed independent of the intentioned object classes to be detected. In such a framework, combining multiple cues such as shape and color often provides below-expected results. This paper presents a novel method for recognizing object categories when using multiple cues by separating the shape and color cue. Color is used to guide attention by means of a top-down category-specific attention map. The color attention map is then further deployed to modulate the shape features by taking more features from regions within an image that are likely to contain an object instance. This procedure leads to a category-specific image histogram representation for each category. Furthermore, we argue that the method combines the advantages of both early and late fusion. We compare our approach with existing methods that combine color and shape cues on three data sets containing varied importance of both cues, namely, Soccer ( color predominance), Flower (color and shape parity), and PASCAL VOC Challenge 2007 (shape predominance). The experiments clearly demonstrate that in all three data sets our proposed framework significantly outperforms the state-of-the-art methods for combining color and shape information.
|
Miquel Ferrer, Ernest Valveny, & F. Serratosa. (2009). Median Graph Computation by means of a Genetic Approach Based on Minimum Common Supergraph and Maximum Common Subraph. In 4th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 5524, 346–353). LNCS. Springer Berlin Heidelberg.
Abstract: Given a set of graphs, the median graph has been theoretically presented as a useful concept to infer a representative of the set. However, the computation of the median graph is a highly complex task and its practical application has been very limited up to now. In this work we present a new genetic algorithm for the median graph computation. A set of experiments on real data, where none of the existing algorithms for the median graph computation could be applied up to now due to their computational complexity, show that we obtain good approximations of the median graph. Finally, we use the median graph in a real nearest neighbour classification showing that it leaves the box of the only-theoretical concepts and demonstrating, from a practical point of view, that can be a useful tool to represent a set of graphs.
|
Miquel Ferrer, Ernest Valveny, & F. Serratosa. (2009). Median Graphs: A Genetic Approach based on New Theoretical Properties. PR - Pattern Recognition, 42(9), 2003–2012.
Abstract: Given a set of graphs, the median graph has been theoretically presented as a useful concept to infer a representative of the set. However, the computation of the median graph is a highly complex task and its practical application has been very limited up to now. In this work we present two major contributions. On one side, and from a theoretical point of view, we show new theoretical properties of the median graph. On the other side, using these new properties, we present a new approximate algorithm based on the genetic search, that improves the computation of the median graph. Finally, we perform a set of experiments on real data, where none of the existing algorithms for the median graph computation could be applied up to now due to their computational complexity. With these results, we show how the concept of the median graph can be used in real applications and leaves the box of the only-theoretical concepts, demonstrating, from a practical point of view, that can be a useful tool to represent a set of graphs.
Keywords: Median graph; Genetic search; Maximum common subgraph; Graph matching; Structural pattern recognition
|
Jose Manuel Alvarez, Ferran Diego, Joan Serrat, & Antonio Lopez. (2009). Automatic Ground-truthing using video registration for on-board detection algorithms. In 16th IEEE International Conference on Image Processing (pp. 4389–4392).
Abstract: Ground-truth data is essential for the objective evaluation of object detection methods in computer vision. Many works claim their method is robust but they support it with experiments which are not quantitatively assessed with regard some ground-truth. This is one of the main obstacles to properly evaluate and compare such methods. One of the main reasons is that creating an extensive and representative ground-truth is very time consuming, specially in the case of video sequences, where thousands of frames have to be labelled. Could such a ground-truth be generated, at least in part, automatically? Though it may seem a contradictory question, we show that this is possible for the case of video sequences recorded from a moving camera. The key idea is transferring existing frame segmentations from a reference sequence into another video sequence recorded at a different time on the same track, possibly under a different ambient lighting. We have carried out experiments on several video sequence pairs and quantitatively assessed the precision of the transformed ground-truth, which prove that our approach is not only feasible but also quite accurate.
|
Jose Antonio Rodriguez. (2009). Statistical frameworks and prior information modeling in handwritten word-spotting (Gemma Sanchez, Josep Llados, & Florent Perronnin, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: Handwritten word-spotting (HWS) is the pattern analysis task that consists in finding keywords in handwritten document images. So far, HWS has been applied mostly to historical documents in order to build search engines for such image collections. This thesis addresses the problem of word-spotting for detecting important keywords in business documents. This is a first step towards the process of automatic routing of correspondence based on content.
However, the application of traditional HWS techniques fails for this type of documents. As opposed to historical documents, real business documents present a very high variability in terms of writing styles, spontaneous writing, crossed-out words, spelling mistakes, etc. The main goal of this thesis is the development of pattern recognition techniques that lead to a high-performance HWS system for this challenging type of data.
We develop a statistical framework in which word models are expressed in terms of hidden Markov models and the a priori information is encoded in a universal vocabulary of Gaussian codewords. This systems leads to a very robust performance in word-spotting task. We also find that by constraining the word models to the universal vocabulary, the a priori information of the problem of interest can be exploited for developing new contributions. These include a novel writer adaptation method, a system for searching handwritten words by generating typed text images, and a novel model-based similarity between feature vector sequences.
|
Jose Antonio Rodriguez, & Florent Perronnin. (2009). Handwritten word-spotting using hidden Markov models and universal vocabularies. PR - Pattern Recognition, 42(9), 2103–2116.
Abstract: Handwritten word-spotting is traditionally viewed as an image matching task between one or multiple query word-images and a set of candidate word-images in a database. This is a typical instance of the query-by-example paradigm. In this article, we introduce a statistical framework for the word-spotting problem which employs hidden Markov models (HMMs) to model keywords and a Gaussian mixture model (GMM) for score normalization. We explore the use of two types of HMMs for the word modeling part: continuous HMMs (C-HMMs) and semi-continuous HMMs (SC-HMMs), i.e. HMMs with a shared set of Gaussians. We show on a challenging multi-writer corpus that the proposed statistical framework is always superior to a traditional matching system which uses dynamic time warping (DTW) for word-image distance computation. A very important finding is that the SC-HMM is superior when labeled training data is scarce—as low as one sample per keyword—thanks to the prior information which can be incorporated in the shared set of Gaussians.
Keywords: Word-spotting; Hidden Markov model; Score normalization; Universal vocabulary; Handwriting recognition
|
Gemma Roig, Xavier Boix, & Fernando De la Torre. (2009). Optimal Feature Selection for Subspace Image Matching. In 2nd IEEE International Workshop on Subspace Methods in conjunction.
Abstract: Image matching has been a central research topic in computer vision over the last decades. Typical approaches to correspondence involve matching feature points between images. In this paper, we present a novel problem for establishing correspondences between a sparse set of image features and a previously learned subspace model. We formulate the matching task as an energy minimization, and jointly optimize over all possible feature assignments and parameters of the subspace model. This problem is in general NP-hard. We propose a convex relaxation approximation, and develop two optimization strategies: naïve gradient-descent and quadratic programming. Alternatively, we reformulate the optimization criterion as a sparse eigenvalue problem, and solve it using a recently proposed backward greedy algorithm. Experimental results on facial feature detection show that the quadratic programming solution provides better selection mechanism for relevant features.
|
Robert Benavente, C. Alejandro Parraga, & Maria Vanrell. (2009). Colour categories boundaries are better defined in contextual conditions. PER - Perception, 38, 36.
Abstract: In a previous experiment [Parraga et al, 2009 Journal of Imaging Science and Technology 53(3)] the boundaries between basic colour categories were measured by asking subjects to categorize colour samples presented in isolation (ie on a dark background) using a YES/NO paradigm. Results showed that some boundaries (eg green – blue) were very diffuse and the subjects' answers presented bimodal distributions, which were attributed to the emergence of non-basic categories in those regions (eg turquoise). To confirm these results we performed a new experiment focussed on the boundaries where bimodal distributions were more evident. In this new experiment rectangular colour samples were presented surrounded by random colour patches to simulate contextual conditions on a calibrated CRT monitor. The names of two neighbouring colours were shown at the bottom of the screen and subjects selected the boundary between these colours by controlling the chromaticity of the central patch, sliding it across these categories' frontier. Results show that in this new experimental paradigm, the formerly uncertain inter-colour category boundaries are better defined and the dispersions (ie the bimodal distributions) that occurred in the previous experiment disappear. These results may provide further support to Berlin and Kay's basic colour terms theory.
|
S. Chanda, Umapada Pal, & Oriol Ramos Terrades. (2009). Word-Wise Thai and Roman Script Identification. TALIP - ACM Transactions on Asian Language Information Processing, 1–21.
Abstract: In some Thai documents, a single text line of a printed document page may contain words of both Thai and Roman scripts. For the Optical Character Recognition (OCR) of such a document page it is better to identify, at first, Thai and Roman script portions and then to use individual OCR systems of the respective scripts on these identified portions. In this article, an SVM-based method is proposed for identification of word-wise printed Roman and Thai scripts from a single line of a document page. Here, at first, the document is segmented into lines and then lines are segmented into character groups (words). In the proposed scheme, we identify the script of a character group combining different character features obtained from structural shape, profile behavior, component overlapping information, topological properties, and water reservoir concept, etc. Based on the experiment on 10,000 data (words) we obtained 99.62% script identification accuracy from the proposed scheme.
|
Murad Al Haj, Andrew Bagdanov, Jordi Gonzalez, & Xavier Roca. (2009). Robust and Efficient Multipose Face Detection Using Skin Color Segmentation. In 4th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 5524). LNCS. Springer Berlin Heidelberg.
Abstract: In this paper we describe an efficient technique for detecting faces in arbitrary images and video sequences. The approach is based on segmentation of images or video frames into skin-colored blobs using a pixel-based heuristic. Scale and translation invariant features are then computed from these segmented blobs which are used to perform statistical discrimination between face and non-face classes. We train and evaluate our method on a standard, publicly available database of face images and analyze its performance over a range of statistical pattern classifiers. The generalization of our approach is illustrated by testing on an independent sequence of frames containing many faces and non-faces. These experiments indicate that our proposed approach obtains false positive rates comparable to more complex, state-of-the-art techniques, and that it generalizes better to new data. Furthermore, the use of skin blobs and invariant features requires fewer training samples since significantly fewer non-face candidate regions must be considered when compared to AdaBoost-based approaches.
|
Salim Jouili, Salvatore Tabbone, & Ernest Valveny. (2009). Comparing Graph Similarity Measures for Graphical Recognition. In 8th IAPR International Workshop on Graphics Recognition. LNCS. Springer.
Abstract: In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used including line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each graph distance measure depends on the kind of data and the graph representation technique.
|
Salim Jouili, Salvatore Tabbone, & Ernest Valveny. (2009). Evaluation of graph matching measures for documents retrieval. In In proceedings of 8th IAPR International Workshop on Graphics Recognition (13–21).
Abstract: In this paper we evaluate four graph distance measures. The analysis is performed for document retrieval tasks. For this aim, different kind of documents are used which include line drawings (symbols), ancient documents (ornamental letters), shapes and trademark-logos. The experimental results show that the performance of each grahp distance measure depends on the kind of data and the graph representation technique.
Keywords: Graph Matching; Graph retrieval; structural representation; Performance Evaluation
|
Marçal Rusiñol, & Josep Llados. (2009). Logo Spotting by a Bag-of-words Approach for Document Categorization. In 10th International Conference on Document Analysis and Recognition (111–115).
Abstract: In this paper we present a method for document categorization which processes incoming document images such as invoices or receipts. The categorization of these document images is done in terms of the presence of a certain graphical logo detected without segmentation. The graphical logos are described by a set of local features and the categorization of the documents is performed by the use of a bag-of-words model. Spatial coherence rules are added to reinforce the correct category hypothesis, aiming also to spot the logo inside the document image. Experiments which demonstrate the effectiveness of this system on a large set of real data are presented.
|