|
Anjan Dutta, Josep Llados, Horst Bunke, & Umapada Pal. (2013). A Product graph based method for dual subgraph matching applied to symbol spotting. In 10th IAPR International Workshop on Graphics Recognition.
Abstract: Product graph has been shown to be an efficient way for matching subgraphs. This paper reports the extension of the product graph methodology for subgraph matching applied to symbol spotting in graphical documents. This paper focuses on the two major limitations of the previous version of product graph: (1) Spurious nodes and edges in the graph representation and (2) Inefficient node and edge attributes. To deal with noisy information of vectorized graphical documents, we consider a dual graph representation on the original graph representing the graphical information and the product graph is computed between the dual graphs of the query graphs and the input graph.
The dual graph with redundant edges is helpful for an efficient and noise-tolerant encoding of the structural information of the graphical documents. The adjacency matrix of the product graph locates similar path information of two graphs, and exponentiating the adjacency matrix finds similar paths of greater lengths. Nodes joining similar paths between two graphs are found by combining different exponentials of adjacency matrices. An experimental investigation reveals that the recall obtained by this approach is quite encouraging.
|
|
|
L. Rothacker, Marçal Rusiñol, & G.A. Fink. (2013). Bag-of-Features HMMs for segmentation-free word spotting in handwritten documents. In 12th International Conference on Document Analysis and Recognition (pp. 1305–1309).
Abstract: Recent HMM-based approaches to handwritten word spotting require large amounts of learning samples and mostly rely on a prior segmentation of the document. We propose to use Bag-of-Features HMMs, estimated from a single sample, in a patch-based segmentation-free framework. Bag-of-Features HMMs use statistics of local image feature representatives. Therefore they can be considered a variant of discrete HMMs that allows modelling the observation of a number of features at a point in time. The discrete nature enables us to estimate a query model with only a single example of the query provided by the user. This makes our method very flexible with respect to the availability of training data. Furthermore, we are able to outperform state-of-the-art results on the George Washington dataset.
|
|
|
Muhammad Anwer Rao. (2013). Color for Object Detection and Action Recognition (Antonio Lopez, & Joost Van de Weijer, Eds.). Ph.D. thesis, Ediciones Graficas Rey.
Abstract: Recognizing object categories in real world images is a challenging problem in computer vision. The deformable part based framework is currently the most successful approach for object detection. Generally, HOG features are used for image representation within the part-based framework. For action recognition, the bag-of-words framework has been shown to provide promising results. Within the bag-of-words framework, local image patches are described by the SIFT descriptor. In contrast to object detection and action recognition, combining color and shape has been shown to provide the best performance for object and scene recognition.
In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity based features for image representation while ignoring color. Channel based descriptors are among the most commonly used approaches in object recognition. This inspires us to evaluate incorporating color information using the channel based fusion approach for the task of person detection.
In the second part of the thesis, we investigate the problem of object detection in still images. Due to high dimensionality, channel based fusion increases the computational cost. Moreover, channel based fusion has been found to obtain inferior results for object categories where one of the visual cues varies significantly. On the other hand, late fusion is known to provide improved results for a wide range of object categories. A consequence of the late fusion strategy is the need for a pure color descriptor. Therefore, we propose to use color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient. Consequently, color attributes are combined with traditional shape features, providing excellent results for the object detection task.
Finally, we focus on the problem of action detection and classification in still images. We investigate the potential of color for action classification and detection in still images. We also evaluate different fusion approaches for combining color and shape information for action recognition. Additionally, an analysis is performed to validate the contribution of color for action recognition. Our results clearly demonstrate that combining color and shape information significantly improves the performance of both action classification and detection in still images.
|
|
|
Adriana Romero, & Carlo Gatta. (2013). Do We Really Need All These Neurons? In 6th Iberian Conference on Pattern Recognition and Image Analysis (Vol. 7887, pp. 460–467). LNCS. Springer Berlin Heidelberg.
Abstract: Restricted Boltzmann Machines (RBMs) are generative neural networks that have received much attention recently. In particular, choosing the appropriate number of hidden units is important, as a poor choice might hinder their representative power. According to the literature, RBMs require numerous hidden units to approximate any distribution properly. In this paper, we present an experiment to determine whether such a number of hidden units is required in a classification context. We then propose an incremental algorithm that trains RBMs by reusing the previously trained parameters, using a trade-off measure to determine the appropriate number of hidden units. Results on the MNIST and OCR letters databases show that a number of hidden units one order of magnitude smaller than the literature estimate suffices to achieve similar performance. Moreover, the proposed algorithm allows estimating the required number of hidden units without the need to train many RBMs from scratch.
Keywords: Restricted Boltzmann Machine; hidden units; unsupervised learning; classification
|
|
|
German Ros, J. Guerrero, Angel Sappa, Daniel Ponsa, & Antonio Lopez. (2013). Fast and Robust l1-averaging-based Pose Estimation for Driving Scenarios. In 24th British Machine Vision Conference.
Abstract: Robust visual pose estimation is at the core of many computer vision applications, being fundamental for Visual SLAM and Visual Odometry problems. During the last decades, many approaches have been proposed to solve these problems, RANSAC being one of the most accepted and used. However, with the arrival of new challenges, such as large driving scenarios for autonomous vehicles, along with the improvements in data gathering frameworks, new issues must be considered. One of these issues is the capability of a technique to deal with very large amounts of data while meeting the real-time constraint. With this purpose in mind, we present a novel technique for the problem of robust camera-pose estimation that is more suitable for dealing with large amounts of data and that additionally helps improve the results. The method is based on a combination of a very fast coarse-evaluation function and a robust ℓ1-averaging procedure. Such a scheme leads to high-quality results while taking considerably less time than RANSAC. Experimental results on the challenging KITTI Vision Benchmark Suite are provided, showing the validity of the proposed approach.
Keywords: SLAM
|
|
|
Shida Beigpour, Marc Serra, Joost Van de Weijer, Robert Benavente, Maria Vanrell, Olivier Penacchio, et al. (2013). Intrinsic Image Evaluation On Synthetic Complex Scenes. In 20th IEEE International Conference on Image Processing (pp. 285–289).
Abstract: Scene decomposition into its illuminant, shading, and reflectance intrinsic images is an essential step for scene understanding. Collecting intrinsic image ground-truth data is a laborious task. The assumptions on which the ground-truth procedures are based limit their application to simple scenes with a single object taken in the absence of indirect lighting and interreflections. We investigate synthetic data for intrinsic image research since the extraction of ground truth is straightforward, and it allows for scenes in more realistic situations (e.g., multiple illuminants and interreflections). With this dataset we aim to motivate researchers to further explore intrinsic image decomposition in complex scenes.
|
|
|
Lluis Gomez, & Dimosthenis Karatzas. (2013). Multi-script Text Extraction from Natural Scenes. In 12th International Conference on Document Analysis and Recognition (pp. 467–471).
Abstract: Scene text extraction methodologies are usually based on classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organisation through which text emerges as a perceptually significant group of atomic objects. Therefore humans are able to detect text even in languages and scripts never seen before. In this paper, we argue that the text extraction problem could be posed as the detection of meaningful groups of regions. We present a method built around a perceptual organisation framework that exploits collaboration of proximity and similarity laws to create text-group hypotheses. Experiments demonstrate that our algorithm is competitive with state-of-the-art approaches on a standard dataset covering text in variable orientations and two languages.
|
|
|
Santiago Segui, Michal Drozdzal, Ekaterina Zaytseva, Carolina Malagelada, Fernando Azpiroz, Petia Radeva, et al. (2013). A new image centrality descriptor for wrinkle frame detection in WCE videos. In 13th IAPR Conference on Machine Vision Applications.
Abstract: Small bowel motility dysfunctions are a widespread functional disorder characterized by abdominal pain and altered bowel habits in the absence of specific and unique organic pathology. Current methods of diagnosis are complex and can only be conducted at some highly specialized referral centers. Wireless Video Capsule Endoscopy (WCE) could be an interesting diagnostic alternative that presents excellent clinical advantages, since it is non-invasive and can be conducted by non-specialists. The purpose of this work is to present a new method for the detection of wrinkle frames in WCE, a critical characteristic to detect one of the main motility events: contractions. The method goes beyond the use of one of the classical image features, the Histogram
|
|
|
Jiaolong Xu, David Vazquez, Antonio Lopez, Javier Marin, & Daniel Ponsa. (2013). Learning a Multiview Part-based Model in Virtual World for Pedestrian Detection. In IEEE Intelligent Vehicles Symposium (pp. 467–472). IEEE.
Abstract: State-of-the-art deformable part-based models based on latent SVM have shown excellent results on human detection. In this paper, we propose to train a multiview deformable part-based model with automatically generated part examples from virtual-world data. The method is efficient because: (i) the part detectors are trained with precisely extracted virtual examples, thus no latent learning is needed, (ii) the multiview pedestrian detector enhances the performance of the pedestrian root model, (iii) a top-down approach is used for part detection, which reduces the search space. We evaluate our model on the Daimler and Karlsruhe Pedestrian Benchmarks with the publicly available Caltech pedestrian detection evaluation framework, and the result outperforms the state-of-the-art latent SVM V4.0 on both average miss rate and speed (our detector is ten times faster).
Keywords: Pedestrian Detection; Virtual World; Part based
|
|
|
Fahad Shahbaz Khan, Joost Van de Weijer, Sadiq Ali, & Michael Felsberg. (2013). Evaluating the impact of color on texture recognition. In 15th International Conference on Computer Analysis of Images and Patterns (Vol. 8047, pp. 154–162). Springer Berlin Heidelberg.
Abstract: State-of-the-art texture descriptors typically operate on grey scale images while ignoring color information. A common way to obtain a joint color-texture representation is to combine the two visual cues at the pixel level. However, such an approach provides sub-optimal results for the texture categorisation task.
In this paper we investigate how to optimally exploit color information for texture recognition. We evaluate a variety of color descriptors, popular in image classification, for texture categorisation. In addition we analyze different fusion approaches to combine color and texture cues. Experiments are conducted on the challenging scenes and 10 class texture datasets. Our experiments clearly suggest that in all cases color names provide the best performance. Late fusion is the best strategy to combine color and texture. Selecting the best color descriptor with the optimal fusion strategy provides a gain of 5% to 8% compared to texture alone on the scenes and texture datasets.
Keywords: Color; Texture; image representation
|
|
|
Muhammad Muzzamil Luqman, Jean-Yves Ramel, Josep Llados, & Thierry Brouard. (2013). Fuzzy Multilevel Graph Embedding. PR - Pattern Recognition, 46(2), 551–565.
Abstract: Structural pattern recognition approaches offer the most expressive, convenient, and powerful but computationally expensive representations of underlying relational information. To benefit from mature, less expensive and efficient state-of-the-art machine learning models of statistical pattern recognition, they must be mapped to a low-dimensional vector space. Our method of explicit graph embedding bridges the gap between structural and statistical pattern recognition. We extract the topological, structural and attribute information from a graph and encode numeric details by fuzzy histograms and symbolic details by crisp histograms. The histograms are concatenated to achieve a simple and straightforward embedding of a graph into a low-dimensional numeric feature vector. Experimentation on standard public graph datasets shows that our method outperforms the state-of-the-art methods of graph embedding for richly attributed graphs.
Keywords: Pattern recognition; Graphics recognition; Graph clustering; Graph classification; Explicit graph embedding; Fuzzy logic
|
|
|
David Roche, Debora Gil, & Jesus Giraldo. (2013). Detecting loss of diversity for an efficient termination of EAs. In 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (pp. 561–566).
Abstract: Terminating Evolutionary Algorithms (EAs) at their steady state, so that useless iterations are not performed, is key to their efficient application to black-box problems. Many EAs evolve while there is still diversity in their population and, thus, could be terminated by analyzing the behavior of some measures of EA population diversity. This paper presents a numeric approximation to steady states that can be used to detect the moment the EA population has lost its diversity, for EA termination. Our condition has been applied to 3 EA paradigms based on diversity and a selection of functions covering the properties most relevant for EA convergence. Experiments show that our condition works regardless of the search space dimension and function landscape.
Keywords: EA termination; EA population diversity; EA steady state
|
|
|
M. Visani, V.C. Kieu, Alicia Fornes, & N. Journet. (2013). The ICDAR 2013 Music Scores Competition: Staff Removal. In 12th International Conference on Document Analysis and Recognition (pp. 1439–1443).
Abstract: The first competition on music scores, organized at ICDAR in 2011, awoke the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real case scenario: old music scores. For this purpose, we have generated a new set of images using two kinds of degradations: local noise and 3D distortions. This paper describes the dataset, distortion methods, evaluation metrics, the participants' methods and the obtained results.
|
|
|
Sergio Escalera. (2013). Multi-Modal Human Behaviour Analysis from Visual Data Sources. ERCIM - ERCIM News journal, 21–22.
Abstract: The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assisted technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human Computer Interaction.
|
|
|
Carles Fernandez, Jordi Gonzalez, Joao Manuel R. S. Taveres, & Xavier Roca. (2013). Towards Ontological Cognitive System. In Topics in Medical Image Processing and Computational Vision (Vol. 8, pp. 87–99). Springer Netherlands.
Abstract: The increasing ubiquity of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which, together with the attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends in cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest reinforcing the multi-modal fusion of information sources and the communication with end-users.
|
|