|
Naila Murray. (2012). Predicting Saliency and Aesthetics in Images: A Bottom-up Perspective (Xavier Otazu, & Maria Vanrell, Eds.). Ph.D. thesis, Ediciones Graficas Rey, .
Abstract: In Part 1 of the thesis, we hypothesize that salient and non-salient image regions can be estimated to be the regions which are enhanced or assimilated in standard low-level color image representations. We prove this hypothesis by adapting a low-level model of color perception into a saliency estimation model. This model shares the three main steps found in many successful models for predicting attention in a scene: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. For such models, integrating spatial information and justifying the choice of various parameter values remain open problems. Our saliency model inherits a principled selection of parameters as well as an innate spatial pooling mechanism from the perception model on which it is based. This pooling mechanism has been fitted using psychophysical data acquired in color-luminance setting experiments. The proposed model outperforms the state-of-the-art at the task of predicting eye-fixations from two datasets. After demonstrating the effectiveness of our basic saliency model, we introduce an improved image representation, based on geometrical grouplets, that enhances complex low-level visual features such as corners and terminations, and suppresses relatively simpler features such as edges. With this improved image representation, the performance of our saliency model in predicting eye-fixations increases for both datasets.
In Part 2 of the thesis, we investigate the problem of aesthetic visual analysis. While a great deal of research has been conducted on hand-crafting image descriptors for aesthetics, little attention so far has been dedicated to the collection, annotation and distribution of ground truth data. Because image aesthetics is complex and subjective, existing datasets, which have few images and few annotations, have significant limitations. To address these limitations, we have introduced a new large-scale database for conducting Aesthetic Visual Analysis, which we call AVA. AVA contains more than 250,000 images, along with a rich variety of annotations. We investigate how the wealth of data in AVA can be used to tackle the challenge of understanding and assessing visual aesthetics by looking into several problems relevant for aesthetic analysis. We demonstrate that by leveraging the data in AVA, and using generic low-level features such as SIFT and color histograms, we can exceed state-of-the-art performance in aesthetic quality prediction tasks.
Finally, we entertain the hypothesis that low-level visual information in our saliency model can also be used to predict visual aesthetics by capturing local image characteristics such as feature contrast, grouping and isolation, characteristics thought to be related to universal aesthetic laws. We use the weighted center-surround responses that form the basis of our saliency model to create a feature vector that describes aesthetics. We also introduce a novel color space for fine-grained color representation. We then demonstrate that the resultant features achieve state-of-the-art performance on aesthetic quality classification.
As such, a promising contribution of this thesis is to show that several vision experiences – low-level color perception, visual saliency and visual aesthetics estimation – may be successfully modeled using a unified framework. This suggests a similar architecture in area V1 for both color perception and saliency and adds evidence to the hypothesis that visual aesthetics appreciation is driven in part by low-level cues.
|
|
|
Jordi Gonzalez. (2004). Human Sequence Evaluation: the Key-frame Approach (Xavier Roca, & Javier Varona, Eds.). Ph.D. thesis, , .
|
|
|
Jaume Gibert, Ernest Valveny, & Horst Bunke. (2011). Dimensionality Reduction for Graph of Words Embedding. In Xiaoyi Jiang, Miquel Ferrer, & Andrea Torsello (Eds.), 8th IAPR-TC-15 International Workshop. Graph-Based Representations in Pattern Recognition (Vol. 6658, pp. 22–31). LNCS.
Abstract: The Graph of Words Embedding consists in mapping every graph of a given dataset to a feature vector by counting unary and binary relations between node attributes of the graph. While it shows good properties in classification problems, it suffers from high dimensionality and sparsity. These two issues are addressed in this article. Two well-known techniques for dimensionality reduction, kernel principal component analysis (kPCA) and independent component analysis (ICA), are applied to the embedded graphs. We discuss their performance compared to the classification of the original vectors on three different public databases of graphs.
|
|
|
Aura Hernandez-Sabate, & Debora Gil. (2012). The Benefits of IVUS Dynamics for Retrieving Stable Models of Arteries. In Yasuhiro Honda (Ed.), Intravascular Ultrasound (pp. 185–206). Intech.
|
|
|
Sergio Vera, Miguel Angel Gonzalez Ballester, & Debora Gil. (2012). Optimal Medial Surface Generation for Anatomical Volume Representations. In MichaelW. David and Vannier H. and H. Yoshida (Ed.), Abdominal Imaging. Computational and Clinical Applications (Vol. 7601, pp. 265–273). Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Abstract: Medial representations are a widely used technique in abdominal organ shape representation and parametrization. Those methods require good medial manifolds as a starting point. Any medial
surface used to parametrize a volume should be simple enough to allow an easy manipulation and complete enough to allow an accurate reconstruction of the volume. Obtaining good quality medial
surfaces is still a problem with current iterative thinning methods. This forces the usage of generic, pre-calculated medial templates that are adapted to the final shape at the cost of a drop in volume reconstruction.
This paper describes an operator for generation of medial structures that generates clean and complete manifolds well suited for their further use in medial representations of abdominal organ volumes. While being simpler than thinning surfaces, experiments show its high performance in volume reconstruction and preservation of medial surface main branching topology.
Keywords: Medial surface representation; volume reconstruction
|
|
|
Fadi Dornaika, & Bogdan Raducanu. (2011). Subtle Facial Expression Recognition in Still Images and Videos. In Yu-Jin Zhang (Ed.), Advances in Face Image Analysis: Techniques and Technologies (pp. 259–277). New York, USA: IGI-Global.
Abstract: This chapter addresses the recognition of basic facial expressions. It has three main contributions. First, the authors introduce a view- and texture independent schemes that exploits facial action parameters estimated by an appearance-based 3D face tracker. they represent the learned facial actions associated with different facial expressions by time series. Two dynamic recognition schemes are proposed: (1) the first is based on conditional predictive models and on an analysis-synthesis scheme, and (2) the second is based on examples allowing straightforward use of machine learning approaches. Second, the authors propose an efficient recognition scheme based on the detection of keyframes in videos. Third, the authors compare the dynamic scheme with a static one based on analyzing individual snapshots and show that in general the former performs better than the latter. The authors then provide evaluations of performance using Linear Discriminant Analysis (LDA), Non parametric Discriminant Analysis (NDA), and Support Vector Machines (SVM).
|
|
|
Miquel Ferrer, I. Bardaji, Ernest Valveny, Dimosthenis Karatzas, & Horst Bunke. (2013). Median Graph Computation by Means of Graph Embedding into Vector Spaces. In Yun Fu, & Yungian Ma (Eds.), Graph Embedding for Pattern Analysis (pp. 45–72). Springer New York.
Abstract: In pattern recognition [8, 14], a key issue to be addressed when designing a system is how to represent input patterns. Feature vectors is a common option. That is, a set of numerical features describing relevant properties of the pattern are computed and arranged in a vector form. The main advantages of this kind of representation are computational simplicity and a well sound mathematical foundation. Thus, a large number of operations are available to work with vectors and a large repository of algorithms for pattern analysis and classification exist. However, the simple structure of feature vectors might not be the best option for complex patterns where nonnumerical features or relations between different parts of the pattern become relevant.
|
|
|
Sergio Escalera, Vassilis Athitsos, & Isabelle Guyon. (2016). Challenges in multimodal gesture recognition. JMLR - Journal of Machine Learning Research, 17, 1–54.
Abstract: This paper surveys the state of the art on multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011-2015. We began right at the start of the KinectTMrevolution when inexpensive infrared cameras providing image depth recordings became available. We published papers using this technology and other more conventional methods, including regular video cameras, to record data, thus providing a good overview of uses of machine learning and computer vision using multimodal data in this area of application. Notably, we organized a series of challenges and made available several datasets we recorded for that purpose, including tens of thousands
of videos, which are available to conduct further research. We also overview recent state of the art works on gesture recognition based on a proposed taxonomy for gesture recognition, discussing challenges and future lines of research.
Keywords: Gesture Recognition; Time Series Analysis; Multimodal Data Analysis; Computer Vision; Pattern Recognition; Wearable sensors; Infrared Cameras; KinectTM
|
|