David Masip, Alexander Todorov, & Jordi Vitria. (2012). The Role of Facial Regions in Evaluating Social Dime. In Rita Cucchiara V. M. Andrea Fusiello (Ed.), 12th European Conference on Computer Vision – Workshops and Demonstrations (Vol. 7584, pp. 210–219). LNCS. Springer Berlin Heidelberg.
Abstract: Facial trait judgments are an important information cue for people. Recent work in psychology has established the basis of face evaluation, defining a set of traits that we evaluate from faces (e.g. dominance, trustworthiness, aggressiveness, attractiveness, threat or intelligence, among others). We rapidly infer information from others' faces: usually after a short period of time (< 1000 ms) we perceive a certain degree of dominance or trustworthiness of another person from the face. Although these perceptions are not necessarily accurate, they influence many important social outcomes (such as election results or court decisions). This topic has also attracted the attention of Computer Vision scientists, and recently a computational model to automatically predict trait evaluations from faces has been proposed. These systems try to mimic human perception by applying machine learning classifiers to a set of labeled data. In this paper we perform an experimental study on the specific facial features that trigger the social inferences. Using previous results from the literature, we propose to use simple similarity maps to evaluate which regions of the face most influence the trait inferences. The correlation analysis is performed using only appearance, and the results from the experiments suggest that each trait is correlated with specific facial characteristics.
Keywords: Workshops and Demonstrations
|
Hamdi Dibeklioglu, Theo Gevers, & Albert Ali Salah. (2012). Are You Really Smiling at Me? Spontaneous versus Posed Enjoyment Smiles. In 12th European Conference on Computer Vision (Vol. 7574, pp. 525–538). LNCS. Springer Berlin Heidelberg.
Abstract: Smiling is an indispensable element of nonverbal social interaction. Besides, automatic distinction between spontaneous and posed expressions is important for visual analysis of social signals. Therefore, in this paper, we propose a method to distinguish between spontaneous and posed enjoyment smiles by using the dynamics of eyelid, cheek, and lip corner movements. The discriminative power of these movements, and the effect of different fusion levels are investigated on multiple databases. Our results improve the state-of-the-art. We also introduce the largest spontaneous/posed enjoyment smile database collected to date, and report new empirical and conceptual findings on smile dynamics. The collected database consists of 1240 samples of 400 subjects. Moreover, it has the unique property of having an age range from 8 to 76 years. Large scale experiments on the new database indicate that eyelid dynamics are highly relevant for smile classification, and there are age-related differences in smile dynamics.
|
Katerine Diaz, Francesc J. Ferri, & W. Diaz. (2013). Fast Approximated Discriminative Common Vectors using rank-one SVD updates. In 20th International Conference On Neural Information Processing (Vol. 8228, pp. 368–375). LNCS. Springer Berlin Heidelberg.
Abstract: An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. The proposal consists of a rank-one update along with an adaptive restriction on the rank of the null space which leads to an approximate but convenient solution. The algorithm can be implemented very efficiently in terms of matrix operations and space complexity, which enables its use in large-scale dynamic application domains. Deep comparative experimentation using publicly available high dimensional image datasets has been carried out in order to properly assess the proposed algorithm against several recent incremental formulations.
K. Diaz-Chito, F.J. Ferri, W. Diaz
|
Julie Digne, Mariella Dimiccoli, Neus Sabater, & Philippe Salembier. (2015). Neighborhood Filters and the Recovery of 3D Information. In Handbook of Mathematical Methods in Imaging (pp. 1645–1673). Springer New York.
Abstract: Following their success in image processing (see Chapter Local Smoothing Neighborhood Filters), neighborhood filters have been extended to 3D surface processing. This adaptation is not straightforward. It has led to several variants for surfaces depending on whether the surface is defined as a mesh, or as a raw data point set. The image gray level in the bilateral similarity measure is replaced by geometric information such as the normal or the curvature. The first section of this chapter reviews the variants of 3D mesh bilateral filters and compares them to the simplest possible isotropic filter, the mean curvature motion. In a second part, this chapter reviews applications of the bilateral filter to data composed of a sparse depth map (or of depth cues) and of the image on which they have been computed. Such sparse depth cues can be obtained by stereovision or by psychophysical techniques. The assumption underlying these applications is that pixels with similar intensity around a region are likely to have similar depths. Therefore, when diffusing depth information with a bilateral filter based on locality and color similarity, the discontinuities in depth are assured to be consistent with the color discontinuities, which is generally a desirable property. In the reviewed applications, this ends up with the reconstruction of a dense perceptual depth map from the joint data of an image and of depth cues.
|
Angel Sappa, David Geronimo, Fadi Dornaika, & Antonio Lopez. (2006). Real Time Vehicle Pose Using On-Board Stereo Vision System. In International Conference on Image Analysis and Recognition (pp. 205–216).
Abstract: This paper presents a robust technique for real-time estimation of both the camera’s position and orientation, referred to as pose. A commercial stereo vision system is used. Unlike previous approaches, it can be used in either urban or highway scenarios. The proposed technique consists of two stages. Initially, a compact 2D representation of the original 3D data points is computed. Then, a RANSAC-based least squares approach is used for fitting a plane to the road. At the same time, the camera’s relative position and orientation are computed. The proposed technique is intended to be used in a driving assistance scheme for applications such as obstacle or pedestrian detection. Experimental results on urban environments with different road geometries are presented.
|
David Sanchez-Mendoza, David Masip, & Agata Lapedriza. (2015). Emotion recognition from mid-level features. PRL - Pattern Recognition Letters, 67(Part 1), 66–74.
Abstract: In this paper we present a study on the use of Action Units as mid-level features for automatically recognizing basic and subtle emotions. We propose a representation model based on mid-level facial muscular movement features. We encode these movements dynamically using the Facial Action Coding System, and propose to use these intermediate features based on Action Units (AUs) to classify emotions. AU activations are detected by fusing a set of spatiotemporal geometric and appearance features. The algorithm is validated in two applications: (i) the recognition of 7 basic emotions using the publicly available Cohn-Kanade database, and (ii) the inference of subtle emotional cues in the Newscast database. In this second scenario, we consider emotions that are perceived cumulatively over longer periods of time. In particular, we automatically classify whether video shots from public news TV channels refer to good or bad news. To deal with the different video lengths we propose a Histogram of Action Units and compute it using a sliding window strategy on the frame sequences. Our approach achieves accuracies close to human perception.
Keywords: Facial expression; Emotion recognition; Action units; Computer vision
|
Ivo Everts, Jan van Gemert, & Theo Gevers. (2012). Per-patch Descriptor Selection using Surface and Scene Properties. In 12th European Conference on Computer Vision (Vol. 7577, pp. 172–186). LNCS. Springer Berlin Heidelberg.
Abstract: Local image descriptors are generally designed for describing all possible image patches. Such patches may be subject to complex variations in appearance due to incidental object, scene and recording conditions. Because of this, a single best descriptor for accurate image representation under all conditions does not exist. Therefore, we propose to automatically select from a pool of descriptors the one that is most suitable based on object surface and scene properties. These properties are measured on the fly from a single image patch through a set of attributes. Attributes are input to a classifier which selects the best descriptor. Our experiments on a large dataset of colored object patches show that the proposed selection method outperforms the best single descriptor and a priori combinations of the descriptor pool.
|
Jose Manuel Alvarez, Theo Gevers, Y. LeCun, & Antonio Lopez. (2012). Road Scene Segmentation from a Single Image. In 12th European Conference on Computer Vision (Vol. 7578, pp. 376–389). LNCS. Springer Berlin Heidelberg.
Abstract: Road scene segmentation is important in computer vision for different applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides relevant contextual information to improve their understanding.
In this paper, we use a convolutional neural network based algorithm to learn features from noisy labels to recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion to obtain maximal uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images.
From quantitative and qualitative experiments, conducted on publicly available datasets, it is concluded that convolutional neural networks are suitable for learning 3D scene layout from noisy labels and provide a relative improvement of 7% compared to the baseline. Furthermore, combining color planes provides a statistical description of road areas that exhibits maximal uniformity and provides a relative improvement of 8% compared to the baseline. Finally, the improvement is even bigger when acquired and current information from a single image are combined.
Keywords: road detection
|