2017 |
|
Pau Rodriguez, Guillem Cucurull, Jordi Gonzalez, Josep M. Gonfaus, Kamal Nasrollahi, Thomas B. Moeslund, et al. (2017). Deep Pain: Exploiting Long Short-Term Memory Networks for Facial Expression Classification. Cyber - IEEE Transactions on Cybernetics, 1–11.
Abstract: Pain is an unpleasant feeling that has been shown to be an important factor for the recovery of patients. Since assessing it is costly in human resources and difficult to do objectively, there is a need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory (LSTM) network to exploit the temporal relation between video frames. We further compare the performance of the popular scheme based on the canonically normalized appearance with that of taking the whole image into account. As a result, we outperform the current state-of-the-art area under the curve performance on the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results on the Cohn Kanade+ facial expression database.
|
|
|
Pau Rodriguez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca, & Jordi Gonzalez. (2017). Age and gender recognition in the wild with deep attention. PR - Pattern Recognition, 72, 563–571.
Abstract: Face analysis in images in the wild still poses a challenge for automatic age and gender recognition tasks, mainly due to their high variability in resolution, deformation, and occlusion. Although performance has increased considerably thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitivity of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained with a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks shows that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.
Keywords: Age recognition; Gender recognition; Deep neural networks; Attention mechanisms
|
|
2016 |
|
Anastasios Doulamis, Nikolaos Doulamis, Marco Bertini, Jordi Gonzalez, & Thomas B. Moeslund. (2016). Introduction to the Special Issue on the Analysis and Retrieval of Events/Actions and Workflows in Video Streams. MTAP - Multimedia Tools and Applications, 75(22), 14985–14990.
|
|
|
Mikkel Thogersen, Sergio Escalera, Jordi Gonzalez, & Thomas B. Moeslund. (2016). Segmentation of RGB-D Indoor scenes by Stacking Random Forests and Conditional Random Fields. PRL - Pattern Recognition Letters, 80, 208–215.
Abstract: This paper proposes a technique for RGB-D scene segmentation using the Multi-class Multi-scale Stacked Sequential Learning (MMSSL) paradigm. Following recent trends in the state of the art, a base classifier uses an initial SLIC segmentation to obtain superpixels, which reduce the amount of data while retaining object boundaries. A series of color and depth features are extracted from the superpixels and are used in a Conditional Random Field (CRF) to predict superpixel labels. Furthermore, a Random Forest (RF) classifier using random offset features is also used as an input to the CRF, acting as an initial prediction. As a stacked classifier, another Random Forest is used, acting on a spatial multi-scale decomposition of the CRF confidence map to correct the erroneous labels assigned by the previous classifier. The model is tested on the popular NYU-v2 dataset. The approach shows that simple multi-modal features, combined with the power of the MMSSL paradigm, can achieve better performance than state-of-the-art results on the same dataset.
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, & Jamie Shotton. (2016). Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 1489–1491.
Abstract: The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human-robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of cluttered scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
|
|