|
Miguel Angel Bautista, Antonio Hernandez, Sergio Escalera, Laura Igual, Oriol Pujol, Josep Moya, et al. (2016). A Gesture Recognition System for Detecting Behavioral Patterns of ADHD. TSMCB - IEEE Transactions on Systems, Man, and Cybernetics, Part B, 46(1), 136–147.
Abstract: We present an application of gesture recognition using an extension of Dynamic Time Warping (DTW) to recognize behavioural patterns of Attention Deficit Hyperactivity Disorder (ADHD). We propose an extension of DTW using one-class classifiers in order to encode the variability of a gesture category and thus perform an alignment between a gesture sample and a gesture class. We model the set of gesture samples of a given gesture category using either GMMs or an approximation of Convex Hulls. Thus, we add a theoretical contribution to the classical warping path in DTW by including local modeling of intra-class gesture variability. This methodology is applied in a clinical context, detecting a group of ADHD behavioural patterns defined by experts in psychology/psychiatry, to provide support to clinicians in the diagnostic procedure. The proposed methodology is tested on a novel multi-modal dataset (RGB plus Depth) of recordings of children with ADHD showing behavioural patterns. We obtain satisfactory results when compared to standard state-of-the-art approaches in the DTW context.
Keywords: Gesture Recognition; ADHD; Gaussian Mixture Models; Convex Hulls; Dynamic Time Warping; Multi-modal RGB-Depth data
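For orientation, the classical DTW recurrence that the abstract above extends can be sketched as follows. This is a minimal illustration of standard DTW on 1-D sequences, not the authors' method: the function name `dtw_distance` and the `cost` parameter are hypothetical. The paper's extension would, as the abstract describes it, replace the point-to-point cost with a cost between a sample frame and a per-class model (GMM or convex hull approximation), which is not implemented here.

```python
import math


def dtw_distance(a, b, cost=lambda x, y: abs(x - y)):
    """Accumulated cost of the best classical DTW alignment of a and b.

    `cost` is a hypothetical hook: it scores one element of `a` against
    one element of `b`. Here it defaults to the absolute difference.
    """
    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = step + min(acc[i - 1][j],      # advance in a only
                                   acc[i][j - 1],      # advance in b only
                                   acc[i - 1][j - 1])  # advance in both
    return acc[n][m]
```

Passing a different `cost` callable is one way such a sketch could be adapted to score a frame against a class model instead of a single reference frame.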
|
|
|
Sergio Escalera, & Petia Radeva. (2004). Fast greyscale road sign model matching and recognition.
|
|
|
Mehdi Mirza-Mohammadi, Sergio Escalera, & Petia Radeva. (2009). Contextual-Guided Bag-of-Visual-Words Model for Multi-class Object Categorization. In 13th International Conference on Computer Analysis of Images and Patterns (Vol. 5702, 748–756). LNCS. Springer Berlin Heidelberg.
Abstract: The bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of contained words. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are established after dictionary construction. However, close object regions can have distant descriptions in the feature space and thus be grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.
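As a rough illustration of the baseline representation the abstract above starts from, the sketch below builds a plain bag-of-visual-words histogram by assigning each descriptor to its nearest dictionary word. It assumes a dictionary is already given and uses toy 2-D descriptors. The names `nearest_word` and `bovw_histogram` are hypothetical, and the contextual merging step and ECOC classification described in the paper are not implemented.

```python
from collections import Counter


def nearest_word(descriptor, dictionary):
    """Index of the dictionary word closest to `descriptor` (squared Euclidean)."""
    return min(
        range(len(dictionary)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, dictionary[k])),
    )


def bovw_histogram(descriptors, dictionary):
    """Count how many descriptors fall on each visual word, ignoring their order."""
    counts = Counter(nearest_word(d, dictionary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(dictionary))]
```

In practice the descriptors would be high-dimensional (for example SIFT, as in the abstract) rather than the 2-D toy vectors this sketch is easiest to try with.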
|
|
|
Maria Salamo, Sergio Escalera, & Petia Radeva. (2009). Quality Enhancement based on Reinforcement Learning and Feature Weighting for a Critiquing-Based Recommender. In 8th International Conference on Case-Based Reasoning (Vol. 5650, 298–312). LNCS. Springer Berlin Heidelberg.
Abstract: Personalizing the product recommendation task is a major focus of research in the area of conversational recommender systems. Conversational case-based recommender systems help users to navigate through product spaces, alternately making product suggestions and eliciting user feedback. Critiquing is a common form of feedback, and the incremental critiquing-based recommender system has shown its efficiency in personalizing products based primarily on a quality measure. This quality measure influences the recommendation process and is obtained by combining compatibility and similarity scores. In this paper, we describe new compatibility strategies based on reinforcement learning and a new feature weighting technique based on the user’s history of critiques. Moreover, we show that our methodology can significantly improve recommendation efficiency in comparison with state-of-the-art approaches.
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, & Jamie Shotton. (2016). Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 1489–1491.
Abstract: The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human-robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of cluttered scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
|
|
|
Fernando Alonso, Xavier Baro, Sergio Escalera, Jordi Gonzalez, Martha Mackay, & Anna Serrahima. (2016). CARE RESPITE: TAKING CARE OF THE CAREGIVERS, Theme 5 The Strategic use of Mobile and Digital Health and Care Solutions. In 16th International Conference for Integrated Care.
|
|
|
Mikkel Thogersen, Sergio Escalera, Jordi Gonzalez, & Thomas B. Moeslund. (2016). Segmentation of RGB-D Indoor scenes by Stacking Random Forests and Conditional Random Fields. PRL - Pattern Recognition Letters, 80, 208–215.
Abstract: This paper proposes a technique for RGB-D scene segmentation using the Multi-class Multi-scale Stacked Sequential Learning (MMSSL) paradigm. Following recent trends in the state of the art, a base classifier uses an initial SLIC segmentation to obtain superpixels, which reduce the amount of data while retaining object boundaries. A series of color and depth features are extracted from the superpixels and used in a Conditional Random Field (CRF) to predict superpixel labels. Furthermore, a Random Forest (RF) classifier using random offset features is also used as an input to the CRF, acting as an initial prediction. As a stacked classifier, another Random Forest acts on a spatial multi-scale decomposition of the CRF confidence map to correct the erroneous labels assigned by the previous classifier. The model is tested on the popular NYU-v2 dataset. The approach shows that simple multi-modal features combined with the power of the MMSSL paradigm can achieve better performance than state-of-the-art results on the same dataset.
|
|
|
Meysam Madadi, Sergio Escalera, Jordi Gonzalez, Xavier Roca, & Felipe Lumbreras. (2015). Multi-part body segmentation based on depth maps for soft biometry analysis. PRL - Pattern Recognition Letters, 56, 14–21.
Abstract: This paper presents a novel method for extracting biometric measures using depth sensors. Given multi-part labeled training data, a new subject is aligned to the best model of the dataset, and soft biometrics such as lengths or circumference sizes of limbs and body are computed. The process is performed by training relevant pose clusters, defining a representative model, and fitting a 3D shape context descriptor within an iterative matching procedure. We show robust measures by applying orthogonal plates to the body hull. We test our approach on a novel full-body RGB-Depth data set, showing accurate estimation of soft biometrics and better segmentation accuracy in comparison with a random forest approach, without requiring large training data.
Keywords: 3D shape context; 3D point cloud alignment; Depth maps; Human body segmentation; Soft biometry analysis
|
|
|
Meysam Madadi, Sergio Escalera, Alex Carruesco, Carlos Andujar, Xavier Baro, & Jordi Gonzalez. (2017). Occlusion Aware Hand Pose Recovery from Sequences of Depth Images. In 12th IEEE International Conference on Automatic Face and Gesture Recognition.
Abstract: State-of-the-art approaches to hand pose estimation from depth images have reported promising results under quite controlled conditions. In this paper we propose a two-step pipeline for recovering the hand pose from a sequence of depth images. The pipeline has been designed to deal with images taken from any viewpoint and exhibiting a high degree of finger occlusion. In a first step we initialize the hand pose using a part-based model, fitting a set of hand components in the depth images. In a second step we consider temporal data and estimate the parameters of a trained bilinear model consisting of shape and trajectory bases. Results on a synthetic, highly-occluded dataset demonstrate that the proposed method outperforms most recent pose recovery approaches, including those based on CNNs.
|
|
|
Sergio Escalera, Jordi Gonzalez, Hugo Jair Escalante, Xavier Baro, & Isabelle Guyon. (2018). Looking at People Special Issue. IJCV - International Journal of Computer Vision, 126(2-4), 141–143.
|
|
|
Egils Avots, Meysam Madadi, Sergio Escalera, Jordi Gonzalez, Xavier Baro, Paul Pallin, et al. (2019). From 2D to 3D geodesic-based garment matching. MTAP - Multimedia Tools and Applications, 78(18), 25829–25853.
Abstract: A new approach for 2D to 3D garment retexturing is proposed based on Gaussian mixture models and thin plate splines (TPS). An automatically segmented garment of an individual is matched to a new source garment and rendered, resulting in augmented images in which the target garment has been retextured using the texture of the source garment. We divide the problem into garment boundary matching based on Gaussian mixture models and then interpolate inner points using surface topology extracted through geodesic paths, which leads to a more realistic result than standard approaches. We evaluated and compared our system quantitatively by root mean square error (RMS) and qualitatively using the mean opinion score (MOS), showing the benefits of the proposed methodology on our gathered dataset.
Keywords: Shape matching; Geodesic distance; Texture mapping; RGBD image processing; Gaussian mixture model
|
|
|
Marco Bellantonio, Mohammad A. Haque, Pau Rodriguez, Kamal Nasrollahi, Taisi Telve, Sergio Escalera, et al. (2016). Spatio-Temporal Pain Recognition in CNN-based Super-Resolved Facial Images. In 23rd International Conference on Pattern Recognition (Vol. 10165). LNCS.
Abstract: Automatic pain detection is a long-expected solution to a prevalent medical problem of pain management. This is more relevant when the subject of pain is young children or patients with limited ability to communicate about their pain experience. Computer vision-based analysis of facial pain expression provides a way of efficient pain detection. When deep machine learning methods came onto the scene, automatic pain detection exhibited even better performance. In this paper, we identify three important factors to exploit in automatic pain detection: spatial information about pain available in each of the facial video frames, temporal information about the pain expression pattern in a subject's video sequence, and variation of face resolution. We employ a combination of a convolutional neural network and a recurrent neural network to set up a deep hybrid pain detection framework that is able to exploit both spatial and temporal pain information from facial video. In order to analyze the effect of different facial resolutions, we introduce a super-resolution algorithm to generate facial video frames with different resolution setups. We investigate the performance on the publicly available UNBC-McMaster Shoulder Pain database. As a contribution, the paper provides novel and important information regarding the performance of a hybrid deep learning framework for pain detection in facial images of different resolutions.
|
|
|
Umut Guclu, Yagmur Gucluturk, Meysam Madadi, Sergio Escalera, Xavier Baro, Jordi Gonzalez, et al. (2017). End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks. arXiv:1703.03305.
Abstract: Recent years have seen a sharp increase in the number of related yet distinct advances in semantic segmentation. Here, we tackle this problem by leveraging the respective strengths of these advances. That is, we formulate a conditional random field over a four-connected graph as end-to-end trainable convolutional and recurrent networks, and estimate them via an adversarial process. Importantly, our model learns not only unary potentials but also pairwise potentials, while aggregating multi-scale contexts and controlling higher-order inconsistencies. We evaluate our model on two standard benchmark datasets for semantic face segmentation, achieving state-of-the-art results on both of them.
|
|
|
Meysam Madadi, Sergio Escalera, Xavier Baro, & Jordi Gonzalez. (2022). End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth data. IETCV - IET Computer Vision, 16(1), 50–66.
Abstract: Despite recent advances in 3D pose estimation of human hands, especially thanks to the advent of CNNs and depth cameras, this task is still far from being solved. This is mainly due to the highly non-linear dynamics of fingers, which make hand model training a challenging task. In this paper, we exploit a novel hierarchical tree-like structured CNN, in which branches are trained to become specialized in predefined subsets of hand joints, called local poses. We further fuse local pose features, extracted from hierarchical CNN branches, to learn higher order dependencies among joints in the final pose by end-to-end training. The loss function is also defined to incorporate appearance and physical constraints on feasible hand motion and deformation. Finally, we introduce a non-rigid data augmentation approach to increase the amount of training depth data. Experimental results suggest that feeding a tree-shaped CNN, specialized in local poses, into a fusion network for modeling joint correlations and dependencies helps to increase the precision of final estimations, outperforming state-of-the-art results on the NYU and SyntheticHand datasets.
Keywords: Computer vision; data acquisition; human computer interaction; learning (artificial intelligence); pose estimation
|
|
|
Martha Mackay, Fernando Alonso, Pere Salamero, Xavier Baro, Jordi Gonzalez, & Sergio Escalera. (2015). Care and caring: future proofing the new demographics. In 6th International Carers Conference.
Abstract: With an ageing population, the issue of care provision is becoming increasingly important. The simple aspiration of the majority of older people is to live safely and well at home. Housing will be part of health & care integration in the following years and decades. A higher proportion of people will have to rely on informal care through family, friends, neighbors and others who provide care to an older person in need of assistance (around 80% of care across the EU). They do not usually have a formal status and are usually unpaid. We need to ensure that all disabled or chronically ill people can get the help they need without overburdening their families. The physical and emotional stress of carers is one of the dangers that this dependency can bring. To prevent carer burnout, it is necessary to provide new solutions that are affordable and user-friendly for families and caregivers.
|
|