|
Xavier Baro, Sergio Escalera, Jordi Vitria, Oriol Pujol, & Petia Radeva. (2009). Traffic Sign Recognition Using Evolutionary Adaboost Detection and Forest-ECOC Classification. TITS - IEEE Transactions on Intelligent Transportation Systems, 10(1), 113–126.
Abstract: The high variability of sign appearance in uncontrolled environments has made the detection and classification of road signs a challenging problem in computer vision. In this paper, we introduce a novel approach for the detection and classification of traffic signs. Detection is based on a boosted detectors cascade, trained with a novel evolutionary version of Adaboost, which allows the use of large feature spaces. Classification is defined as a multiclass categorization problem. A battery of classifiers is trained to split classes in an Error-Correcting Output Code (ECOC) framework. We propose an ECOC design through a forest of optimal tree structures that are embedded in the ECOC matrix. The novel system offers high performance and better accuracy than the state-of-the-art strategies and is potentially better in terms of noise, affine deformation, partial occlusions, and reduced illumination.
|
|
|
Sergio Escalera. (2013). Multi-Modal Human Behaviour Analysis from Visual Data Sources. ERCIM - ERCIM News journal, 21–22.
Abstract: The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assisted technology and supported diagnosis for the elderly and people with mental/physical disabilities, fitness conditioning, and Human Computer Interaction.
|
|
|
Alvaro Cepero, Albert Clapes, & Sergio Escalera. (2015). Automatic non-verbal communication skills analysis: a quantitative evaluation. AIC - AI Communications, 28(1), 87–101.
Abstract: The oral communication competence is defined on the top of the most relevant skills for one's professional and personal life. Because of the importance of communication in our activities of daily living, it is crucial to study methods to evaluate and provide the necessary feedback that can be used in order to improve these communication capabilities and, therefore, learn how to express ourselves better. In this work, we propose a system capable of evaluating quantitatively the quality of oral presentations in an automatic fashion. The system is based on a multi-modal RGB, depth, and audio data description and a fusion approach in order to recognize behavioral cues and train classifiers able to eventually predict communication quality levels. The performance of the proposed system is tested on a novel dataset containing Bachelor thesis' real defenses, presentations from an 8th semester Bachelor courses, and Master courses' presentations at Universitat de Barcelona. Using as groundtruth the marks assigned by actual instructors, our system achieves high performance categorizing and ranking presentations by their quality, and also making real-valued mark predictions.
Keywords: Social signal processing; human behavior analysis; multi-modal data description; multi-modal data fusion; non-verbal communication analysis; e-Learning
|
|
|
Sergio Escalera, Jordi Gonzalez, Xavier Baro, & Jamie Shotton. (2016). Guest Editor Introduction to the Special Issue on Multimodal Human Pose Recovery and Behavior Analysis. TPAMI - IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1489–1491.
Abstract: The sixteen papers in this special section focus on human pose recovery and behavior analysis (HuPBA). This is one of the most challenging topics in computer vision, pattern analysis, and machine learning. It is of critical importance for application areas that include gaming, computer interaction, human robot interaction, security, commerce, assistive technologies and rehabilitation, sports, sign language recognition, and driver assistance technology, to mention just a few. In essence, HuPBA requires dealing with the articulated nature of the human body, changes in appearance due to clothing, and the inherent problems of clutter scenes, such as background artifacts, occlusions, and illumination changes. These papers represent the most recent research in this field, including new methods considering still images, image sequences, depth data, stereo vision, 3D vision, audio, and IMUs, among others.
|
|
|
Oscar Lopes, Miguel Reyes, Sergio Escalera, & Jordi Gonzalez. (2014). Spherical Blurred Shape Model for 3-D Object and Pose Recognition: Quantitative Analysis and HCI Applications in Smart Environments. TSMCB - IEEE Transactions on Systems, Man and Cybernetics (Part B), 44(12), 2379–2390.
Abstract: The use of depth maps is of increasing interest after the advent of cheap multisensor devices based on structured light, such as Kinect. In this context, there is a strong need of powerful 3-D shape descriptors able to generate rich object representations. Although several 3-D descriptors have been already proposed in the literature, the research of discriminative and computationally efficient descriptors is still an open issue. In this paper, we propose a novel point cloud descriptor called spherical blurred shape model (SBSM) that successfully encodes the structure density and local variabilities of an object based on shape voxel distances and a neighborhood propagation strategy. The proposed SBSM is proven to be rotation and scale invariant, robust to noise and occlusions, highly discriminative for multiple categories of complex objects like the human hand, and computationally efficient since the SBSM complexity is linear to the number of object voxels. Experimental evaluation in public depth multiclass object data, 3-D facial expressions data, and a novel hand poses data sets show significant performance improvements in relation to state-of-the-art approaches. Moreover, the effectiveness of the proposal is also proved for object spotting in 3-D scenes and for real-time automatic hand pose recognition in human computer interaction scenarios.
|
|