Author |
Albert Clapes; Miguel Reyes; Sergio Escalera |


|
|
Title |
Multi-modal User Identification and Object Recognition Surveillance System |
Type |
Journal Article |
|
Year |
2013 |
Publication |
Pattern Recognition Letters |
Abbreviated Journal |
PRL |
|
|
Volume |
34 |
Issue |
7 |
Pages |
799-808 |
|
|
Keywords  |
Multi-modal RGB-Depth data analysis; User identification; Object recognition; Intelligent surveillance; Visual features; Statistical learning |
|
|
Abstract |
We propose an automatic surveillance system for user identification and object recognition based on multi-modal RGB-Depth data analysis. We model an RGB-Depth environment by learning a pixel-based background Gaussian distribution. Then, user and object candidate regions are detected and recognized using robust statistical approaches. The system robustly recognizes users and updates itself online, identifying and detecting new actors in the scene. Moreover, segmented objects are described, matched, recognized, and updated online using viewpoint 3D descriptions, making the system robust to partial occlusions and local 3D viewpoint rotations. Finally, the system stores the history of user–object assignments, which is especially useful in surveillance scenarios. The system has been evaluated on a novel data set containing different indoor/outdoor scenarios, objects, and users, showing accurate recognition and better performance than standard state-of-the-art approaches. |
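The record does not include the authors' code; as a rough illustration of the per-pixel background Gaussian idea mentioned in the abstract, the Python sketch below maintains a running mean and variance per pixel over stacked RGB-D channels. The class name, learning rate `alpha`, and threshold `k` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PixelGaussianBackground:
    """Per-pixel running Gaussian background model over stacked RGB-D frames.

    A minimal sketch of a pixel-wise background distribution; `alpha` and `k`
    are illustrative choices, not the paper's parameters.
    """

    def __init__(self, first_frame, alpha=0.02, k=2.5):
        self.mean = first_frame.astype(np.float64)   # (H, W, C)
        self.var = np.full_like(self.mean, 25.0)     # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        # A pixel is foreground if any channel deviates more than k sigma.
        dev = np.abs(frame - self.mean) / np.sqrt(self.var)
        foreground = (dev > self.k).any(axis=-1)     # (H, W) boolean mask
        # Online update of background pixels only (foreground stays frozen).
        bg = ~foreground[..., None]
        self.mean = np.where(
            bg, (1 - self.alpha) * self.mean + self.alpha * frame, self.mean)
        diff2 = (frame - self.mean) ** 2
        self.var = np.where(
            bg, (1 - self.alpha) * self.var + self.alpha * diff2, self.var)
        return foreground
```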
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Elsevier |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; 600.046; 605.203; MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ CRE2013 |
Serial |
2248 |
|
Permanent link to this record |
|
|
|
|
Author |
Antonio Hernandez; Nadezhda Zlateva; Alexander Marinov; Miguel Reyes; Petia Radeva; Dimo Dimov; Sergio Escalera |


|
|
Title |
Human Limb Segmentation in Depth Maps based on Spatio-Temporal Graph Cuts Optimization |
Type |
Journal Article |
|
Year |
2012 |
Publication |
Journal of Ambient Intelligence and Smart Environments |
Abbreviated Journal |
JAISE |
|
|
Volume |
4 |
Issue |
6 |
Pages |
535-546 |
|
|
Keywords  |
Multi-modal vision processing; Random Forest; Graph-cuts; multi-label segmentation; human body segmentation |
|
|
Abstract |
We present a framework for object segmentation in depth maps based on Random Forest and Graph-cuts theory, and apply it to the segmentation of human limbs. First, from a set of random depth features, a Random Forest is used to infer a set of label probabilities for each data sample. This vector of probabilities is used as the unary term in the α-β swap Graph-cuts algorithm. Moreover, depth values of spatio-temporally neighboring data points are used as boundary potentials. Results on a new multi-label human depth data set show that the novel methodology achieves higher segmentation overlap than classical approaches. |
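As a hedged illustration of how the unary and boundary terms described in the abstract could be assembled, the sketch below derives unary costs from Random Forest label probabilities and boundary potentials from depth differences. The feature shapes, function name, and `sigma` are assumptions, and the actual α-β swap minimization would be delegated to a graph-cuts library (not shown here).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def unary_and_pairwise(feats, depth, feats_train, labels_train, sigma=0.05):
    """feats: (H, W, F) per-pixel depth features; depth: (H, W) depth map."""
    rf = RandomForestClassifier(n_estimators=100).fit(feats_train, labels_train)
    proba = rf.predict_proba(feats.reshape(-1, feats.shape[-1]))  # (N, L)
    # Unary term: negative log-likelihood of each label at each pixel.
    unary = -np.log(np.clip(proba, 1e-9, 1.0))
    # Boundary potential between horizontal neighbors: strong where depth is
    # similar, weak across depth discontinuities (likely limb boundaries).
    ddiff = np.abs(np.diff(depth, axis=1))
    pairwise_h = np.exp(-(ddiff ** 2) / (2 * sigma ** 2))
    return unary, pairwise_h
```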
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1876-1364 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
MILAB; HuPBA |
Approved |
no |
|
|
Call Number |
Admin @ si @ HZM2012a |
Serial |
2006 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |

|
|
Title |
Hand sign language recognition using multi-view hand skeleton |
Type |
Journal Article |
|
Year |
2020 |
Publication |
Expert Systems With Applications |
Abbreviated Journal |
ESWA |
|
|
Volume |
150 |
Issue |
|
Pages |
113336 |
|
|
Keywords  |
Multi-view hand skeleton; Hand sign language recognition; 3DCNN; Hand pose estimation; RGB video; Hand action recognition |
|
|
Abstract |
Hand sign language recognition from video is a challenging research area in computer vision, whose performance is affected by hand occlusion, fast hand movement, illumination changes, and background complexity, to name just a few factors. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though these challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition using a Single Shot Detector (SSD), a 2D Convolutional Neural Network (2DCNN), a 3D Convolutional Neural Network (3DCNN), and Long Short-Term Memory (LSTM) from RGB input videos. We use a CNN-based model to estimate the 3D hand keypoints from 2D input frames. After that, we connect the estimated keypoints to build the hand skeleton using the midpoint algorithm. In order to obtain a more discriminative representation of the hands, we project the 3D hand skeleton onto three view surface images. We further employ the heatmap image of the detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs to the stacked hand features, including pixel-level, multi-view hand skeleton, and heatmap features, to extract discriminant local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to an LSTM to model the long-term dynamics of hand sign gestures. Comparing 2DCNNs and 3DCNNs with different numbers of stacked inputs, we demonstrate that 3DCNNs better capture the spatio-temporal dynamics of the hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features is applied to hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, namely RKS-PERSIANSIGN, including 10,000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition. |
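To make the 3DCNN-plus-LSTM stage of the abstract concrete, here is a minimal PyTorch sketch of that part of the pipeline. The layer sizes, channel counts, and class name are illustrative assumptions and do not reproduce the authors' architecture.

```python
import torch
import torch.nn as nn

class Hand3DCNNLSTM(nn.Module):
    """Sketch of a 3DCNN feeding an LSTM over time.

    Input: a stack of per-frame feature maps (batch, channels, frames, H, W),
    e.g. pixel, multi-view skeleton, and heatmap channels stacked together.
    """

    def __init__(self, in_channels=5, num_classes=100, hidden=256):
        super().__init__()
        self.cnn3d = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),             # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # -> (B, 64, T, 1, 1)
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                            # x: (B, C, T, H, W)
        f = self.cnn3d(x).squeeze(-1).squeeze(-1)    # (B, 64, T)
        f = f.transpose(1, 2)                        # (B, T, 64) for the LSTM
        out, _ = self.lstm(f)
        return self.head(out[:, -1])                 # per-video class scores
```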
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; no proj; MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2020a |
Serial |
3411 |
|
Permanent link to this record |
|
|
|
|
Author |
Mohammad Ali Bagheri; Qigang Gao; Sergio Escalera |

|
|
Title |
Combining Local and Global Learners in the Pairwise Multiclass Classification |
Type |
Journal Article |
|
Year |
2015 |
Publication |
Pattern Analysis and Applications |
Abbreviated Journal |
PAA |
|
|
Volume |
18 |
Issue |
4 |
Pages |
845-860 |
|
|
Keywords  |
Multiclass classification; Pairwise approach; One-versus-one |
|
|
Abstract |
Pairwise classification is a well-known class binarization technique that converts a multiclass problem into a number of two-class problems, one for each pair of classes. However, in the pairwise technique, nuisance votes from many irrelevant classifiers may result in a wrong class prediction. To overcome this problem, a simple but efficient method is proposed and evaluated in this paper. The proposed method, named Local Crossing Off (LCO), is based on excluding some classes and focusing on the most probable classes in the neighborhood space. This procedure is performed by employing a modified version of the standard K-nearest neighbor and large margin nearest neighbor algorithms. The LCO method takes advantage of the nearest neighbor classification algorithm because of its local learning behavior, as well as of the global behavior of powerful binary classifiers that discriminate between two classes. Combining these two properties in the proposed LCO technique avoids the weaknesses of each method and increases the efficiency of the whole classification system. On several benchmark datasets of varying size and difficulty, we found that the LCO approach leads to significant improvements with different base learners. The experimental results show that the proposed technique not only achieves better classification accuracy than other standard approaches, but is also computationally more efficient for tackling classification problems with a relatively large number of target classes. |
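The LCO idea can be sketched with scikit-learn: a K-nearest neighbor stage shortlists the most probable classes for each test sample, and only the one-versus-one classifiers among the shortlisted classes vote. The class name and parameters below are illustrative assumptions, not the authors' code.

```python
import numpy as np
from itertools import combinations
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

class LocalCrossingOff:
    """Sketch: KNN shortlists candidate classes, pairwise SVMs vote locally."""

    def __init__(self, n_candidates=3, n_neighbors=10):
        self.n_candidates, self.n_neighbors = n_candidates, n_neighbors

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.knn_ = KNeighborsClassifier(self.n_neighbors).fit(X, y)
        # One binary SVM per class pair: the global learners.
        self.pair_clfs_ = {}
        for a, b in combinations(self.classes_, 2):
            mask = np.isin(y, [a, b])
            self.pair_clfs_[(a, b)] = SVC().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        proba = self.knn_.predict_proba(X)          # local class evidence
        top = np.argsort(proba, axis=1)[:, -self.n_candidates:]
        preds = []
        for i, cand_idx in enumerate(top):
            cand = sorted(self.classes_[cand_idx])  # shortlisted classes
            votes = {c: 0 for c in cand}
            # Only pairwise classifiers among the shortlist cast votes.
            for a, b in combinations(cand, 2):
                votes[self.pair_clfs_[(a, b)].predict(X[i:i + 1])[0]] += 1
            preds.append(max(votes, key=votes.get))
        return np.array(preds)
```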
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
Springer London |
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
1433-7541 |
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ BGE2014 |
Serial |
2441 |
|
Permanent link to this record |
|
|
|
|
Author |
Albert Clapes; Alex Pardo; Oriol Pujol; Sergio Escalera |


|
|
Title |
Action detection fusing multiple Kinects and a WIMU: an application to in-home assistive technology for the elderly |
Type |
Journal Article |
|
Year |
2018 |
Publication |
Machine Vision and Applications |
Abbreviated Journal |
MVAP |
|
|
Volume |
29 |
Issue |
5 |
Pages |
765–788 |
|
|
Keywords  |
Multimodal activity detection; Computer vision; Inertial sensors; Dense trajectories; Dynamic time warping; Assistive technology |
|
|
Abstract |
We present a vision-inertial system which combines two RGB-Depth devices together with a wearable inertial movement unit (WIMU) in order to detect activities of daily living. From multi-view videos, we extract dense trajectories enriched with a histogram-of-normals description computed from the depth cue and bag them into multi-view codebooks. During the subsequent classification step, a multi-class support vector machine with an RBF-χ2 kernel combines the descriptions at the kernel level. In order to perform action detection from the videos, a sliding window approach is utilized. On the other hand, we extract accelerations, rotation angles, and jerk features from the inertial data collected by the wearable placed on the user's dominant wrist. During gesture spotting, dynamic time warping is applied and the alignment costs to a set of pre-selected gesture sub-classes are thresholded to determine possible detections. The outputs of the two modules are combined in a late-fusion fashion. The system is validated in a real-case scenario with elderly people from a care home. Learning-based fusion results improve on those of the single modalities, demonstrating the success of such a multimodal approach. |
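The RBF-χ2 kernel classification over bag-of-words video descriptors can be illustrated with scikit-learn's chi2_kernel and a precomputed-kernel SVM; the random histogram data, label counts, and gamma below are placeholders, not the paper's descriptors or parameters.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 512))   # stand-in multi-view BoW histograms
y_train = rng.integers(0, 5, 200)  # stand-in action labels
X_test = rng.random((50, 512))

# chi2_kernel computes exp(-gamma * sum((x - y)^2 / (x + y))), i.e. the
# exponential (RBF) chi-squared kernel; inputs must be non-negative.
gamma = 0.5
K_train = chi2_kernel(X_train, gamma=gamma)          # (200, 200) Gram matrix
K_test = chi2_kernel(X_test, X_train, gamma=gamma)   # (50, 200) test vs train

svm = SVC(kernel="precomputed").fit(K_train, y_train)
pred = svm.predict(K_test)
```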
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no proj; MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ CPP2018 |
Serial |
3125 |
|
Permanent link to this record |