Author Pichao Wang; Wanqing Li; Philip Ogunbona; Jun Wan; Sergio Escalera
  Title RGB-D-based Human Motion Recognition with Deep Learning: A Survey Type Journal Article
  Year 2018 Publication Computer Vision and Image Understanding Abbreviated Journal CVIU  
  Volume 171 Issue Pages 118-139  
  Keywords Human motion recognition; RGB-D data; Deep learning; Survey  
  Abstract Human motion recognition is one of the most important branches of human-centered research activities. In recent years, motion recognition based on RGB-D data has attracted much attention. Along with developments in artificial intelligence, deep learning techniques have gained remarkable success in computer vision. In particular, convolutional neural networks (CNN) have achieved great success for image-based tasks, and recurrent neural networks (RNN) are renowned for sequence-based problems. Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data. In this paper, a detailed overview of recent advances in RGB-D-based motion recognition is presented. The reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth-based, skeleton-based and RGB+D-based. As a survey focused on the application of deep learning to RGB-D-based motion recognition, we explicitly discuss the advantages and limitations of existing techniques. In particular, we highlight the methods of encoding spatial-temporal-structural information inherent in video sequences, and discuss potential directions for future research.  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ WLO2018 Serial 3123  

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
  Title Hand sign language recognition using multi-view hand skeleton Type Journal Article
  Year 2020 Publication Expert Systems With Applications Abbreviated Journal ESWA  
  Volume 150 Issue Pages 113336  
  Keywords Multi-view hand skeleton; Hand sign language recognition; 3DCNN; Hand pose estimation; RGB video; Hand action recognition  
  Abstract Hand sign language recognition from video is a challenging research area in computer vision, whose performance is affected by hand occlusion, fast hand movement, illumination changes, and background complexity, to mention just a few factors. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though these challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition using Single Shot Detector (SSD), 2D Convolutional Neural Network (2DCNN), 3D Convolutional Neural Network (3DCNN), and Long Short-Term Memory (LSTM) from RGB input videos. We use a CNN-based model which estimates the 3D hand keypoints from 2D input frames. After that, we connect these estimated keypoints to build the hand skeleton using the midpoint algorithm. In order to obtain a more discriminative representation of hands, we project the 3D hand skeleton into three-view surface images. We further employ the heatmap image of detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs on the stacked features of the hand, including pixel-level, multi-view hand skeleton, and heatmap features, to extract discriminant local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to an LSTM to model long-term dynamics of hand sign gestures. Analyzing 2DCNN vs. 3DCNN using different numbers of stacked inputs into the network, we demonstrate that 3DCNNs better capture the spatio-temporal dynamics of hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features is applied for hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, namely RKS-PERSIANSIGN, including 10,000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition.  
  Notes HuPBA; no proj Approved no  
  Call Number Admin @ si @ RKE2020a Serial 3411  

 
Author Reza Azad; Maryam Asadi-Aghbolaghi; Shohreh Kasaei; Sergio Escalera
  Title Dynamic 3D Hand Gesture Recognition by Learning Weighted Depth Motion Maps Type Journal Article
  Year 2019 Publication IEEE Transactions on Circuits and Systems for Video Technology Abbreviated Journal TCSVT  
  Volume 29 Issue 6 Pages 1729-1740  
  Keywords Hand gesture recognition; Multilevel temporal sampling; Weighted depth motion map; Spatio-temporal description; VLAD encoding  
  Abstract Hand gesture recognition from sequences of depth maps is a challenging computer vision task because of the low inter-class and high intra-class variability, the different execution rates of each gesture, and the highly articulated nature of the human hand. In this paper, a multilevel temporal sampling (MTS) method is first proposed that is based on the motion energy of key-frames of depth sequences. As a result, long, middle, and short sequences are generated that contain the relevant gesture information. MTS increases the intra-class similarity while raising the inter-class dissimilarity. The weighted depth motion map (WDMM) is then proposed to extract the spatio-temporal information from the generated summarized sequences by an accumulated weighted absolute difference of consecutive frames. The histogram of gradients (HOG) and local binary pattern (LBP) are exploited to extract features from the WDMM. The obtained results define the current state-of-the-art on three public benchmark datasets, MSR Gesture 3D, SKIG, and MSR Action 3D, for 3D hand gesture recognition. We also achieve competitive results on the NTU action dataset.  
  Address June 2019,  
  Notes HUPBA; no proj Approved no  
  Call Number Admin @ si @ AAK2018 Serial 3213  
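
Illustrative note on the record above: the abstract describes the weighted depth motion map (WDMM) as an accumulated weighted absolute difference of consecutive frames. The following is a minimal sketch of that idea only, not the authors' implementation. The function name `weighted_depth_motion_map`, the list-of-lists frame layout, and the default linearly increasing weights are assumptions made for illustration; the abstract does not state how the weights are chosen.

```python
from typing import List, Optional, Sequence

Frame = List[List[float]]  # one depth map as rows of pixel values


def weighted_depth_motion_map(
    frames: Sequence[Frame],
    weights: Optional[Sequence[float]] = None,
) -> Frame:
    """Accumulate weighted absolute differences of consecutive depth frames."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    n_diffs = len(frames) - 1
    if weights is None:
        # Assumption (not from the abstract): linearly increasing weights,
        # so that later motion contributes more to the map.
        weights = [(t + 1) / n_diffs for t in range(n_diffs)]
    if len(weights) != n_diffs:
        raise ValueError("need one weight per consecutive frame pair")

    rows, cols = len(frames[0]), len(frames[0][0])
    wdmm = [[0.0] * cols for _ in range(rows)]
    for t in range(n_diffs):
        prev, curr = frames[t], frames[t + 1]
        for r in range(rows):
            for c in range(cols):
                wdmm[r][c] += weights[t] * abs(curr[r][c] - prev[r][c])
    return wdmm


# Three tiny 2x2 "depth frames": one pixel changes at each step.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [0.0, 0.0]],
]
wdmm = weighted_depth_motion_map(frames)
print(wdmm)
```

Passing explicit `weights` replaces the assumed linear scheme; descriptors such as HOG or LBP would then be computed on the resulting map, as the abstract describes.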

 
Author Razieh Rastgoo; Kourosh Kiani; Sergio Escalera
  Title ZS-GR: zero-shot gesture recognition from RGB-D videos Type Journal Article
  Year 2023 Publication Multimedia Tools and Applications Abbreviated Journal MTAP  
  Volume 82 Issue Pages 43781-43796  
  Abstract Gesture Recognition (GR) is a challenging research area in computer vision. To tackle the annotation bottleneck in GR, we formulate the problem of Zero-Shot Gesture Recognition (ZS-GR) and propose a two-stream model from two input modalities: RGB and Depth videos. To benefit from the vision Transformer capabilities, we use two vision Transformer models, for human detection and visual feature representation. We configure a transformer encoder-decoder architecture, as a fast and accurate human detection model, to overcome the challenges of current human detection models. Considering the human keypoints, the detected human body is segmented into nine parts. A spatio-temporal representation of the human body is obtained using a vision Transformer and an LSTM network. A semantic space maps the visual features to the lingual embedding of the class labels via a Bidirectional Encoder Representations from Transformers (BERT) model. We evaluated the proposed model on five datasets, Montalbano II, MSR Daily Activity 3D, CAD-60, NTU-60, and isoGD, obtaining state-of-the-art results compared to state-of-the-art ZS-GR models as well as Zero-Shot Action Recognition (ZS-AR) models.  
  Notes HUPBA Approved no  
  Call Number Admin @ si @ RKE2023a Serial 3879  

 
Author Miguel Angel Bautista; Sergio Escalera; Oriol Pujol
  Title On the Design of an ECOC-Compliant Genetic Algorithm Type Journal Article
  Year 2014 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 47 Issue 2 Pages 865-884  
  Abstract Genetic Algorithms (GA) have been previously applied to Error-Correcting Output Codes (ECOC) in state-of-the-art works in order to find a suitable coding matrix. Nevertheless, none of the presented techniques directly takes into account the properties of the ECOC matrix. As a result, the considered search space is unnecessarily large. In this paper, a novel genetic strategy to optimize the ECOC coding step is presented. This novel strategy redefines the usual crossover and mutation operators in order to take into account the theoretical properties of the ECOC framework. Thus, it reduces the search space and lets the algorithm converge faster. In addition, a novel operator that is able to enlarge the code in a smart way is introduced. The novel methodology is tested on several UCI datasets and four challenging computer vision problems. Furthermore, an analysis of the results in terms of performance, code length, and number of Support Vectors (SVs) shows that the optimization process is able to find very efficient codes, in terms of the trade-off between classification performance and the number of classifiers. Finally, the results on classification performance per dichotomizer show that the novel proposal is able to obtain similar or even better results while defining a more compact number of dichotomies and SVs compared to state-of-the-art approaches.  
  Notes HuPBA;MILAB Approved no  
  Call Number Admin @ si @ BEP2013 Serial 2254  
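
Illustrative note on the record above: the abstract refers to properties of the ECOC coding matrix that the genetic operators are meant to respect. The sketch below does not reproduce the paper's operators. It only shows, under stated assumptions, the commonly cited constraints on a binary {-1, +1} coding matrix (distinct class codewords; no constant, repeated, or complementary columns) together with plain Hamming-distance decoding. The function names `is_valid_ecoc` and `decode` are hypothetical.

```python
from typing import Sequence


def is_valid_ecoc(matrix: Sequence[Sequence[int]]) -> bool:
    """Check basic constraints on a {-1, +1} ECOC coding matrix.

    Rows are class codewords; columns are dichotomizers (binary problems).
    """
    rows = [tuple(row) for row in matrix]
    if len(set(rows)) != len(rows):
        return False  # two classes share a codeword and cannot be told apart
    seen = set()
    for col in zip(*matrix):
        if len(set(col)) < 2:
            return False  # a constant column does not split the classes
        complement = tuple(-v for v in col)
        if col in seen or complement in seen:
            return False  # repeated or complementary column is redundant
        seen.add(col)
    return True


def decode(matrix: Sequence[Sequence[int]], predictions: Sequence[int]) -> int:
    """Return the index of the class whose codeword is nearest in Hamming distance."""
    distances = [sum(a != b for a, b in zip(row, predictions)) for row in matrix]
    return distances.index(min(distances))


# One-vs-all coding for three classes.
one_vs_all = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]
print(is_valid_ecoc(one_vs_all))
print(decode(one_vs_all, [-1, +1, -1]))
```

A search procedure that only generates matrices passing such a check explores fewer candidates than one that samples arbitrary matrices, which is the kind of search-space reduction the abstract points to.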