|
Records |
Links |
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
|
|
Title |
Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine |
Type |
Journal Article |
|
Year |
2018 |
Publication |
Entropy |
Abbreviated Journal |
ENTROPY |
|
|
Volume |
20 |
Issue |
11 |
Pages |
809 |
|
|
Keywords |
hand sign language; deep learning; restricted Boltzmann machine (RBM); multi-modal; profoundly deaf; noisy image |
|
|
Abstract |
In this paper, a deep learning approach, Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used and the hand of these cropped images are detected using Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on Massey University Gesture Dataset 2012, American Sign Language (ASL). and Fingerspelling Dataset from the University of Surrey’s Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no proj;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2018 |
Serial |
3198 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
|
|
Title |
Hand sign language recognition using multi-view hand skeleton |
Type |
Journal Article |
|
Year |
2020 |
Publication |
Expert Systems With Applications |
Abbreviated Journal |
ESWA |
|
|
Volume |
150 |
Issue |
|
Pages |
113336 |
|
|
Keywords |
Multi-view hand skeleton; Hand sign language recognition; 3DCNN; Hand pose estimation; RGB video; Hand action recognition |
|
|
Abstract |
Hand sign language recognition from video is a challenging research area in computer vision, which performance is affected by hand occlusion, fast hand movement, illumination changes, or background complexity, just to mention a few. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though previous challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition using Single Shot Detector (SSD), 2D Convolutional Neural Network (2DCNN), 3D Convolutional Neural Network (3DCNN), and Long Short-Term Memory (LSTM) from RGB input videos. We use a CNN-based model which estimates the 3D hand keypoints from 2D input frames. After that, we connect these estimated keypoints to build the hand skeleton by using midpoint algorithm. In order to obtain a more discriminative representation of hands, we project 3D hand skeleton into three views surface images. We further employ the heatmap image of detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs on the stacked features of hand, including pixel level, multi-view hand skeleton, and heatmap features, to extract discriminant local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to a LSTM to model long-term dynamics of hand sign gestures. Analyzing 2DCNN vs. 3DCNN using different number of stacked inputs into the network, we demonstrate that 3DCNN better capture spatio-temporal dynamics of hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features are applied for hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, namely RKS-PERSIANSIGN, including 10′000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; no proj;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2020a |
Serial |
3411 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
|
|
Title |
Video-based Isolated Hand Sign Language Recognition Using a Deep Cascaded Model |
Type |
Journal Article |
|
Year |
2020 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
|
|
Volume |
79 |
Issue |
|
Pages |
22965–22987 |
|
|
Keywords |
|
|
|
Abstract |
In this paper, we propose an efficient cascaded model for sign language recognition taking benefit from spatio-temporal hand-based information using deep learning approaches, especially Single Shot Detector (SSD), Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM), from videos. Our simple yet efficient and accurate model includes two main parts: hand detection and sign recognition. Three types of spatial features, including hand features, Extra Spatial Hand Relation (ESHR) features, and Hand Pose (HP) features, have been fused in the model to feed to LSTM for temporal features extraction. We train SSD model for hand detection using some videos collected from five online sign dictionaries. Our model is evaluated on our proposed dataset (Rastgoo et al., Expert Syst Appl 150: 113336, 2020), including 10’000 sign videos for 100 Persian sign using 10 contributors in 10 different backgrounds, and isoGD dataset. Using the 5-fold cross-validation method, our model outperforms state-of-the-art alternatives in sign language recognition |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HuPBA; no menciona;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2020b |
Serial |
3442 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
|
|
Title |
Sign Language Recognition: A Deep Survey |
Type |
Journal Article |
|
Year |
2021 |
Publication |
Expert Systems With Applications |
Abbreviated Journal |
ESWA |
|
|
Volume |
164 |
Issue |
|
Pages |
113794 |
|
|
Keywords |
|
|
|
Abstract |
Sign language, as a different form of the communication language, is important to large groups of people in society. There are different signs in each sign language with variability in hand shape, motion profile, and position of the hand, face, and body parts contributing to each sign. So, visual sign language recognition is a complex research area in computer vision. Many models have been proposed by different researchers with significant improvement by deep learning approaches in recent years. In this survey, we review the vision-based proposed models of sign language recognition using deep learning approaches from the last five years. While the overall trend of the proposed models indicates a significant improvement in recognition accuracy in sign language recognition, there are some challenges yet that need to be solved. We present a taxonomy to categorize the proposed models for isolated and continuous sign language recognition, discussing applications, datasets, hybrid models, complexity, and future lines of research in the field. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no proj;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2021a |
Serial |
3521 |
|
Permanent link to this record |
|
|
|
|
Author |
Razieh Rastgoo; Kourosh Kiani; Sergio Escalera |
|
|
Title |
Hand pose aware multimodal isolated sign language recognition |
Type |
Journal Article |
|
Year |
2020 |
Publication |
Multimedia Tools and Applications |
Abbreviated Journal |
MTAP |
|
|
Volume |
80 |
Issue |
|
Pages |
127–163 |
|
|
Keywords |
|
|
|
Abstract |
Isolated hand sign language recognition from video is a challenging research area in computer vision. Some of the most important challenges in this area include dealing with hand occlusion, fast hand movement, illumination changes, or background complexity. While most of the state-of-the-art results in the field have been achieved using deep learning-based models, the previous challenges are not completely solved. In this paper, we propose a hand pose aware model for isolated hand sign language recognition using deep learning approaches from two input modalities, RGB and depth videos. Four spatial feature types: pixel-level, flow, deep hand, and hand pose features, fused from both visual modalities, are input to LSTM for temporal sign recognition. While we use Optical Flow (OF) for flow information in RGB video inputs, Scene Flow (SF) is used for depth video inputs. By including hand pose features, we show a consistent performance improvement of the sign language recognition model. To the best of our knowledge, this is the first time that this discriminant spatiotemporal features, benefiting from the hand pose estimation features and multi-modal inputs, are fused for isolated hand sign language recognition. We perform a step-by-step analysis of the impact in terms of recognition performance of the hand pose features, different combinations of the spatial features, and different recurrent models, especially LSTM and GRU. Results on four public datasets confirm that the proposed model outperforms the current state-of-the-art models on Montalbano II, MSR Daily Activity 3D, and CAD-60 datasets with a relative accuracy improvement of 1.64%, 6.5%, and 7.6%. Furthermore, our model obtains a competitive results on isoGD dataset with only 0.22% margin lower than the current state-of-the-art model. |
|
|
Address |
|
|
|
Corporate Author |
|
Thesis |
|
|
|
Publisher |
|
Place of Publication |
|
Editor |
|
|
|
Language |
|
Summary Language |
|
Original Title |
|
|
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
|
|
Series Volume |
|
Series Issue |
|
Edition |
|
|
|
ISSN |
|
ISBN |
|
Medium |
|
|
|
Area |
|
Expedition |
|
Conference |
|
|
|
Notes |
HUPBA; no menciona;MILAB |
Approved |
no |
|
|
Call Number |
Admin @ si @ RKE2020 |
Serial |
3524 |
|
Permanent link to this record |