Publicacions CVC -- Edit Record


Abstract	Hand sign language recognition from video is a challenging research area in computer vision, whose performance is affected by hand occlusion, fast hand movement, illumination changes, and background complexity, to mention a few. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though these challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition from RGB input videos, using a Single Shot Detector (SSD), a 2D Convolutional Neural Network (2DCNN), a 3D Convolutional Neural Network (3DCNN), and a Long Short-Term Memory (LSTM) network. We use a CNN-based model that estimates the 3D hand keypoints from 2D input frames. We then connect these estimated keypoints to build the hand skeleton using the midpoint algorithm. To obtain a more discriminative representation of the hands, we project the 3D hand skeleton into three-view surface images. We further employ the heatmap image of the detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs to the stacked hand features, including pixel-level, multi-view hand skeleton, and heatmap features, to extract discriminative local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to an LSTM to model the long-term dynamics of hand sign gestures. By comparing 2DCNNs and 3DCNNs with different numbers of stacked inputs to the network, we demonstrate that 3DCNNs better capture the spatio-temporal dynamics of hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features has been applied to hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, RKS-PERSIANSIGN, comprising 10,000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition.
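The skeleton-building and multi-view projection steps mentioned in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: that "midpoint algorithm" refers to midpoint-style line rasterization (the sketch uses the integer error-term form usually attributed to Bresenham), that the three views are orthographic projections onto the XY, XZ, and YZ planes, and that keypoints are connected by a given list of index pairs. All function names (`project_three_views`, `line_pixels`, `render_view`) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]
Pixel = Tuple[int, int]


def project_three_views(points: Sequence[Point3D]) -> Dict[str, List[Point2D]]:
    """Orthographic projections onto the XY, XZ and YZ planes (assumed views)."""
    return {
        "xy": [(x, y) for x, y, z in points],
        "xz": [(x, z) for x, y, z in points],
        "yz": [(y, z) for x, y, z in points],
    }


def line_pixels(p0: Pixel, p1: Pixel) -> List[Pixel]:
    """Rasterize a segment with integer arithmetic (Bresenham-style, all octants)."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return pixels


def render_view(points2d: Sequence[Point2D],
                bones: Sequence[Tuple[int, int]],
                size: int = 32) -> List[List[int]]:
    """Scale projected keypoints to a size x size grid and draw each bone."""
    xs = [p[0] for p in points2d]
    ys = [p[1] for p in points2d]
    lo_x, lo_y = min(xs), min(ys)
    span = max(max(xs) - lo_x, max(ys) - lo_y) or 1.0  # avoid division by zero
    scale = (size - 1) / span
    pix = [(round((x - lo_x) * scale), round((y - lo_y) * scale))
           for x, y in points2d]
    image = [[0] * size for _ in range(size)]
    for a, b in bones:
        for px, py in line_pixels(pix[a], pix[b]):
            image[py][px] = 1
    return image
```

Under these assumptions, calling `render_view` once per entry of `project_three_views(keypoints)` yields three binary skeleton images per frame, which could then be stacked with other inputs in the way the abstract describes.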
