%0 Conference Proceedings
%T Residual Stacked RNNs for Action Recognition
%A Mohamed Ilyes Lakhal
%A Albert Clapes
%A Sergio Escalera
%A Oswald Lanz
%A Andrea Cavallaro
%B 9th International Workshop on Human Behavior Understanding
%D 2018
%F Mohamed Ilyes Lakhal2018
%O HUPBA; no proj; MILAB
%O exported from refbase (http://refbase.cvc.uab.es/show.php?record=3206), last updated on Fri, 21 Jan 2022 14:45:27 +0100
%X Action recognition pipelines that use Recurrent Neural Networks (RNN) are currently 5–10% less accurate than Convolutional Neural Networks (CNN). While most works that use RNNs employ a 2D CNN on each frame to extract descriptors for action recognition, we extract spatiotemporal features from a 3D CNN and then learn the temporal relationship of these descriptors through a stacked residual recurrent neural network (Res-RNN). We introduce for the first time residual learning to counter the degradation problem in multi-layer RNNs, which have been successful for temporal aggregation in two-stream action recognition pipelines. Finally, we use a late fusion strategy to combine RGB and optical flow data of the two-stream Res-RNN. Experimental results show that the proposed pipeline achieves competitive results on UCF-101 and state-of-the-art results for RNN-like architectures on the challenging HMDB-51 dataset.
%K Action recognition
%K Deep residual learning
%K Two-stream RNN
%U https://doi.org/10.1007/978-3-030-11012-3_40
%U http://refbase.cvc.uab.es/files/LCE2018b.pdf
%P 534-548