A Dynamic Gesture Recognition Method Based on R(2+1)D-Transformer Network

Yupeng Huo, Jie Shen, Xu Chen, Keming Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Efficient spatiotemporal feature extraction from input video streams is crucial for dynamic gesture recognition. In video classification, convolutional neural networks (CNNs) are widely used as feature extractors, while methods based on recurrent neural networks (RNNs) are commonly employed for sequence modeling. However, RNNs lack the ability to model global dependencies and have a limited attention span along the temporal dimension. This becomes a performance bottleneck for dynamic gestures, whose recognition depends on temporal correlations. To address this issue, this paper proposes R(2+1)D-Transformer, a Transformer-based dynamic gesture recognition model that emphasizes global modeling. First, an R(2+1)D network is employed as the feature extractor to capture spatiotemporal information. Then, a self-attention-based Transformer maps the spatiotemporal feature sequence to a semantic representation of the gesture movement, taking both temporal and spatial context into account. Finally, an MLP classification head produces the gesture recognition result. Experiments on two publicly available dynamic gesture datasets, IPN-Hand and NvGesture, demonstrate the effectiveness and potential of the proposed R(2+1)D-Transformer model. These results offer a reference for further research on, and applications of, dynamic gesture recognition.
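The abstract uses R(2+1)D as the feature extractor without describing it. In the original R(2+1)D design (Tran et al., 2018), each t×d×d 3D convolution is factorized into a 1×d×d spatial convolution followed by a t×1×1 temporal convolution. The intermediate channel count is M = ⌊t·d²·N_in·N_out / (d²·N_in + t·N_out)⌋, which keeps the parameter count close to that of the full 3D kernel. The sketch below does only that arithmetic. The function names and the kernel sizes and channel counts in the example are illustrative assumptions, not the configuration used in this paper.

```python
# Hypothetical helpers illustrating the R(2+1)D factorization arithmetic;
# the example sizes are illustrative, not this paper's configuration.


def full_3d_params(t, d, n_in, n_out):
    """Weight count of a t x d x d 3D convolution (biases ignored)."""
    return t * d * d * n_in * n_out


def midplane_channels(t, d, n_in, n_out):
    """Intermediate width M chosen to roughly match the full 3D parameter budget."""
    # Integer floor division gives an exact floor for positive integers.
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def r2plus1d_params(t, d, n_in, n_out):
    """Return (M, weight count) for a 1 x d x d spatial conv then a t x 1 x 1 temporal conv."""
    m = midplane_channels(t, d, n_in, n_out)
    spatial = d * d * n_in * m  # 1 x d x d kernel: n_in -> M channels
    temporal = t * m * n_out    # t x 1 x 1 kernel: M -> n_out channels
    return m, spatial + temporal


print(full_3d_params(3, 3, 64, 64))   # → 110592
print(r2plus1d_params(3, 3, 64, 64))  # → (144, 110592)
```

With M chosen this way, the factorized block has about as many weights as the 3D convolution. It also gains an extra nonlinearity between the spatial and temporal steps, which Tran et al. report as one source of R(2+1)D's accuracy gain over plain 3D convolutions.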

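The Transformer stage and classification head in the abstract can be illustrated with a deliberately stripped-down sketch. It applies single-head scaled dot-product self-attention over a sequence of T clip-level feature vectors, then mean pooling over time, then a single linear layer with softmax standing in for the MLP head. Everything here is an assumption for illustration. The abstract does not specify the number of heads or layers, the positional encoding, how spatial context enters the Transformer, or whether a class token or pooling feeds the head. This sketch attends only over time. It also omits the residual connections, layer normalization, and feed-forward sublayer of a full Transformer encoder. The weights are random placeholders, and `classify` and `random_matrix` are hypothetical helper names.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a single vector."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a T x D sequence."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K to d_k x T
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # T x T: each step attends to all steps
    return matmul(weights, v), weights


def classify(features, w_q, w_k, w_v, w_head):
    """Hypothetical pipeline: attention -> mean over time -> linear head -> softmax."""
    attended, weights = self_attention(features, w_q, w_k, w_v)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean over T
    logits = matmul([pooled], w_head)[0]
    return softmax(logits), weights


def random_matrix(rows, cols):
    """Placeholder weights; a trained model would learn these."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only: 8 clip features of width 16 and 5 gesture classes.
random.seed(0)
T, D, C = 8, 16, 5
features = random_matrix(T, D)  # stand-in for R(2+1)D clip features
probs, weights = classify(
    features,
    random_matrix(D, D),
    random_matrix(D, D),
    random_matrix(D, D),
    random_matrix(D, C),
)
print("predicted class:", max(range(C), key=probs.__getitem__))
```

The T×T attention matrix is what provides global temporal modeling. In a single layer, each time step's output is a weighted combination of all T steps, whereas an RNN must pass information step by step through its hidden state.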
Original language: English
Host publication title: Third International Conference on Computer Graphics, Image, and Virtualization, ICCGIV 2023
Editors: Yulin Wang, Ata Jahangir Moshayedi
Publisher: SPIE
ISBN (Electronic): 9781510671720
DOI
Publication status: Published - 2023
Event: 3rd International Conference on Computer Graphics, Image, and Virtualization, ICCGIV 2023 - Nanjing, China
Duration: 16 Jun 2023 → 18 Jun 2023

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12934
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 3rd International Conference on Computer Graphics, Image, and Virtualization, ICCGIV 2023
Country/Territory: China
City: Nanjing
Period: 16/06/23 → 18/06/23
