TY - GEN
T1 - Dynamic gesture recognition method based on Improved R(2+1)D
AU - Huo, Yupeng
AU - Shen, Jie
AU - Zhang, Sheng
AU - Wang, Li
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Currently, methods based on 3D convolutional neural networks have made significant progress in the field of dynamic gesture recognition. Dynamic gestures are highly redundant in both the temporal and spatial dimensions, and complex environments can easily degrade the final recognition results. It is therefore crucial to make the model focus on the important moments and regions of gesture movements and extract salient spatiotemporal features to further improve model performance. To address this issue, this paper proposes a lightweight Temporal-Spatial-Channel attention (TSCA) module based on the R(2+1)D network. The module consists of two sub-modules, a Temporal-Channel attention (TCA) module and a Temporal-Spatial attention (TSA) module, which together enable the model to focus on important information along the spatial, channel, and temporal dimensions during gesture movements. Finally, the TSCA module is integrated into the R(2+1)D network; this adds only 2.8M parameters, and the resulting model achieves good performance on the IPN-Hand and NvGesture datasets.
KW - Gesture Recognition
KW - R(2+1)D Convolution
KW - Self-Attention
UR - http://www.scopus.com/inward/record.url?scp=85166347983&partnerID=8YFLogxK
U2 - 10.1109/CVIDL58838.2023.10166120
DO - 10.1109/CVIDL58838.2023.10166120
M3 - Conference contribution
AN - SCOPUS:85166347983
T3 - 2023 4th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2023
SP - 323
EP - 328
BT - 2023 4th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2023
Y2 - 12 May 2023 through 14 May 2023
ER -