Sensors (Basel). 2025 Nov 25;25(23):7203. doi: 10.3390/s25237203.

Research on Deep Learning-Based Human-Robot Static/Dynamic Gesture-Driven Control Framework

Gong Zhang et al.

Abstract

For human-robot gesture-driven control, this paper proposes a deep learning-based approach that employs both static and dynamic gestures to drive and control robots for object-grasping and delivery tasks. The method utilizes two-dimensional Convolutional Neural Networks (2D-CNNs) for static gesture recognition and a hybrid architecture combining three-dimensional Convolutional Neural Networks (3D-CNNs) and Long Short-Term Memory networks (3D-CNN+LSTM) for dynamic gesture recognition. Results on a custom gesture dataset demonstrate validation accuracies of 95.38% for static gestures and 93.18% for dynamic gestures. Hand pose estimation was then performed so that the robot could be driven to carry out the corresponding tasks. The MediaPipe machine learning framework was first employed to extract hand feature points. These 2D feature points were then converted into 3D coordinates using a depth camera-based pose estimation method, followed by a coordinate system transformation to obtain hand poses relative to the robot's base coordinate system. Finally, an experimental platform for human-robot gesture-driven interaction was established, deploying both gesture recognition models. Four participants were invited to perform 100 trials each of gesture-driven object-grasping and delivery tasks under three lighting conditions: natural light, low light, and strong light. Experimental results show that the average success rates for completing tasks via static and dynamic gestures are no less than 96.88% and 94.63%, respectively, with task completion times consistently within 20 s. These findings demonstrate that the proposed approach enables robust vision-based robotic control through natural hand gestures, showing great prospects for human-robot collaboration applications.
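Below is a minimal Python sketch of the hand-pose pipeline summarized in the abstract (MediaPipe 2D landmark extraction, depth-based lifting to 3D, and a homogeneous transform into the robot base frame). The camera intrinsics FX/FY/CX/CY, the camera-to-base transform T_BASE_CAM, and the helper names are placeholder assumptions for illustration, not values or code from the paper.

```python
# Sketch: MediaPipe 2D hand landmarks -> depth-based 3D points -> robot base frame.
# FX/FY/CX/CY and T_BASE_CAM are placeholder assumptions, not values from the paper.
import cv2
import mediapipe as mp
import numpy as np

FX, FY, CX, CY = 615.0, 615.0, 320.0, 240.0   # assumed pinhole intrinsics of the depth camera
T_BASE_CAM = np.eye(4)                        # assumed camera-to-robot-base transform (hand-eye calibration)

mp_hands = mp.solutions.hands


def pixel_to_camera(u, v, depth_m):
    """Back-project pixel (u, v) with depth in metres to camera-frame XYZ."""
    return np.array([(u - CX) * depth_m / FX, (v - CY) * depth_m / FY, depth_m])


def hand_points_in_base(color_bgr, depth_m):
    """Return the 21 hand landmarks as 3D points in the robot base frame, or None."""
    h, w = depth_m.shape
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(color_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    points = []
    for lm in result.multi_hand_landmarks[0].landmark:
        u = int(np.clip(lm.x * w, 0, w - 1))          # normalized -> pixel coordinates
        v = int(np.clip(lm.y * h, 0, h - 1))
        p_cam = np.append(pixel_to_camera(u, v, float(depth_m[v, u])), 1.0)
        points.append((T_BASE_CAM @ p_cam)[:3])       # camera frame -> robot base frame
    return np.asarray(points)
```

In the paper, the camera-to-base transform would come from calibrating the depth camera against the robot; the identity matrix above is purely a stand-in so the sketch runs end to end.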

Keywords: deep learning; dynamic and static gesture; gesture-driven control framework; human-robot collaboration; three-dimensional Convolutional Neural Networks.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Human–robot gesture-driven overall technical workflow.
Figure 2. 2D-CNN network architecture.
Figure 3. 3D-CNN network architecture.
Figure 4. 3D-CNN+LSTM hybrid network architecture.
Figure 5. Three hand key points.
Figure 6. Hand depth image.
Figure 7. The three vectors of the hand and the hand orientation coordinate system for (left) the three-key-point vector, and (right) the hand orientation coordinate system.
Figure 8. Human–robot static/dynamic gesture-driven experiment platform.
Figure 9. The static gesture “closed fist” drives the robot to grasp and deliver a “bowl”.
Figure 10. The static gesture “index finger” drives the robot to grasp and deliver a “banana”.
Figure 11. The dynamic gesture “waving side-to-side” drives the robot to grasp and deliver a “beverage can”.
Figure 12. The dynamic gesture “backward beckoning” drives the robot to grasp and deliver a “drinking cup”.

