Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment

Wanxing Tang et al. Micromachines (Basel). 2022 Mar 31;13(4):564. doi: 10.3390/mi13040564.
Abstract

In this article, trajectory planning for the two manipulators of a dual-arm robot approaching a patient in a complex environment is studied with deep reinforcement learning algorithms. The shapes of the human body and the bed are complex, which may lead to collisions between the human and the robot. Because the sparse reward the robot obtains from the environment may not be enough for it to accomplish the task, a neural network is trained to control the manipulators to prepare to hold the patient up, using a proximal policy optimization (PPO) algorithm with a continuous reward function. First, a 3D simulation environment modeled on the realistic scene is built to conduct the research. Second, inspired by the idea of the artificial potential field, a new reward and punishment function is proposed to help the robot obtain enough rewards to explore the environment. The function consists of four parts: a reward guidance function, collision detection, an obstacle avoidance function, and a time function. The reward guidance function guides the robot to approach the targets for holding the patient, the collision detection and obstacle avoidance functions complement each other to avoid obstacles, and the time function reduces the number of training episodes. Finally, after the robot is trained to reach the targets, the training results are analyzed. Compared with the DDPG algorithm, the PPO algorithm needs about 4 million fewer training steps to converge. Moreover, compared with other reward and punishment functions, the function used in this paper obtains many more rewards in the same training time, converges in much less time, and produces shorter episode lengths, which verifies the advantage of the approach used in this paper.
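The abstract describes a composite reward shaped like an artificial potential field, with a guidance term, collision detection, an obstacle avoidance term, and a time term. The following Python sketch illustrates one way such a four-part reward could be assembled; the function names, weights, and distance thresholds are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Minimal sketch of a four-part reward in the spirit of the paper's description.
# All weights and thresholds below are assumed values for illustration only.

def guidance_reward(ee_pos, target_pos, scale=1.0):
    """Dense attractive term: less negative as the end-effector nears its target,
    analogous to the attractive field of an artificial potential field."""
    return -scale * np.linalg.norm(ee_pos - target_pos)

def obstacle_penalty(ee_pos, obstacle_points, safe_dist=0.10, scale=0.5):
    """Repulsive term: penalize entering a safety margin around barrier points
    (e.g., points sampled on the bed and the patient's body)."""
    penalty = 0.0
    for obs in obstacle_points:
        d = np.linalg.norm(ee_pos - obs)
        if d < safe_dist:
            penalty -= scale * (safe_dist - d) / safe_dist
    return penalty

def collision_penalty(collided, penalty=-10.0):
    """Large one-time penalty when the physics engine reports a contact
    between the robot and an obstacle (typically with episode termination)."""
    return penalty if collided else 0.0

def time_penalty(step_cost=-0.01):
    """Small per-step cost that discourages long episodes."""
    return step_cost

def total_reward(ee_pos, target_pos, obstacle_points, collided):
    """Sum of the four components evaluated at the current simulation step."""
    return (guidance_reward(ee_pos, target_pos)
            + obstacle_penalty(ee_pos, obstacle_points)
            + collision_penalty(collided)
            + time_penalty())
```

In a setup like this, the collision penalty usually also ends the episode, while the guidance and obstacle terms provide the dense signal that keeps the reward from being sparse during exploration.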

Keywords: complex environment; deep reinforcement learning; dual-arm robot; reward; trajectory planning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The graphical abstract of this article.
Figure 2. The overall structure of the robot.
Figure 3. The agent–environment interaction in a Markov decision process.
Figure 4. The relationship and structure diagram of the three major components of ML-Agents.
Figure 5. Training logic diagram.
Figure 6. The target setting diagram.
Figure 7. Training scheme.
Figure 8. Collision detection configuration diagram. (a) Robot collision detection configuration. (b) Human body and bed collision detection configuration diagram.
Figure 9. Diagram of barrier points.
Figure 10. PPO algorithm network training flowchart.
Figure 11. Training environment.
Figure 12. The agent training process of generating the trajectory.
Figure 13. Diagram of training results. The number in the upper right corner of each picture represents the moving order of the robot.
Figure 14. Training results diagram. (a) Cumulative reward. (b) Episode length.
Figure 15. The human posture changing diagram.
Figure 16. The training results in posture changing.
Figure 17. The training result of DDPG.
