Dual-Arm Robot Trajectory Planning Based on Deep Reinforcement Learning under Complex Environment
- PMID: 35457870
- PMCID: PMC9031963
- DOI: 10.3390/mi13040564
Abstract
In this article, deep reinforcement learning is used to plan the trajectories of the two manipulators of a dual-arm robot so that they can approach a patient in a complex environment. The shapes of the human body and the bed are complex, which may lead to collisions between the human and the robot. Because the sparse reward the robot obtains from the environment may not be enough for it to accomplish the task, a neural network is trained with the proximal policy optimization (PPO) algorithm and a continuous reward function to control the manipulators so that they are ready to hold the patient up. First, a 3D simulation environment reflecting the realistic scene is built to conduct the research. Second, inspired by the idea of the artificial potential field, a new reward and punishment function is proposed to help the robot obtain enough reward to explore the environment. The function consists of four parts: a reward guidance function, collision detection, an obstacle avoidance function, and a time function. The reward guidance function guides the robot to approach the targets for holding the patient, the collision detection and the obstacle avoidance function complement each other in avoiding obstacles, and the time function reduces the length of the training episodes. Finally, after the robot is trained to reach the targets, the training results are analyzed. Compared with the DDPG algorithm, the PPO algorithm needs about 4 million fewer training steps to converge. Moreover, compared with other reward and punishment functions, the function used in this paper obtains more reward within the same training time, converges in much less time, and yields shorter episodes, which verifies the advantage of the proposed method.
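To make the structure of the reward and punishment function easier to follow, the sketch below shows, in Python, one plausible way to combine the four parts described above into a single per-step reward. It is a minimal illustration, not the authors' implementation: the function name `composite_reward`, the weighting coefficients, the safety distance, and the penalty values are all assumptions introduced for this example.

```python
import numpy as np

# Minimal, hypothetical sketch of a composite reward with the four parts named
# in the abstract (guidance, collision detection, obstacle avoidance, time).
# All coefficients and thresholds below are assumed values, not from the paper.

def composite_reward(ee_pos, target_pos, obstacle_dists, collided,
                     k_guide=1.0, k_avoid=0.5, k_time=0.01,
                     safe_dist=0.10, collision_penalty=10.0):
    """Per-step reward for one manipulator end-effector (positions in metres)."""
    # 1) Reward guidance: attraction toward the target, in the spirit of an
    #    artificial potential field (less negative as the distance shrinks).
    r_guide = -k_guide * float(np.linalg.norm(ee_pos - target_pos))

    # 2) Collision detection: large fixed penalty when a collision occurs.
    r_collision = -collision_penalty if collided else 0.0

    # 3) Obstacle avoidance: repulsive penalty that activates only inside a
    #    safety margin around each obstacle, complementing collision detection.
    r_avoid = sum(-k_avoid * (safe_dist - d) / safe_dist
                  for d in obstacle_dists if d < safe_dist)

    # 4) Time: small constant penalty per step to encourage short episodes.
    r_time = -k_time

    return r_guide + r_collision + r_avoid + r_time


# Example: one step for both arms, summing the per-arm rewards.
if __name__ == "__main__":
    left = composite_reward(np.array([0.4, 0.2, 0.5]), np.array([0.5, 0.3, 0.6]),
                            obstacle_dists=[0.08, 0.25], collided=False)
    right = composite_reward(np.array([0.4, -0.2, 0.5]), np.array([0.5, -0.3, 0.6]),
                             obstacle_dists=[0.30], collided=False)
    print(f"total reward this step: {left + right:.3f}")
```

In a scheme of this kind, the guidance term provides dense feedback at every step, so the agent is not left with only the sparse reward of finally reaching the target.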
Keywords: complex environment; deep reinforcement learning; dual-arm robot; reward; trajectory planning.
Conflict of interest statement
The authors declare no conflict of interest.
Similar articles
- Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints. Front Neurorobot. 2022 May 2;16:883562. doi: 10.3389/fnbot.2022.883562. eCollection 2022. PMID: 35586262. Free PMC article.
- Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces. Sensors (Basel). 2023 Aug 11;23(16):7124. doi: 10.3390/s23167124. PMID: 37631658. Free PMC article.
- Coverage Path Planning Using Actor-Critic Deep Reinforcement Learning. Sensors (Basel). 2025 Mar 5;25(5):1592. doi: 10.3390/s25051592. PMID: 40096476. Free PMC article.
- Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning. Sensors (Basel). 2021 Feb 11;21(4):1278. doi: 10.3390/s21041278. PMID: 33670109. Free PMC article. Review.
- Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey. Sensors (Basel). 2023 Mar 30;23(7):3625. doi: 10.3390/s23073625. PMID: 37050685. Free PMC article. Review.
Cited by
- Can artificial intelligence be the future solution to the enormous challenges and suffering caused by Schizophrenia? Schizophrenia (Heidelb). 2025 Feb 28;11(1):32. doi: 10.1038/s41537-025-00583-4. PMID: 40021674. Free PMC article. Review.
- A Self-Collision Detection Algorithm of a Dual-Manipulator System Based on GJK and Deep Learning. Sensors (Basel). 2023 Jan 3;23(1):523. doi: 10.3390/s23010523. PMID: 36617121. Free PMC article.
- Motion planning framework based on dual-agent DDPG method for dual-arm robots guided by human joint angle constraints. Front Neurorobot. 2024 Feb 22;18:1362359. doi: 10.3389/fnbot.2024.1362359. eCollection 2024. PMID: 38455735. Free PMC article.