Front Neurorobot. 2022 May 2;16:883562.
doi: 10.3389/fnbot.2022.883562. eCollection 2022.

Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints

Lienhung Chen et al. Front Neurorobot. 2022.

Abstract

With advances in algorithms, deep reinforcement learning (DRL) offers solutions to trajectory planning under uncertain environments. Unlike traditional trajectory planning, which requires considerable effort to tackle complicated high-dimensional problems, the recently proposed DRL enables the robot manipulator to autonomously learn and discover optimal trajectory planning by interacting with the environment. In this article, we present state-of-the-art DRL-based collision-avoidance trajectory planning for uncertain environments, such as a safe human-coexistence environment. Since the robot manipulator operates in high-dimensional continuous state-action spaces, the model-free, policy-gradient-based soft actor-critic (SAC) and deep deterministic policy gradient (DDPG) frameworks are adapted to our scenario for comparison. To assess our proposal, we simulate a 7-DOF Panda (Franka Emika) robot manipulator in the PyBullet physics engine and then evaluate its trajectory planning with reward, loss, safe rate, and accuracy. Finally, our results show the effectiveness of state-of-the-art DRL algorithms for trajectory planning under uncertain environments, with zero collisions after 5,000 episodes of training.

Keywords: collision avoidance; neural networks; reinforcement learning; robotics; trajectory planning; uncertain environment.
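
A minimal, hedged sketch of the simulation setup described in the abstract: the 7-DOF Panda model ships with pybullet_data, so the environment skeleton can be approximated as below. The connection mode, time step, control mode, and random joint targets are illustrative assumptions, not the authors' configuration, and no learning agent is attached here.

```python
# Sketch of the simulated setup from the abstract: a 7-DOF Franka Emika Panda
# in the PyBullet physics engine. Connection mode, time step, and the random
# joint targets are illustrative assumptions only.
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                      # headless; use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.setTimeStep(1.0 / 240.0)

p.loadURDF("plane.urdf")
panda = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)

# The Panda's seven revolute arm joints.
arm_joints = [j for j in range(p.getNumJoints(panda))
              if p.getJointInfo(panda, j)[2] == p.JOINT_REVOLUTE][:7]

# One "episode" of 100 time steps, matching the episode length used in the figures.
for _ in range(100):
    targets = np.random.uniform(-0.5, 0.5, size=len(arm_joints))  # placeholder action
    p.setJointMotorControlArray(panda, arm_joints,
                                controlMode=p.POSITION_CONTROL,
                                targetPositions=targets.tolist())
    p.stepSimulation()

p.disconnect()
```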

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The comparison of two different observation spaces set up in the first environment. Both use soft actor-critic (SAC) with the same hyperparameter settings. The orange line is the result of using relative position and velocity in the observation space, whereas the blue line uses position and velocity. (A) The safe rate (defined in Section 4.2) of the different observation spaces. (B) The accuracy (defined in Section 4.2) of the different observation spaces. Each episode corresponds to 100 time steps.
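
As a concrete illustration of the two observation designs compared in Figure 1, the sketch below builds both variants from hypothetical end-effector, target, and obstacle states. The exact quantities and their ordering are assumptions for illustration, not the paper's precise observation vector.

```python
# Illustrative construction of the two observation variants in Figure 1.
# The specific quantities included are an assumption; the paper's exact
# observation vector may differ.
import numpy as np

def absolute_observation(ee_pos, ee_vel, target_pos, obs_pos, obs_vel):
    """'Position and velocity' variant (blue line): raw world-frame states."""
    return np.concatenate([ee_pos, ee_vel, target_pos, obs_pos, obs_vel])

def relative_observation(ee_pos, ee_vel, target_pos, obs_pos, obs_vel):
    """'Relative position and velocity' variant (orange line): target and
    obstacle states expressed relative to the end effector."""
    return np.concatenate([target_pos - ee_pos,
                           obs_pos - ee_pos,
                           obs_vel - ee_vel])

# Example with arbitrary 3-D states.
ee_pos, ee_vel = np.array([0.4, 0.0, 0.5]), np.zeros(3)
target_pos = np.array([0.6, 0.2, 0.4])
obs_pos, obs_vel = np.array([0.5, 0.1, 0.5]), np.array([0.0, -0.1, 0.0])
print(relative_observation(ee_pos, ee_vel, target_pos, obs_pos, obs_vel))
```
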
Figure 2
The comparison between different powers (n) of the exponential decay function. (A) The safe rate (defined in Section 4.2) for three different powers (n) of the exponential decay function. (B) The accuracy (defined in Section 4.2) for three different powers (n) of the exponential decay function, using an exponential moving average for better visualization. The two figures show that n = 35 has better learning efficiency, safe rate, and accuracy.
Figure 3
The comparison between different weights of the reward RO. (A) The safe rate (defined in Section 4.2) for three different weights of the reward RO. (B) The accuracy (defined in Section 4.2) for three different weights of the reward RO, using an exponential moving average for better visualization. The two figures show that c2 = 15 has better learning efficiency, safe rate, and accuracy.
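
Figures 2 and 3 tune, respectively, the power n of an exponential-decay obstacle term and its weight c2 in the reward RO. The paper's exact formula is not reproduced on this page, so the snippet below shows only one plausible shape of such a penalty, purely to illustrate how n and c2 enter the reward; the functional form and the reference distance are assumptions.

```python
# Assumed illustration of an exponentially decaying obstacle penalty RO
# governed by a power n and a weight c2 (cf. Figures 2 and 3). This is NOT
# the paper's exact formula, only a plausible shape.
import numpy as np

def obstacle_penalty(distance, n=35, c2=15, d_ref=0.3):
    """Penalty close to -c2 near the obstacle, decaying toward 0 as the
    distance (in meters) grows relative to the reference distance d_ref."""
    return -c2 * np.exp(-n * distance / d_ref)

for d in (0.0, 0.05, 0.1, 0.3):
    print(f"d = {d:.2f} m -> RO = {obstacle_penalty(d):+.4f}")
```
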
Figure 4
Reward function on the planar section of the workspace. (A) The 3D plot of the reward function. (B) Contour plot of the reward function.
Figure 5
The behavior of the manipulator with respect to distance (m) in one of the episodes (100 time steps). (A) Distance between the robot and the obstacles as well as the target (first scenario). (B) Distance between the robot and the obstacles as well as the target (second scenario).
Figure 6
From left to right: first, the robot learns to reach the goal; second, it avoids collisions with dynamic obstacles; third, it continues reaching the goal.
Figure 7
Performance comparison of the SAC and deep deterministic policy gradient (DDPG) algorithms in the first environment. (A) Accuracy of the different algorithms shown as an error-bar line graph. (B) Safe rate of the different algorithms as an error-bar line graph. (C) The cumulative reward for each episode. (D) Loss for each episode. Each episode corresponds to 100 time steps.
Figure 8
Performance comparison of the SAC and DDPG algorithms in the second environment. (A) Accuracy of the different algorithms shown as an error-bar line graph. (B) Safe rate of the different algorithms as an error-bar line graph. (C) The cumulative reward for each episode. (D) Loss for each episode. Each episode corresponds to 100 time steps.
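
Figures 7 and 8 compare SAC against DDPG. The authors' environment code is not included on this page, so the comparison harness below is only a hedged sketch using the stable-baselines3 implementations of both algorithms on a stand-in continuous-control task; the library choice, environment id, step budget, and hyperparameters are assumptions, not the paper's setup.

```python
# Hedged sketch of a SAC vs. DDPG comparison (cf. Figures 7 and 8) using
# stable-baselines3 on a placeholder continuous-control task. The Panda
# environment from the paper is not reproduced here.
import gymnasium as gym
from stable_baselines3 import DDPG, SAC
from stable_baselines3.common.evaluation import evaluate_policy

ENV_ID = "Pendulum-v1"   # stand-in task, not the Panda workspace
TRAIN_STEPS = 20_000     # far fewer steps than 5,000 episodes x 100 time steps

for algo_cls in (SAC, DDPG):
    env = gym.make(ENV_ID)
    model = algo_cls("MlpPolicy", env, verbose=0, seed=0)
    model.learn(total_timesteps=TRAIN_STEPS)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{algo_cls.__name__}: mean episode reward {mean_reward:.1f} +/- {std_reward:.1f}")
    env.close()
```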

