Comput Intell Neurosci. 2021 Dec 16:2021:9945044.

doi: 10.1155/2021/9945044. eCollection 2021.

End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation

Xiaogang Ruan¹,², Peng Li¹,², Xiaoqing Zhu¹,², Hejie Yu¹,², Naigong Yu¹,²

Affiliations

¹ Faculty of Information Technology, Beijing University of Technology, Beijing, China.
² Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing, China.

PMID: 34956359
PMCID: PMC8702337
DOI: 10.1155/2021/9945044

Xiaogang Ruan et al. Comput Intell Neurosci. 2021.

Abstract

Developing artificial intelligence (AI) agents that explore efficiently in visually rich, complex environments is challenging. In this study, we formulate exploration as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. This intrinsic motivation is driven by curiosity and is calculated from episodic memory. To assign the intrinsic motivation, we use a count-based method and temporal distance to generate it simultaneously. We tested our approach in 3D maze-like environments and validated its exploration performance through extensive experiments. The results show that our agent can learn to explore from raw sensory input and accomplish autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze the effects of different training methods and driving forces on the exploration policy.
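
The idea of combining a count-based bonus with a distance-based novelty signal over a per-episode memory can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `EpisodicCuriosity`, the parameters `cell_size`, `novelty_threshold`, and `beta`, the 1/sqrt(n) decay, and the use of Euclidean distance as a crude stand-in for temporal distance are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


class EpisodicCuriosity:
    """Toy intrinsic-reward module; every name and constant is hypothetical."""

    def __init__(self, cell_size=1.0, novelty_threshold=2.0, beta=0.5):
        self.cell_size = cell_size                  # grid resolution for visit counts
        self.novelty_threshold = novelty_threshold  # assumed proxy for "temporally far"
        self.beta = beta                            # weight of the distance-based bonus
        self.reset()

    def reset(self):
        """Clear the episodic memory at the start of each episode."""
        self.counts = defaultdict(int)
        self.memory = []

    def _cell(self, state):
        # Discretize a continuous state into a grid cell for counting.
        return tuple(math.floor(x / self.cell_size) for x in state)

    def reward(self, state):
        # Count-based term: decays as 1/sqrt(n) with visits to the same cell.
        cell = self._cell(state)
        self.counts[cell] += 1
        count_bonus = 1.0 / math.sqrt(self.counts[cell])

        # Distance term: Euclidean distance to the nearest remembered state,
        # used here only as a simple substitute for temporal distance.
        if self.memory:
            nearest = min(math.dist(state, m) for m in self.memory)
        else:
            nearest = math.inf
        is_novel = nearest > self.novelty_threshold
        if is_novel:
            self.memory.append(tuple(state))
        return count_bonus + (self.beta if is_novel else 0.0)
```

In a training loop, `reset()` would be called at the start of each episode and `reward(state)` at each step, with the returned value added to (or used in place of) any extrinsic reward. Both terms are computed in the same call, which loosely mirrors the simultaneous generation described in the abstract.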

Conflict of interest statement

The authors declare that they have no conflicts of interest to report regarding the present study.

Figures

**Figure 2**
The A3C algorithm flow chart.

**Figure 5**
Calculation process of intrinsic motivation.

**Figure 7**
Simulation environment. (a) Go forward. (b) Apple. (c) Goal. (d) Door.

**Figure 8**
Parameter selection environment.

**Figure 9**
Experiment results of the reward function parameter.

**Figure 10**
Top-down view of test mazes. (a) Maze-1. (b) Maze-2. (c) Maze-3.

**Figure 11**
Experiment results of learning exploration from scratch. (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.

**Figure 12**
Experiment results of learning exploration with the fine-tuning method (no extrinsic reward). (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.

**Figure 13**
Experiment results of learning exploration with the fine-tuning method (with extrinsic reward). (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.

**Figure 14**
Experiment results of “noisy-TV.” (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.
