Comput Intell Neurosci. 2021 Dec 16;2021:9945044.
doi: 10.1155/2021/9945044. eCollection 2021.

End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation

Xiaogang Ruan et al. Comput Intell Neurosci.

Abstract

Developing artificial intelligence (AI) agents that explore efficiently in visually rich, complex environments is challenging. In this study, we formulate exploration as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. This intrinsic motivation is driven by curiosity and is calculated from episodic memory. To distribute the intrinsic motivation, we generate it jointly from a count-based method and temporal distance. We tested our approach in 3D maze-like environments and validated its performance on exploration tasks through extensive experiments. The experimental results show that our agent can learn to explore from raw sensory input and accomplish autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze the effects of different training methods and driving forces on the exploration policy.
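
As a rough illustration of the count-based part of this idea, the sketch below keeps a per-episode visit counter and pays a bonus that shrinks as a state is revisited. It is not the authors' implementation: the class name, the 1/sqrt(n) bonus form, the `scale` parameter, and the use of hashable state keys are all assumptions made for this example, and the temporal-distance component described in the abstract is left out entirely.

```python
import math
from collections import Counter


class EpisodicCountBonus:
    """Hypothetical count-based intrinsic reward over an episodic memory."""

    def __init__(self, scale=1.0):
        self.scale = scale       # assumed scaling factor, not from the paper
        self.counts = Counter()  # episodic memory: visit count per state key

    def reset(self):
        """Clear the memory at the start of each episode."""
        self.counts.clear()

    def reward(self, state_key):
        """Record a visit and return a bonus that decays with the visit count."""
        self.counts[state_key] += 1
        return self.scale / math.sqrt(self.counts[state_key])


# Usage: revisiting cell "A" earns a smaller bonus each time.
bonus = EpisodicCountBonus()
rewards = [bonus.reward(cell) for cell in ["A", "B", "A", "A"]]
```

In this sketch a first visit earns the full bonus and each repeat visit earns less, which pushes an agent toward unvisited states within an episode; calling `reset()` restores full novelty for the next episode.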

Conflict of interest statement

The authors declare that they have no conflicts of interest to report regarding the present study.

Figures

Figure 1. AC algorithm flow chart.
Figure 2. The A3C algorithm flow chart.
Figure 3. Nav A3C model.
Figure 4. TC-network model.
Figure 5. Calculation process of intrinsic motivation.
Figure 6. Exploration model.
Figure 7. Simulation environment. (a) Go forward. (b) Apple. (c) Goal. (d) Door.
Figure 8. Parameter selection environment.
Figure 9. Experiment results of the reward function parameter.
Figure 10. Top-down view of test mazes. (a) Maze-1. (b) Maze-2. (c) Maze-3.
Figure 11. Experiment results of learning exploration from scratch. (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.
Figure 12. Experiment results of learning exploration with the fine-tuning method (no extrinsic reward). (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.
Figure 13. Experiment results of learning exploration with the fine-tuning method (with extrinsic reward). (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.
Figure 14. Experiment results of “noisy-TV.” (a) Learning curves in Maze-1. (b) Learning curves in Maze-2. (c) Learning curves in Maze-3.
