Hierarchical intrinsically motivated agent planning behavior with dreaming in grid environments
- PMID: 35366128
- PMCID: PMC8976870
- DOI: 10.1186/s40708-022-00156-6
Abstract
Biologically plausible models of learning may provide crucial insight for building autonomous intelligent agents capable of performing a wide range of tasks. In this work, we propose a hierarchical model of an agent that operates in an unfamiliar environment and is driven by a reinforcement signal. We use temporal memory to learn sparse distributed representations of state-action pairs, and a basal ganglia model to learn an effective action policy at different levels of abstraction. The learned model of the environment is used in two ways: to generate an intrinsic motivation signal, which drives the agent in the absence of an extrinsic signal, and to act in imagination, a process we call dreaming. We demonstrate that the proposed architecture enables an agent to effectively reach goals in grid environments.
Keywords: Hierarchical temporal memory; Intrinsic motivation; Model-based reinforcement learning; Sparse distributed representations.
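The two uses of the learned model described in the abstract can be illustrated with a toy sketch. It is not the authors' architecture: a tabular Q-table stands in for the temporal memory and basal ganglia model, a visit-count novelty bonus stands in for the intrinsic motivation signal, and Dyna-style replay of remembered transitions stands in for dreaming. All names and parameters (`step`, `train`, `greedy_path_length`, `bonus`, `dream_sweeps`, the 5x5 grid) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical action set: right, left, down, up as (row, column) offsets.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action, size, goal):
    """Deterministic grid move; walls clamp the agent, reward 1.0 only at the goal."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal

def train(size=5, goal=(4, 4), episodes=20, dream_sweeps=10, gamma=0.9, bonus=1.0):
    q = defaultdict(float)     # extrinsic action values Q[(state, action_index)]
    visits = defaultdict(int)  # visit counts behind the novelty bonus
    model = {}                 # learned model: (state, a) -> (next_state, reward, done)

    def backup(s, a, r, s2, done):
        # Q-learning backup with learning rate 1, exact for a deterministic grid.
        best_next = 0.0 if done else max(q[(s2, b)] for b in range(len(ACTIONS)))
        q[(s, a)] = r + gamma * best_next

    def score(s, a):
        # Extrinsic value plus an intrinsic bonus that shrinks as (s, a) is revisited.
        return q[(s, a)] + bonus / (1 + visits[(s, a)]) ** 0.5

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(4 * size * size):  # cap on episode length
            a = max(range(len(ACTIONS)), key=lambda b: score(s, b))
            s2, r, done = step(s, ACTIONS[a], size, goal)
            visits[(s, a)] += 1
            model[(s, a)] = (s2, r, done)  # remember the observed transition
            backup(s, a, r, s2, done)
            s = s2
            if done:
                break
        # "Dreaming" stand-in: replay remembered transitions without touching the grid.
        for _ in range(dream_sweeps):
            for (ds, da), (ds2, dr, ddone) in model.items():
                backup(ds, da, dr, ds2, ddone)
    return q

def greedy_path_length(q, size=5, goal=(4, 4)):
    """Follow the greedy extrinsic policy from (0, 0); None if the goal is not reached."""
    s, steps = (0, 0), 0
    while s != goal and steps < size * size:
        a = max(range(len(ACTIONS)), key=lambda b: q[(s, b)])
        s, _, _ = step(s, ACTIONS[a], size, goal)
        steps += 1
    return steps if s == goal else None

if __name__ == "__main__":
    print(greedy_path_length(train()))
```

Before any reward has been seen, every Q-value is zero, so action choice in this sketch is driven entirely by the novelty bonus. Once the goal has been found, the replay sweeps propagate its value back through the remembered transitions without further interaction with the grid.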
© 2022. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.