Modular deep reinforcement learning from reward and punishment for robot navigation
- PMID: 33383526
- DOI: 10.1016/j.neunet.2020.12.001
Abstract
Modular Reinforcement Learning decomposes a monolithic task into several sub-goal tasks and learns each in parallel to solve the original problem. Such learning patterns can be traced in the brains of animals: recent evidence in neuroscience shows that animals use separate systems to process rewards and punishments, suggesting a different perspective for modularizing Reinforcement Learning tasks. MaxPain and its deep variant, Deep MaxPain, demonstrated the advantages of such a dichotomy-based decomposition over conventional Q-learning in terms of safety and learning efficiency. The two methods differ in policy derivation: MaxPain linearly unified the reward and punishment value functions and generated a joint policy from the unified values, whereas Deep MaxPain tackled the scaling problem in high-dimensional cases by linearly combining two sub-policies derived from the respective value functions. In both methods, however, the mixing weights were set manually, leading to inadequate use of the learned modules. In this work, we discuss the scaling of reward and punishment signals in relation to the discount factor γ, and propose a weak constraint for signal design. To further exploit the learned models, we propose a state-value-dependent weighting scheme that tunes the mixing weights automatically: hard-max and softmax, based on a case analysis of the Boltzmann distribution. We focus on maze-solving navigation tasks and investigate how the two modules (pain-avoiding and goal-reaching) influence each other's behavior during learning. We also propose a sensor-fusion network structure that combines lidar readings with images captured by a monocular camera, instead of lidar-only or image-only sensing. Our results, both in simulations of three maze types of differing complexity and in a real-robot L-maze experiment on a Turtlebot3 Waffle Pi, showed the improvements achieved by our methods.
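The dichotomy-based mixing described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the Boltzmann temperature `beta`, the pain-avoiding sub-policy `softmax(-q_pain)`, and the logistic state-value weight `w` are assumptions standing in for the paper's hard-max/softmax weighting schemes.

```python
import numpy as np

def softmax(x, beta=1.0):
    """Numerically stable Boltzmann distribution over action values."""
    z = beta * (x - np.max(x))
    e = np.exp(z)
    return e / e.sum()

def joint_policy(q_reward, q_pain, beta=1.0):
    """Mix a reward-seeking and a pain-avoiding sub-policy.

    q_reward, q_pain: per-action value estimates at the current state,
    produced by the two separately learned modules. The mixing weight
    is derived from the modules' state values instead of being
    hand-tuned; the exact rule here is an assumption for illustration.
    """
    pi_r = softmax(q_reward, beta)   # prefer high-reward actions
    pi_p = softmax(-q_pain, beta)    # prefer low-pain actions
    v_r, v_p = q_reward.max(), q_pain.max()
    # State-value-dependent weight: the module with larger stakes
    # (bigger |V(s)|) dominates at this state (logistic form assumed).
    w = 1.0 / (1.0 + np.exp(abs(v_p) - abs(v_r)))
    pi = w * pi_r + (1.0 - w) * pi_p
    return pi / pi.sum()
```

In a state where the punishment module predicts high pain, `w` shrinks and the pain-avoiding sub-policy dominates the joint action distribution; near the goal, the reward module takes over.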
Keywords: Deep reinforcement learning; Max pain; Maze solving; Modular reinforcement learning; Robot navigation.
Copyright © 2020 The Author(s). Published by Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Reinforcement learning using a continuous time actor-critic framework with spiking neurons. PLoS Comput Biol. 2013 Apr;9(4):e1003024. doi: 10.1371/journal.pcbi.1003024. PMID: 23592970.
- People teach with rewards and punishments as communication, not reinforcements. J Exp Psychol Gen. 2019 Mar;148(3):520-549. doi: 10.1037/xge0000569. PMID: 30802127.
- Vision-Based Robot Navigation through Combining Unsupervised Learning and Hierarchical Reinforcement Learning. Sensors (Basel). 2019 Apr 1;19(7):1576. doi: 10.3390/s19071576. PMID: 30939807.
- Reward and punishment associated with the same goal response: a factor in the learning of motives. Psychol Bull. 1963 Sep;60:441-51. doi: 10.1037/h0045000. PMID: 14051061. Review.
- Measuring reinforcement learning and motivation constructs in experimental animals: relevance to the negative symptoms of schizophrenia. Neurosci Biobehav Rev. 2013 Nov;37(9 Pt B):2149-65. doi: 10.1016/j.neubiorev.2013.08.007. PMID: 23994273. Review.
Cited by
- Advances in non-invasive biosensing measures to monitor wound healing progression. Front Bioeng Biotechnol. 2022 Sep 23;10:952198. doi: 10.3389/fbioe.2022.952198. PMID: 36213059. Review.
- Enhancing reinforcement learning models by including direct and indirect pathways improves performance on striatal dependent tasks. PLoS Comput Biol. 2023 Aug 18;19(8):e1011385. doi: 10.1371/journal.pcbi.1011385. PMID: 37594982.
- Application of an adapted FMEA framework for robot-inclusivity of built environments. Sci Rep. 2022 Mar 1;12(1):3408. doi: 10.1038/s41598-022-06902-4. PMID: 35233018.
- Having multiple selves helps learning agents explore and adapt in complex changing worlds. Proc Natl Acad Sci U S A. 2023 Jul 11;120(28):e2221180120. doi: 10.1073/pnas.2221180120. PMID: 37399387.
- Mobile Robot Application with Hierarchical Start Position DQN. Comput Intell Neurosci. 2022 Sep 5;2022:4115767. doi: 10.1155/2022/4115767. PMID: 36105641.