Reinforcement Learning Model With Dynamic State Space Tested on Target Search Tasks for Monkeys: Self-Determination of Previous States Based on Experience Saturation and Decision Uniqueness
- PMID: 35185502
- PMCID: PMC8855153
- DOI: 10.3389/fncom.2021.784592
Abstract
The real world is essentially an indefinite environment in which the probability space, i.e., what can happen, cannot be specified in advance. Conventional reinforcement learning models that learn under uncertain conditions are given the state space as prior knowledge. Here, we developed a reinforcement learning model with a dynamic state space and tested it on a two-target search task previously used for monkeys. In the task, two out of four neighboring spots were alternately correct, and the valid pair was switched after consecutive correct trials in the exploitation phase. The agent was required to find a new pair during the exploration phase, but it could not obtain the maximum reward by referring only to the single previous trial; it needed to select an action based on the two previous trials. To adapt to this task structure without prior knowledge, the model expanded its state space so that it referred to more than one trial as the previous state, based on two explicit criteria for the appropriateness of state expansion: experience saturation and decision uniqueness of action selection. The model not only performed comparably to the ideal model given prior knowledge of the task structure, but also performed well on a task that was not envisioned when the models were developed. Moreover, it learned how to search rationally without falling into the exploration-exploitation trade-off. For constructing a learning model that can adapt to an indefinite environment, the method used by our model, expanding the state space based on experience saturation and decision uniqueness of action selection, is promising.
Keywords: decision uniqueness; dynamic state space; experience saturation; exploration-exploitation trade-off; indefinite environment; reinforcement learning; target search task.
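To make the state-expansion idea concrete, the following is a minimal Python sketch of a tabular learner whose state is a variable-length history of past trials. The class name, thresholds, and the simple epsilon-greedy delta-rule update are illustrative assumptions rather than the authors' implementation; the sketch only mirrors the two criteria named in the abstract: a context is allowed to grow to include one more previous trial when it has been visited often enough (experience saturation) yet still lacks a uniquely best action (decision uniqueness).

```python
import random
from collections import defaultdict


class DynamicStateQLearner:
    """Tabular learner over variable-length trial histories.

    A 1-trial context is expanded to include one more previous trial
    only when it is saturated with experience yet still fails to give
    a unique best action. Thresholds and the delta-rule update are
    illustrative assumptions, not the authors' implementation.
    """

    def __init__(self, n_actions=4, alpha=0.1, epsilon=0.1,
                 visit_threshold=20, uniqueness_margin=0.1):
        self.n_actions = n_actions
        self.alpha = alpha                          # learning rate
        self.epsilon = epsilon                      # exploration rate
        self.visit_threshold = visit_threshold      # "experience saturation"
        self.uniqueness_margin = uniqueness_margin  # "decision uniqueness"
        self.q = defaultdict(lambda: [0.0] * n_actions)  # action values per state
        self.visits = defaultdict(int)                   # visit counts per state
        self.depth = defaultdict(lambda: 1)              # history length per 1-trial context

    def _state(self, history):
        # The state is the last `depth` trials, where depth is looked up
        # from the most recent trial (the default 1-trial context).
        key1 = tuple(history[-1:])
        return tuple(history[-self.depth[key1]:])

    def act(self, history):
        # Epsilon-greedy selection over the current (possibly expanded) state.
        state = self._state(history)
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return max(range(self.n_actions), key=lambda a: qs[a])

    def update(self, history, action, reward):
        # Delta-rule value update, then check whether the state should expand.
        state = self._state(history)
        self.visits[state] += 1
        self.q[state][action] += self.alpha * (reward - self.q[state][action])
        self._maybe_expand(state, history)

    def _maybe_expand(self, state, history):
        if self.visits[state] < self.visit_threshold:
            return  # not yet saturated with experience
        top, second = sorted(self.q[state], reverse=True)[:2]
        if top - second >= self.uniqueness_margin:
            return  # best action is already unique; no expansion needed
        # Saturated but still ambiguous: refer to one more previous trial.
        key1 = tuple(history[-1:])
        self.depth[key1] = min(self.depth[key1] + 1, len(history))
```

In a two-target-style task, each history element could, for example, be an (action, reward) pair from one trial; a context that remains ambiguous after many visits would then trigger expansion to a two-trial state, in the spirit of the behavior the abstract describes.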
Copyright © 2022 Katakura, Yoshida, Hisano, Mushiake and Sakamoto.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.