Deconstructing the human algorithms for exploration
- PMID: 29289795
- PMCID: PMC5801139
- DOI: 10.1016/j.cognition.2017.12.014
Abstract
The dilemma between information gathering (exploration) and reward seeking (exploitation) is a fundamental problem for reinforcement learning agents. How humans resolve this dilemma is still an open question, because experiments have provided equivocal evidence about the underlying algorithms used by humans. We show that two families of algorithms can be distinguished in terms of how uncertainty affects exploration. Algorithms based on uncertainty bonuses predict a change in response bias as a function of uncertainty, whereas algorithms based on sampling predict a change in response slope. Two experiments provide evidence for both bias and slope changes, and computational modeling confirms that a hybrid model is the best quantitative account of the data.
Keywords: Bayesian inference; Explore-exploit dilemma; Reinforcement learning.
Copyright © 2017 Elsevier B.V. All rights reserved.
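The abstract's distinction can be made concrete with a small sketch. Below, choices between two options are modeled with a probit choice rule: an uncertainty-bonus rule (UCB-style) adds relative uncertainty to the value difference, shifting the choice function horizontally (a bias change), while a Thompson-sampling-style rule divides by total uncertainty, flattening or steepening the function (a slope change). The function names, the Gaussian posteriors, and the single `bonus` parameter are illustrative assumptions, not the paper's exact model.

```python
import math


def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_bonus_first(q1, q2, s1, s2, bonus=1.0):
    """UCB-style uncertainty bonus (illustrative).

    The relative uncertainty (s1 - s2) is added to the value
    difference, shifting the choice function's intercept: a
    response *bias* toward the more uncertain option.
    """
    return phi((q1 - q2) + bonus * (s1 - s2))


def p_thompson_first(q1, q2, s1, s2):
    """Thompson-sampling-style rule (illustrative).

    Probability that a draw from option 1's Gaussian posterior
    N(q1, s1^2) exceeds a draw from N(q2, s2^2). Total uncertainty
    scales the denominator, changing the choice function's *slope*
    without shifting its intercept.
    """
    return phi((q1 - q2) / math.sqrt(s1 ** 2 + s2 ** 2))
```

With equal values, the bonus rule prefers the more uncertain option (`p_bonus_first(0, 0, 2, 1) > 0.5`) while the sampling rule stays indifferent (`p_thompson_first(0, 0, 2, 1) == 0.5`); raising total uncertainty flattens the sampling rule's slope. A hybrid model, as the paper argues, would combine both terms.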
