PLoS Comput Biol. 2011 Jan 20;7(1):e1001048.
doi: 10.1371/journal.pcbi.1001048.

Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings

Affiliations

Elise Payzan-LeNestour et al. PLoS Comput Biol. 2011.

Abstract

Recently, evidence has emerged that, in a six-arm restless bandit problem, humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian perceives irreducible uncertainty, or risk: even knowing the payoff probabilities of a given arm, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty, or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change; these sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, despite evidence of widespread aversion to ambiguity. Our data corroborate the latter. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.
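The contrast between the two learner types can be sketched as follows. This is an illustrative toy, not the authors' fitted models: it tracks a single arm with three outcomes, pitting a Dirichlet count-based Bayesian learner against a fixed-rate delta-rule learner; the payoff probabilities and the learning rate are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# One arm with three outcomes (-1, 0, 1 CHF) and hidden payoff probabilities.
true_probs = np.array([0.2, 0.2, 0.6])
outcomes = np.array([-1.0, 0.0, 1.0])

# Bayesian learner: Dirichlet posterior over the three outcome probabilities.
alpha = np.ones(3)            # uniform prior (one pseudo-count per outcome)
# Model-free learner: running value estimate with a fixed learning rate.
value, lr = 0.0, 0.1

for _ in range(200):
    k = rng.choice(3, p=true_probs)      # sample an outcome index
    alpha[k] += 1                        # Bayesian: count update
    value += lr * (outcomes[k] - value)  # RL: delta rule on the payoff

# Posterior-mean expected payoff vs. the delta-rule estimate.
bayes_value = (alpha / alpha.sum()) @ outcomes
print(round(bayes_value, 2), round(value, 2))
```

Both estimates approach the true expected payoff (0.4 here), but only the Bayesian learner carries a full posterior from which risk, estimation uncertainty, and jump probabilities can be read off, which is what the task exploits.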


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Six-arm restless bandit task.
(A) The six-arm restless bandit is implemented graphically as a board game. Six locations correspond to the six arms. Locations are color-coded: blue locations have lower average unexpected uncertainty than red locations. Blue locations pay 1, 0, or −1 CHF (Swiss francs); red locations pay 2, 0, or −2 CHF. The chosen option is highlighted (in this case, location 5). Participants freely choose a location on each trial. Histories of outcomes at previously chosen locations are shown by means of coin piles. (B) Visual representation of risk and estimation uncertainty. Risk can be tracked using entropy, which depends on the relative magnitudes of the outcome probabilities, i.e., the relative heights of the bars in the left chart. The bars represent the three estimated outcome probabilities (the mean of the posterior probability distribution, or PPD). Entropy (risk) is maximal when the bars are all equal. Estimation uncertainty is represented by the widths of the posterior distributions of the outcome probabilities, depicted in the right chart.
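The two quantities in panel B can be computed from a Dirichlet posterior over the three outcome probabilities. The sketch below is an assumption-laden illustration (the helper name and the choice of summed marginal variance as the "width" measure are ours): risk is the entropy of the posterior-mean probabilities, while estimation uncertainty shrinks as counts accumulate even when the mean probabilities stay the same.

```python
import numpy as np

def risk_and_estimation_uncertainty(alpha):
    """Risk = entropy of the mean posterior outcome probabilities;
    estimation uncertainty = summed variance of the Dirichlet marginals."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    p = alpha / a0                                   # posterior mean probabilities
    entropy = -np.sum(p * np.log(p))                 # risk, in nats
    var = alpha * (a0 - alpha) / (a0**2 * (a0 + 1))  # Dirichlet marginal variances
    return entropy, var.sum()

# Few observations: broad posteriors, high estimation uncertainty.
print(risk_and_estimation_uncertainty([2, 1, 1]))
# Many observations with the same mean probabilities: same risk,
# much lower estimation uncertainty.
print(risk_and_estimation_uncertainty([20, 10, 10]))
```

The two calls return identical entropies (the bar heights are the same) but very different posterior widths, which is exactly the dissociation the figure depicts.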
Figure 2
Figure 2. Three kinds of uncertainty in the task.
(A) Evolution of the estimation uncertainty (entropy of mean posterior outcome probabilities) of chosen options in one instance of the board game. Learning is based on the choices of one participant in our experiment. Blue dots on the horizontal axis indicate trials when a blue location was chosen; red dots indicate trials when a red location was chosen. (B) Evolution of the unexpected uncertainty of chosen options in one instance of the board game, measured (inversely) as the probability that no jump has occurred. Learning is based on the choices of one participant in our experiment. Blue dots on the horizontal axis indicate trials when outcome probabilities for the visited blue location jumped; red dots indicate trials when outcome probabilities for the visited red location jumped. (C) Average estimated risk (entropy of outcome probabilities) in one instance of the board game, by location (numbered 1 to 6). Learning is based on the choices of one participant in our experiment. Locations are arranged by level of unexpected uncertainty (blue: low; red: high). Average estimated risks are compared with true risks. The participant managed to distinguish risk differentials across blue locations, but not across red locations. Average estimated risks regress towards the grand mean because of estimation uncertainty after each jump in outcome probabilities.
Figure 3
Figure 3. Evolution of the (logarithm of the) Bayesian learning rate for two options in one instance of the board game.
Learning is based on the choices of one participant in our experiment. The top option has low average unexpected uncertainty (low chance of jumps) and low risk (one outcome probability was very high); the bottom option has high average unexpected uncertainty and low risk. Crosses on the horizontal axis indicate trials when the option was chosen.
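Why the learning rate decays between jumps can be seen in a count-based sketch (our illustration, not the paper's fitted model): for a Dirichlet posterior, one new observation moves the posterior mean with an effective step size of 1/(sum of counts + 1), so the rate falls as evidence accumulates and recovers when a detected jump resets the counts.

```python
import numpy as np

def effective_learning_rate(alpha):
    """For a Dirichlet count model, one observation moves the posterior mean
    by a step of size 1 / (sum(alpha) + 1): more evidence, smaller steps."""
    return 1.0 / (np.sum(alpha) + 1.0)

alpha = np.ones(3)                     # uniform prior over three outcomes
rates = []
for _ in range(5):
    rates.append(effective_learning_rate(alpha))
    alpha[0] += 1                      # observe outcome 0 again
print([round(r, 3) for r in rates])    # rates decay as evidence accumulates

alpha = np.ones(3)                     # a detected jump resets the counts...
print(effective_learning_rate(alpha))  # ...restoring a high learning rate
```

High unexpected uncertainty therefore keeps the learning rate elevated on average, matching the contrast between the two options in the figure.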
Figure 4
Figure 4. Goodness-of-fits of the Bayesian models, with (right) and without (left) penalty for ambiguity.
Based on approximately 500 choices of 62 participants. Data are from . Heights of the bars indicate the mean of the individual negative log-likelihoods; line segments indicate standard deviations.
Figure 5
Figure 5. Replication of the experiment in .
Mean BICs and standard deviations of the Bayesian, reinforcement, and Pearce-Hall learning models without structural uncertainty (Treatment 3). Based on the choices of 30 participants in approximately 500 trials of our board game. The Bayesian model is the base version (unadjusted for ambiguity aversion).
Figure 6
Figure 6. Goodness-of-fits of the Bayesian and reinforcement learning models under varying levels of structural uncertainty.
(A) Goodness-of-fits of the Bayesian and reinforcement learning models under full structural uncertainty (Treatment 1). Based on the choices of 43 participants in approximately 500 trials of our board game. The Bayesian model includes a penalty for estimation uncertainty; as in the data from , this model turned out to fit the data better than the base version of the Bayesian model. Heights of the bars indicate the mean of the individual Bayesian Information Criterion (BIC); line segments indicate standard deviations. The difference in mean BIC is not significant. (B) Goodness-of-fits of the Bayesian and reinforcement learning models under partial structural uncertainty (Treatment 2): mean BICs and standard deviations of the Bayesian and reinforcement learning models. Based on the choices of 32 participants in approximately 500 trials of our board game. The Bayesian model includes a penalty for estimation uncertainty. Participants knew the structure of the game except for the jumps in outcome probabilities; they were told that the description of the structure was incomplete.
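The model comparison in Figures 4-6 rests on the standard BIC formula, which trades fit against parameter count. A minimal sketch (the negative log-likelihoods and parameter counts below are made-up numbers, not the paper's results):

```python
import numpy as np

def bic(neg_log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion: BIC = 2*NLL + k*ln(n). Lower is better."""
    return 2.0 * neg_log_likelihood + n_params * np.log(n_trials)

# Hypothetical fits over ~500 choices: a Bayesian model with 2 free
# parameters vs. a reinforcement learner with 3.
print(round(bic(600.0, 2, 500), 1))  # hypothetical Bayesian model
print(round(bic(610.0, 3, 500), 1))  # hypothetical RL model
```

Because the penalty grows with ln(n) per parameter, a model must buy each extra parameter with a real likelihood gain, which is why mean BICs rather than raw likelihoods are compared across treatments.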

References

    1. Daw ND, Niv Y, Dayan P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci. 2005;8:1704–1711.
    2. Dayan P, Long T. Statistical models of conditioning. In: Kearns MJ, et al., editors. Conf Proc Adv Neural Inf Process Syst Vol 10. MIT Press; 1997. pp. 117–123.
    3. Yoshida W, Ishii S. Resolution of uncertainty in prefrontal cortex. Neuron. 2006;50:781–789.
    4. Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS. Learning the value of information in an uncertain world. Nat Neurosci. 2007;10(9):1214–1221.
    5. Yu AJ, Dayan P. Uncertainty, neuromodulation, and attention. Neuron. 2005;46:681–692.
