Structure learning in human sequential decision-making
- PMID: 21151963
- PMCID: PMC2996460
- DOI: 10.1371/journal.pcbi.1001003
Abstract
Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. We argue that, rather than being suboptimal, humans face a more complex learning problem: they must also learn the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people exhibit structure learning, we performed experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.
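To make the inference concrete, below is a minimal Python sketch, not the authors' implementation, of the core structure-learning step: computing the posterior probability that two bandit options are coupled. The class name, the Beta(1,1) priors, the 0.5 coupling prior, and the definition of coupling (the two options' reward rates sum to one) are illustrative assumptions; the paper's optimal agent additionally solves the full Bayes-adaptive planning problem rather than merely tracking beliefs.

```python
# Minimal sketch of structure-learning inference for a two-option bandit.
# Assumptions (illustrative, not from the paper's code): Beta(1,1) priors
# on reward rates, a 0.5 prior probability of coupling, and "coupled"
# meaning the two options' reward rates sum to one.
import math
import random


class StructureLearner:
    def __init__(self, p_coupled=0.5):
        self.log_prior_c = math.log(p_coupled)
        self.log_prior_i = math.log(1.0 - p_coupled)
        self.succ = [0, 0]  # per-option success counts
        self.fail = [0, 0]  # per-option failure counts

    @staticmethod
    def _log_beta_evidence(s, f):
        # Marginal likelihood of s successes and f failures under a
        # Beta(1,1) prior on the Bernoulli rate: the Beta function B(1+s, 1+f).
        return math.lgamma(1 + s) + math.lgamma(1 + f) - math.lgamma(2 + s + f)

    def _log_evidence_independent(self):
        # Independent structure: each option has its own reward rate.
        return sum(self._log_beta_evidence(self.succ[i], self.fail[i])
                   for i in (0, 1))

    def _log_evidence_coupled(self):
        # Coupled structure: one shared rate theta, with option 2 rewarding
        # at 1 - theta, so a success on option 2 is a "failure" for theta.
        a = self.succ[0] + self.fail[1]
        b = self.fail[0] + self.succ[1]
        return self._log_beta_evidence(a, b)

    def p_coupled(self):
        # Posterior probability of the coupled structure (Bayes' rule,
        # computed in log space for numerical stability).
        lc = self.log_prior_c + self._log_evidence_coupled()
        li = self.log_prior_i + self._log_evidence_independent()
        m = max(lc, li)
        return math.exp(lc - m) / (math.exp(lc - m) + math.exp(li - m))

    def update(self, option, rewarded):
        if rewarded:
            self.succ[option] += 1
        else:
            self.fail[option] += 1


# In a coupled world (option 1 pays off with p = 0.8, option 2 with 0.2),
# the coupling posterior rises well above its 0.5 prior.
learner = StructureLearner()
rng = random.Random(0)
for _ in range(500):
    option = rng.randrange(2)  # sample both options to expose the structure
    learner.update(option, rng.random() < (0.8 if option == 0 else 0.2))
print(round(learner.p_coupled(), 3))  # well above the 0.5 prior
```

The design choice worth noting is that the two structures are compared by their marginal likelihoods: the coupled model's single shared parameter gives it an automatic Occam advantage whenever the two options' outcomes mirror each other.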
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
- Tasks each comprise a random number of choices. A) Rewarding options are independent. B) Rewarding options are coupled within a task. C) Mixture of tasks, in which rewarding options may be independent or coupled; a latent coupling node acts as an "XOR" switch between the coupled and independent structures.
- Marginal beliefs on reward probabilities (brightness indicates relative probability mass), probability of coupling, and expected reward, shown as functions of time. A) Simulation in the independent environment. B) Simulation in the coupled environment.
- Discount factor fixed at 0.98, with all other values fixed for the simulation; the number of failures for option two is varied from 1 through 3. Under these conditions, the independent model would always choose option 1, whereas the coupled model would always choose option 2; the structure learning model switches between the two. The graph shows the difference in value between options 2 and 1 as a function of task uncertainty.
- A) Trials in the independent environment. B) Trials in the coupled environment.
