Neural correlates of forward planning in a spatial decision task in humans

Dylan Alexander Simon et al. J Neurosci. 2011 Apr 6;31(14):5526-39. doi: 10.1523/JNEUROSCI.4647-10.2011.
Abstract

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
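The distinction the abstract draws, between model-free TD learning (updating values from sampled prediction errors) and model-based RL (planning over a learned map of the task), can be illustrated with a minimal sketch. All names and parameters below are illustrative, not from the paper: a three-state deterministic chain stands in for the maze.

```python
# Toy 3-state chain: s0 -> s1 -> s2 (terminal), reward 1.0 on entering s2.
# Illustrative only; the study used a continuous spatial maze, not this chain.
N_STATES = 3
REWARD = {2: 1.0}   # reward received on entering a state
GAMMA = 0.9         # discount factor

def step(s):
    """Deterministic transition: move one state to the right."""
    s_next = min(s + 1, N_STATES - 1)
    return s_next, REWARD.get(s_next, 0.0)

def td_learning(episodes=500, alpha=0.1):
    """Model-free TD(0): learn V(s) from sampled transitions only."""
    V = [0.0] * N_STATES
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            s_next, r = step(s)
            V[s] += alpha * (r + GAMMA * V[s_next] - V[s])  # TD-error update
            s = s_next
    return V

def model_based_values(sweeps=50):
    """Model-based evaluation: iterate Bellman backups on the known model."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            s_next, r = step(s)            # query the model directly
            V[s] = r + GAMMA * V[s_next]   # full backup, no sampling needed
    return V

print(td_learning())        # converges toward [0.9, 1.0, 0.0]
print(model_based_values()) # [0.9, 1.0, 0.0] after planning
```

The behavioral signature the task exploits follows from this contrast: if the model (e.g., the maze layout) changes, the planner revalues states immediately by re-running its backups, whereas the TD learner must re-experience transitions before its cached values catch up.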


Figures

Figure 1.
Task flow and example state. A, Subjects were cued to choose a direction by pressing a key. If the subject did not respond within 2 s, she lost a turn and was again presented with the same choice (no movement). Otherwise, an animation was shown moving to the room in the selected direction (or to a random room for randomly occurring jumps); this movement lasted 1.5–2 s, jittered uniformly. Then, the next room was presented, including the available transitions from that room and any received reward. Finally, after 0.5 s, the subject was cued to make the next decision. Only the doors in the current room were visible to the subject. B, A possible abstract layout of the task, where each square represents a room, and each arrow represents an available door direction the subject may choose from. The circles represent reward locations, where the subject would gain the indicated reward value each time the room was visited. At each step, each one-way door could flip direction independently with probability 1/24.
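The transition dynamics described in this caption (each one-way door independently reversing direction with probability 1/24 per step) can be sketched as follows. The door identifiers and direction encoding are illustrative assumptions, not taken from the paper.

```python
import random

FLIP_PROB = 1 / 24  # per-door, per-step flip probability from the caption

def flip_doors(doors, rng=random):
    """Return a new door map in which each one-way door has independently
    reversed direction with probability FLIP_PROB.
    `doors` maps a door id to its current direction ('N', 'S', 'E', 'W')."""
    OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
    return {d: (OPPOSITE[v] if rng.random() < FLIP_PROB else v)
            for d, v in doors.items()}

# Example: most steps leave the layout unchanged, since 1/24 is small.
doors = {"door_a": "N", "door_b": "E"}
print(flip_doors(doors))
```

Because the flip probability is low, subjects' learned routes stay useful for stretches of trials but are occasionally invalidated, which is what forces the continual relearning the task is designed to expose.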
Figure 2.
Behavioral model likelihood comparison. Negative log-likelihood evidence values under BIC. Shown are per-subject log Bayes factors comparing planning against TD.
Figure 3.
Value-responsive areas. A, B, T statistic map of group response size to planned (A) and TD-based (B) value predictions from separate models (shown at p < 0.001, uncorrected; significant p < 0.05 FDR clusters highlighted).
Figure 4.
Identification of value-related voxels of interest. T statistic map of group response size to either planned or TD-based value predictions (summed contrast, shown at p < 0.001, uncorrected; significance not assessed). The most responsive peak voxels of this map lying anatomically within striatum were identified for additional analysis.
Figure 5.
Striatal BOLD responses to partial value components. Responses (mean effect sizes, arbitrary units) to key components of the value predictions as predicted by the two algorithms in the previously identified VOIs. Also shown are the predicted responses from the overall value fit assuming exponential discounting and updating. Note that significances, as indicated by *p < 0.05 and **p < 0.01, are biased by voxel selection.
Figure 6.
Responses to predicted next-step rewards beyond chosen values. T statistic map of responsive regions to choices that are expected to lead to a reward room (r1), greater than the first two terms of the value equation (r1 + γr2; shown at p < 0.001, uncorrected; significant p < 0.05 FDR clusters highlighted).
Figure 7.
Response to both one-step predicted and immediate choice count. Masked T statistic map of responses to expected next-step choice set size within regions responsive to current choice set size (all n0 significant p < 0.05 FDR cluster size; n1 shown at p < 0.001, uncorrected; two-tailed).

