Mapping value based planning and extensively trained choice in the human brain

Klaus Wunderlich et al. Nat Neurosci. 2012 Mar 11;15(5):786-91. doi: 10.1038/nn.3068.

Abstract

Investigations of the underlying mechanisms of choice in humans have focused on learning from prediction errors, leaving the computational structure of value-based planning comparatively underexplored. Using behavioral and neuroimaging analyses of a minimax decision task, we found that the computational processes underlying forward planning are expressed in the anterior caudate nucleus as values of individual branching steps in a decision tree. In contrast, values represented in the putamen pertain solely to values learned during extensive training. During actual choice, both striatal areas showed functional coupling to ventromedial prefrontal cortex, consistent with this region acting as a value comparator. Our findings point toward an architecture of choice in which segregated value systems operate in parallel in the striatum for planning and extensively trained choices, with medial prefrontal cortex integrating their outputs.


Figures

Figure 1. Task and behavioural results
(a) Task flow in planning trials: subjects navigated a three-layer maze before reaching probabilistic rewards. Eight numbers (randomly changing from trial to trial) displayed the reward probabilities of each terminal room. The second-layer choice was made by a deterministic, value-minimizing computer agent that always implemented the lowest-value option. (b) Example planning maze: nodes represent rooms, lines represent transitions between rooms. Subjects moved forward by freely choosing at the first and third levels (cyan circles); the computer determined the choice at the second level (gray circles). The optimal path (arrows) was determined by backward induction of state values using a minimax strategy. State (black) and action (red) values are shown along the choice path. (c) Prior training over three days in four single-level mazes with invariant contingencies and distinct reward probabilities (p = .15, .40, .65, .90). Wall colours provided distinguishable contexts that allowed subjects to uniquely identify each trained maze. No explicit information about reward probabilities or contingencies was given. (d) Combination of planned and trained options within the same trial: coloured doors transitioned into the trained maze of the same colour; the other door followed a reduced planning branch with four outcome states. No reward probabilities were shown above coloured doors. (e) Average fraction of correct choices according to a tree-search planning strategy (PLAN) and two alternative heuristics: highest value (MAX) and higher average value (AVG). Subjects' choice behaviour conformed to the tree-search planning strategy and cannot be explained by either heuristic. Vertical lines = s.e.m. See Table S2 for individual subject behaviour.
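The backward induction with a minimax strategy described in panel (b) can be sketched as follows. The maze layout, reward probabilities, and function name here are illustrative assumptions, not values taken from the paper; the only property carried over is the structure of the task: the subject maximizes at layers 1 and 3, while the computer agent minimizes at layer 2.

```python
# Hypothetical sketch of minimax backward induction over a 3-layer maze.
# Subject chooses (max) at layers 1 and 3; the value-minimizing
# computer agent chooses (min) at layer 2. Numbers are invented.

def state_value(node, layer):
    """Recursively compute the minimax value of a maze node.

    A node is either a terminal reward probability (float) or a list
    of child nodes reached through its doors.
    """
    if isinstance(node, float):  # terminal room: value = p(reward)
        return node
    child_values = [state_value(c, layer + 1) for c in node]
    return min(child_values) if layer == 2 else max(child_values)

# Binary 3-layer maze: 2 x 2 x 2 = 8 terminal reward probabilities
maze = [
    [[0.15, 0.40], [0.65, 0.90]],  # left first-level door
    [[0.30, 0.20], [0.70, 0.55]],  # right first-level door
]

root_value = state_value(maze, 1)  # minimax value of the start room
```

In this example the left branch is optimal: although the right branch contains a 0.70 terminal, the minimizing agent at layer 2 steers the subject toward the worse sub-branch, so the left branch's guaranteed 0.40 wins.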
Figure 2. Neural correlates of planning versus extensively trained choices
(a) Significant categorical effects for planning > trained trials (red) and trained > planning trials (blue). Medial sectors of the basal ganglia including the medial caudate, thalamus, bilateral anterior insula, dorsomedial prefrontal cortex, bilateral medial frontal gyrus and precuneus showed enhanced BOLD responses in planning compared to model-free cached trials. Lateral posterior putamen, posterior insula extending into the medial temporal gyrus, and somatosensory cortex including the postcentral gyrus were more activated when subjects made a response in the extensively trained context. (b) Effect-size plots in a regression of planned values, convolved with a canonical HRF, against BOLD data at three time points: subjects' first choice, subjects' second choice, and outcome. Signals in the caudate tracked the value difference between the actual target and the alternative values in the choices along the traversed path, as indicated by both significant positive effects for target values and significant negative effects for the alternative values. Asterisks mark significant effects (p < 0.05); a.u. = arbitrary units; vertical lines = s.e.m. Posterior putamen did not significantly correlate with planned values. (c) Caudate activity related to classic reward prediction errors during trained trials. Posterior putamen showed significant value representations in extensively trained mazes at the time of choice.
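The cached-value learning implied by the trained mazes (Figure 1c) and the classic reward prediction errors in panel (c) follow the standard delta-rule form. The sketch below is a minimal illustration, assuming a Rescorla-Wagner / TD(0)-style update with an arbitrary learning rate; nothing here reproduces the paper's fitted parameters.

```python
# Minimal sketch: a cached value V is driven toward the true reward
# probability of one trained maze by reward prediction errors.
# Learning rate and trial count are illustrative assumptions.
import random

random.seed(0)
alpha = 0.05          # learning rate (assumed)
true_p = 0.65         # reward probability of one trained maze
V = 0.0               # cached value, learned over training

for trial in range(3000):
    reward = 1.0 if random.random() < true_p else 0.0
    delta = reward - V      # reward prediction error
    V += alpha * delta      # delta-rule update of the cached value
```

After extensive training V hovers near the true reward probability, which is the sense in which the putamen's value signal in trained mazes can be "cached" rather than computed by forward planning.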
Figure 3. Comparing values from planning and values from extensively trained mazes
Value-based effect sizes at choice time in mixed planning/trained trials and in trials comparing two trained branches. Mixed trials are plotted separately conditional on subjects' choices. Vc = value of the chosen option; Vnc = value of the not-chosen option; a.u. = arbitrary units; vertical lines = s.e.m. (a) Caudate represented planned values of the planning branch regardless of choice. (b) Putamen fluctuated with values of the coloured trained branch regardless of choice. (c) VmPFC encoded the value of the chosen option, representing the output of a comparison process.
Figure 4. Functional coupling between caudate-vmPFC and putamen-vmPFC is significantly increased during choice in mixed trials
(a) We tested the statistical significance of the PPI interaction contrast between our a priori defined ROIs; effect sizes are shown as bar graphs (all p < 0.05). Vertical lines = s.e.m. The increase in coupling is independent of the actual choice, consistent with the hypothesis that vmPFC mediates the decision process by accessing pre-choice values from both choice systems. (b) Shown are areas of increased coupling with both caudate and putamen during mixed choices (conjunction analysis).
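The logic of a psychophysiological interaction (PPI) contrast can be sketched with synthetic data: the interaction regressor is the product of the seed-region timeseries and a (centered) psychological factor, entered into a GLM alongside both main effects. All signals and effect sizes below are invented for illustration; a full fMRI analysis would additionally deconvolve the seed BOLD signal before forming the product.

```python
# Illustrative PPI regression on synthetic data: does "coupling"
# between a seed region and a target increase during mixed-trial
# choice? Ground-truth coupling is 0.5 at baseline, 0.9 during choice.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200

seed = rng.standard_normal(n_scans)      # seed-region signal (synthetic)
psych = np.zeros(n_scans)                # psychological factor:
psych[50:100] = 1.0                      # 1 during mixed-trial choice
psych[150:200] = 1.0

ppi = (psych - psych.mean()) * seed      # interaction regressor

# GLM design: intercept, seed and task main effects, PPI term
X = np.column_stack([np.ones(n_scans), seed, psych, ppi])
y = seed * (0.5 + 0.4 * psych) + 0.1 * rng.standard_normal(n_scans)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[3] estimates the increase in coupling during mixed choices
```

A significantly positive PPI coefficient (here, beta[3] recovering the planted 0.4 increase) is the statistical signature reported in panel (a) for caudate-vmPFC and putamen-vmPFC coupling.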

