Nat Neurosci. 2015 May;18(5):767-72.
doi: 10.1038/nn.3981. Epub 2015 Mar 23.

Model-based choices involve prospective neural activity

Bradley B Doll et al. Nat Neurosci. 2015 May.

Abstract

Decisions may arise via 'model-free' repetition of previously reinforced actions or by 'model-based' evaluation, which is widely thought to follow from prospective anticipation of action consequences using a learned map or model. While choices and neural correlates of decision variables sometimes reflect knowledge of their consequences, it remains unclear whether this actually arises from prospective evaluation. Using functional magnetic resonance imaging and a sequential reward-learning task in which paths contained decodable object categories, we found that humans' model-based choices were associated with neural signatures of future paths observed at decision time, suggesting a prospective mechanism for choice. Prospection also covaried with the degree of model-based influences on neural correlates of decision variables and was inversely related to prediction error signals thought to underlie model-free learning. These results dissociate separate mechanisms underlying model-based and model-free evaluation and support the hypothesis that model-based influences on choices and neural decision variables result from prospection.

Figures

Figure 1
Task design. a. Timeline of events. Each of the 272 trials begins in a randomly selected first-stage state (faces or tools, left/right presentation randomized). First-stage choices deterministically produce second-stage choices (body parts or scenes), which probabilistically produce reward. b. State transition structure. Choices in the face state are equivalent to choices in the tool state. Faces and tools depicted here on the left always lead to the body part state, while those depicted on the right always lead to the scene state.
Figure 2
Model behavioral predictions and data. Plots depict the probability of staying with the start state choice made on the previous trial. Choices are binned by whether the start state (tools or faces) was the same as or different from the start state on the previous trial, and whether the previous trial ended in reward (Rew) or not (No Rew). a. Model-based predictions. Start states are utilized in terms of the transitions they afford, so outcomes affect subsequent choice regardless of whether the previous start state matches or differs from the current one. b. Model-free predictions. Separate values are learned for the actions at the different start states, so outcomes only affect subsequent choice on trials with the same start state. Panels a and b were produced from generative RL model task performance (Online Methods), with weighting parameter w specifying fully model-based (w = 1) and fully model-free learning (w = 0), respectively. c. Observed data indicate the presence of both effects. The effect of previous reward on choice (estimate = 0.47, Z = 4.6, P = 2.4 × 10⁻⁵) is greater when the current start state matches the previous one (estimate = 0.48, Z = 2.9, P = 0.0036, Online Methods, Supplementary Table 1). Error bars reflect standard error of the mean.
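The contrast between panels a and b can be illustrated with a minimal sketch of a hybrid learner. This is not the authors' generative model (that is specified in the Online Methods, which are not reproduced here); the state encoding, the learning rate `alpha`, the softmax inverse temperature `beta`, and the simplified model-free update that moves the first-stage value directly toward the reward are all assumptions made for illustration.

```python
import math

# Hypothetical encoding (not from the paper):
# start states: 0 = faces, 1 = tools
# first-stage actions: 0 -> body part state, 1 -> scene state (deterministic)


def first_stage_values(q_mf, q_stage2, start, w):
    """Mix model-based and model-free first-stage values with weight w."""
    values = []
    for action in (0, 1):
        # Model-based value: look ahead to the second-stage state this action
        # leads to and take its best option. It does not depend on start state.
        q_mb = max(q_stage2[action])
        # Model-free value: cached separately for each start state.
        values.append(w * q_mb + (1.0 - w) * q_mf[start][action])
    return values


def choice_prob(values, action, beta=5.0):
    """Softmax probability of choosing `action` given first-stage values."""
    exps = [math.exp(beta * v) for v in values]
    return exps[action] / sum(exps)


def stay_prob_after_reward(w, same_start, alpha=0.5, beta=5.0):
    """Probability of repeating a first-stage action after one rewarded trial."""
    q_mf = [[0.0, 0.0], [0.0, 0.0]]      # indexed [start state][action]
    q_stage2 = [[0.0, 0.0], [0.0, 0.0]]  # indexed [second-stage state][option]

    # One rewarded trial: start in faces, choose action 0, then option 0.
    start, action, option, reward = 0, 0, 0, 1.0
    q_stage2[action][option] += alpha * (reward - q_stage2[action][option])
    # Simplifying assumption: the model-free value of the visited start state
    # is updated directly toward the obtained reward.
    q_mf[start][action] += alpha * (reward - q_mf[start][action])

    next_start = start if same_start else 1 - start
    values = first_stage_values(q_mf, q_stage2, next_start, w)
    return choice_prob(values, action, beta)


if __name__ == "__main__":
    for w in (1.0, 0.0):
        print("w =", w,
              "same start:", round(stay_prob_after_reward(w, True), 3),
              "different start:", round(stay_prob_after_reward(w, False), 3))
```

Under these assumptions, a fully model-based learner (w = 1) raises its stay probability equally whether or not the start state repeats, while a fully model-free learner (w = 0) does so only when the start state is the same, which is the qualitative pattern the two panels describe.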
Figure 3
Neural evidence of prospective activation correlates with model-based behavior. a, b. Example subject regions of interest (ROIs), derived from an independent functional localizer, for a. body parts and b. scenes (see Supplementary Fig. 2 for group-level depiction of these ROIs). c. Correlation of averaged ROI BOLD response and model-based relative to model-free choice behavior. Prospective activation was estimated at the task start stage on catch trials (68 randomly interspersed trials in which no second stage occurred) as a contrast of the chosen second-stage state relative to the unchosen one in the relevant ROI (body parts or scenes; individual scores are averages of the two contrasts). Prospective activation correlates positively with the tendency to make model-based choices (estimate = 0.46, χ²(1) = 4.4, P = 0.036). Lines depict group-level linear effects and 95% confidence curves.
Figure 4
Correlates of trial-by-trial choice probabilities derived from chosen minus unchosen values estimated by model-free and model-based learning at the task's first stage. Highlighted regions show negative correlation (i.e., activity increasing in unchosen minus chosen value) for a. model-free choice probabilities and b. the difference between model-based and model-free choice probabilities, the latter difference isolating activity significantly related to model-based rather than model-free learning. BOLD response in dorsomedial prefrontal cortex (dmPFC; a, b left; model-free peak: 8 30 34, P = 0; model-based minus model-free peak: 12 34 16, P = 3.9 × 10⁻⁷) and frontopolar cortex (FPC; a, b right; model-free peak: −40 44 12, P = 2.5 × 10⁻¹¹; model-based minus model-free peak: 26 56 6, P = 2.9 × 10⁻⁸) correlates negatively with both regressors (Supplementary Tables 2 and 3). Cluster P-values corrected for family-wise error for whole-brain comparisons. Maps thresholded at P < 0.001, uncorrected, for display purposes. c, d. Effect sizes (from b) of model-based minus model-free correlations with unchosen choice probabilities in dmPFC and FPC correlate with the tendency to make model-based choices across subjects. c. dmPFC (estimate = 0.45, χ²(1) = 3.89, P = 0.049) and d. FPC (estimate = 0.55, χ²(1) = 6.4, P = 0.011). Lines depict group-level linear effects and 95% confidence curves.
Figure 5
Neural evidence of model-free prediction errors (PEs), and correlates of PE with model-free behavior. a. Putamen BOLD response correlates with model-free prediction errors that accompany state transitions (peak: −24 8 −4, P = 0.0005, cluster corrected for family-wise error for whole-brain comparisons; Supplementary Table 4). b. Effect size of model-free PEs in putamen covaries negatively across subjects with the tendency to make model-based relative to model-free choices. Left panel: peaks 28 10 2; 32 12 0 (P = 0.019), −26 12 −8 (P = 0.0826). Coordinates and P-values here and in c (left panel) reflect small volume correction for clusters in anatomical mask of striatum. Right panel: correlation estimated from average activity in significant clusters depicted in a, restricted to striatum (estimate = −0.57, χ²(1) = 6.93, P = 0.008). c. Neural measures of model-based prospection and model-free PE negatively correlate. Left panel: clusters showing across-subject negative correlation of model-based prospection and model-free PE. Peaks: −26 2 −4 (P = 0.032), 28 10 10 (P = 0.001). Right panel: correlation estimated from average activity in significant striatal clusters depicted in a (r = −0.73, P = 0.0003). Lines in a and c depict group-level linear effects and 95% confidence curves. Maps thresholded at P < 0.001, uncorrected, for display purposes.
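For readers unfamiliar with the quantity in panel a, the following is a generic temporal-difference sketch of a prediction error at a state transition and at outcome. It is a textbook-style illustration, not the regressor used in the paper; the function names, the discount `gamma`, and the learning rate `alpha` are hypothetical.

```python
def transition_pe(q_first, q_second, gamma=1.0):
    """Prediction error when moving from the first to the second stage.

    No reward is delivered at the transition, so the error is the discounted
    value of the second-stage choice minus the value of the first-stage choice.
    """
    return gamma * q_second - q_first


def outcome_pe(reward, q_second):
    """Prediction error when the reward (or its absence) is revealed."""
    return reward - q_second


def td_update(q, delta, alpha=0.5):
    """Move a cached value a fraction alpha of the way along the error."""
    return q + alpha * delta


if __name__ == "__main__":
    delta = transition_pe(q_first=0.2, q_second=0.7)
    print("transition PE:", delta)
    print("updated first-stage value:", td_update(0.2, delta))
    print("outcome PE:", outcome_pe(1.0, 0.7))
```

A positive transition error means the second-stage state turned out better than the cached first-stage value predicted; a model-free learner uses that error, rather than a look-ahead through the task structure, to adjust its first-stage value.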
