Randomized Controlled Trial

J Neurosci. 2006 Aug 9;26(32):8360-7. doi: 10.1523/JNEUROSCI.1010-06.2006.

The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans


Alan N Hampton et al. J Neurosci.

Abstract

Many real-life decision-making problems incorporate higher-order structure, involving interdependencies between different stimuli, actions, and subsequent rewards. It is not known whether brain regions implicated in decision making, such as the ventromedial prefrontal cortex (vmPFC), use a stored model of the task structure to guide choice (model-based decision making) or merely learn action or state values without assuming higher-order structure as in standard reinforcement learning. To discriminate between these possibilities, we scanned human subjects with functional magnetic resonance imaging while they performed a simple decision-making task with higher-order structure, probabilistic reversal learning. We found that neural activity in a key decision-making region, the vmPFC, was more consistent with a computational model that exploits higher-order structure than with simple reinforcement learning. These results suggest that brain regions, such as the vmPFC, use an abstract model of task structure to guide behavioral choice, computations that may underlie the human capacity for complex social interactions and abstract strategizing.


Figures

Figure 1.
Reversal task setup and state-based decision model. A, Subjects choose one of two fractals that on each trial are randomly placed to the left or right of the fixation cross. Once the subject selects a stimulus, it increases in brightness and remains on the screen until 2 s after the choice. After an additional 3 s, a reward (winning 25 cents, depicted by a quarter dollar coin) or punishment (losing 25 cents, depicted by a quarter dollar coin covered by a red cross) is delivered, with the total money earned displayed at the top of the screen. One stimulus is designated the correct stimulus, and the choice of that stimulus leads to a monetary reward on 70% of occasions and a monetary loss 30% of the time. Consequently, choice of this correct stimulus leads to accumulating monetary gain. The other stimulus is incorrect, and choosing that stimulus leads to a reward 40% of the time and a punishment 60% of the time, leading to a cumulative monetary loss. After subjects choose the correct stimulus on four consecutive occasions, the contingencies reverse with a probability of 0.25 on each successive trial. Subjects have to infer that the reversal took place and switch their choice, at which point the process repeats. B, We constructed an abstract-state-based model that incorporates the structure of the reversal task in the form of a Bayesian HMM that uses previous choice and reward history to infer the probability of being in the correct/incorrect choice state. The choice state changes ("transits") from one period to another depending on (1) the exogenously given chance that the options are reversed (the good action becomes the bad one, and vice versa) and (2) the subject's own control (if the subject switches when the actual, but hidden, choice state is correct, then the choice state becomes incorrect, and vice versa). 
Y, Observed reward/punishment; S, observed switch/stay action; X, abstract correct/incorrect choice state that is inferred at each time step (see Materials and Methods). The arrows indicate the causal relationships among random variables. C, Observed choice frequencies that subjects switch (black) or stay (gray) against the inferred posterior probability of the state-based model that their last choice was incorrect. The higher the posterior incorrect probability, the more likely subjects switch (relative choice frequencies are calculated separately for each posterior probability bin).
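The inference scheme described in Figure 1B can be sketched in code. The following is a minimal illustration, not the authors' implementation: the 70%/40% reward contingencies and the 0.25 reversal probability come from the caption, while the function names and the example prior are assumptions for illustration.

```python
# Sketch of the abstract-state-based (Bayesian HMM) inference in Figure 1B.
# Reward contingencies and reversal probability are from the task
# description; everything else is illustrative.

P_REW_CORRECT = 0.7    # correct stimulus is rewarded 70% of the time
P_REW_INCORRECT = 0.4  # incorrect stimulus is rewarded 40% of the time
P_REVERSAL = 0.25      # chance the contingencies reverse on a trial

def predict(prior_correct, switched):
    """Prior that the current choice is correct, before the outcome.

    Switching flips the choice state (previously correct now means
    incorrect, and vice versa); independently, a reversal flips it again.
    """
    p = (1 - prior_correct) if switched else prior_correct
    return p * (1 - P_REVERSAL) + (1 - p) * P_REVERSAL

def update(prior_correct, rewarded):
    """Bayes update of P(correct) after observing reward or punishment."""
    like_c = P_REW_CORRECT if rewarded else 1 - P_REW_CORRECT
    like_i = P_REW_INCORRECT if rewarded else 1 - P_REW_INCORRECT
    joint_c = like_c * prior_correct
    return joint_c / (joint_c + like_i * (1 - prior_correct))

# One trial: stay with the current choice (assumed prior 0.8), get punished
p_prior = predict(0.8, switched=False)        # 0.65
p_post = update(p_prior, rewarded=False)      # drops below the prior
```

A punishment pulls the posterior correct probability down; when the inferred probability that the last choice was incorrect (1 minus this quantity) is high, subjects tend to switch, as shown in Figure 1C.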
Figure 2.
Correct choice prior and posterior–prior update signals in the brain. A, Brain regions showing a significant correlation with the prior correct signal from the state-based decision model (time-locked to the time of choice). Strong correlations with prior correct were found in the vmPFC (mPFC: 6, 57, −6 mm; z = 5.33; OFC: 0, 33, −24 mm; z = 4.04) as well as in the posterior dorsal amygdala (extending into the anterior hippocampus). The activations are shown superimposed on a subject-averaged structural scan, and the threshold is set at p < 0.001. L, Left. B, Brain regions correlating with the posterior–prior update signal. This is a form of prediction error signal that reflects the difference in value between the prior probability that the choice will be correct and the posterior probability that the choice was correct after receipt of the outcome (a reward or punishment). This signal is significantly correlated with activity in the bilateral ventral striatum (−24, 3, −9 mm; z = 4.64; 18, 3, −15 mm; z = 4.48), dorsomedial PFC (−6, 54, 24 mm; z = 3.54), and vmPFC (−12, 51, −15 mm; z = 3.72). These fMRI contrasts are from group random-effects analyses. C, The relationship between fMRI responses in the mPFC (A, yellow circle) at the time of choice and the prior correct signal from the state-based model showed strong collinearity, supporting the idea of an optimal inference of state probabilities. To plot this activity against the prior probabilities, we sorted trials into one of five bins to capture different ranges in the prior probabilities and fitted each bin separately to the fMRI data. D, The time course for the averaged percentage of signal change in this same region (mPFC) is shown separately for trials with a high prior correct signal (p > 0.65) and low prior correct signal (p < 0.5). Error bars depict the SEM across all trials of that type. 
Trials are also separated according to whether a reward (Rew) or a punishment (Pun) was received at the time of outcome to illustrate updating of the signal after feedback. The leftmost shaded area indicates the period (1 s in length) in which subjects made their choice, and the second shaded area indicates the period in which subjects were presented with their rewarding or punishing feedback. Trials with a low prior in which a reward is obtained show an increase in signal at the time of the outcome (with the peak BOLD activity lagged by 4–6 s; see Materials and Methods), whereas trials with a high prior in which a punishment is obtained result in a decrease in signal at outcome, consistent with the possibility that the response at the time of outcome reflects an update signal. Error bars indicate SEM.
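The posterior–prior update signal described in Figure 2B and 2D can be computed directly from Bayes' rule. This is a hedged sketch: the 70%/40% likelihoods are the task contingencies, while the function names and example priors are illustrative assumptions.

```python
# Illustrative computation of the posterior - prior update signal
# (Figure 2B/D). Only the 70%/40% reward contingencies come from the
# task; the example prior values are arbitrary.

def posterior_correct(prior, rewarded, p_rew_c=0.7, p_rew_i=0.4):
    """Bayes-updated P(choice was correct) after seeing the outcome."""
    lc = p_rew_c if rewarded else 1 - p_rew_c
    li = p_rew_i if rewarded else 1 - p_rew_i
    return lc * prior / (lc * prior + li * (1 - prior))

def update_signal(prior, rewarded):
    """Posterior minus prior: the prediction-error-like quantity that
    correlated with ventral striatum, dmPFC, and vmPFC activity."""
    return posterior_correct(prior, rewarded) - prior

# Matches the pattern in Figure 2D: a reward on a low-prior trial yields
# a positive update; a punishment on a high-prior trial a negative one.
low_prior_reward = update_signal(0.3, rewarded=True)    # > 0
high_prior_punish = update_signal(0.8, rewarded=False)  # < 0
```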
Figure 3.
Standard RL and abstract-state-based decision models make qualitatively different predictions about brain activity after subjects switch their choice. A, Both models predict that if a decision is made to stay after being punished, the chosen action will have a lower expected value on the next trial (blue line). However, if a decision is made to switch stimulus after being punished, simple RL predicts that the expected value of the new choice will also be low (red line; left), because its value has not been updated since the last time it was chosen. In contrast, a state-based decision model predicts that the expected value of the new choice will be high: if the subjects have determined that their choice until now was incorrect (prompted by the last punishment), then their new choice after switching is now correct and has a high expected value (red line; middle). Mean fMRI signal changes (time-locked to the time of choice) in the mPFC (derived from a 3 mm sphere centered at the peak voxel) plotted before and after reversal (right) show that activity in this region is more consistent with the predictions of state-based decision making than with those of standard RL. This indicates that the expected reward signal in the mPFC incorporates the structure of the reversal task. B, Direct comparison of brain regions correlating with the prior correct signal from the state-based model with the equivalent value signal (of the current choice) from the simple RL model. A contrast between the models revealed that the state-based decision model accounts significantly better for neural activity in the mPFC (6, 45, −9 mm; z = 3.68). L, Left.
Figure 4.
Incorrect choice prior and switch–stay signals in the brain. A, Brain regions showing a significant correlation with the prior incorrect signal from the state-based algorithm (time-locked to the time of choice). Significant effects were found in the rDLPFC (39, 36, 33 mm; z = 4.30), the anterior cingulate cortex (6, 21, 45 mm; z = 3.37), and the right anterior insula (48, 15, 9 mm; z = 3.96). The threshold was set at p < 0.001. B, Plot showing the relationship between fMRI responses in the dorsolateral PFC at the time of choice and the prior incorrect signal from the Bayesian model, illustrating strong collinearity between this signal and activity in this region. C, Brain regions responding to trials in which subjects decide to switch their choice of stimulus compared with trials in which they stay. Significant effects were found in the anterior cingulate cortex (−3, 24, 30 mm; z = 4.54) and the anterior insula (−39, 18, −12 mm; z = 4.26; 51, 21, 3 mm; z = 4.23) bilaterally. The fact that the anterior cingulate and anterior insula respond on these switch trials, as well as to the prior incorrect signals, suggests that the decision to switch may be implemented in these regions. Error bars indicate SEM. L, Left.


