Nat Neurosci. 2019 May;22(5):797-808.
doi: 10.1038/s41593-019-0375-6. Epub 2019 Apr 15.

The macaque anterior cingulate cortex translates counterfactual choice value into actual behavioral change

Elsa F Fouragnan et al. Nat Neurosci. 2019 May.

Abstract

The neural mechanisms mediating sensory-guided decision-making have received considerable attention, but animals often pursue behaviors for which there is currently no sensory evidence. Such behaviors are guided by internal representations of choice values that have to be maintained even when these choices are unavailable. We investigated how four macaque monkeys maintained representations of the value of counterfactual choices, that is, choices that could not be taken at the current moment but which could be taken in the future. Using functional magnetic resonance imaging, we found two different patterns of activity co-varying with values of counterfactual choices in a circuit spanning the hippocampus, the anterior lateral prefrontal cortex and the anterior cingulate cortex. Anterior cingulate cortex activity also reflected whether the internal value representations would be translated into actual behavioral change. To establish the causal importance of the anterior cingulate cortex for this translation process, we used a novel technique, transcranial focused ultrasound stimulation, to reversibly disrupt anterior cingulate cortex activity.

Conflict of interest statement

Competing Financial Interests Statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. Schematic view of the task, behavioural results and hypothesized neural schemes.
(a) On each trial, animals could choose between two symbols presented on the screen and had to keep in mind a third option, unavailable to them. The position of each symbol on the left/right part of the screen was fully randomized and the combination of available/unavailable options was pseudo-randomized. (b) Each trial began with a random delay followed by the presentation of two abstract symbols for a period ending when the animals made a choice. During this time, monkeys pressed one of two touch-sensors to indicate which of the two symbols (right or left) they believed was more likely to lead to a reward. Finally, the decision outcome was revealed for 1.5 sec. The selected symbol was kept on the screen (or not) to inform the monkeys of a reward delivery (or no reward). (c) The plots show the probability of receiving a reward for choosing option 1 (pink), 2 (blue), or 3 (red) on each trial in the 200-trial sessions. (d) The top graphs show the proportion of correct choices (selecting the option with the highest reward probability) plotted as a function of difficulty (distance between the better high value [HV] and the worse low value [LV] presented options: left panel) and context value (sum of both HV’s and LV’s expected values: right panel). Decision accuracy improved with higher value difference between available options and higher total value. The bottom graphs show log-transformed mean RTs for each session plotted as a function of difficulty and context. LogRTs decreased for easier decisions and higher trial value. Red lines are linear fits to the data and the grey lines are the 95% confidence interval, n=25 sessions. (e) Because the three options’ values were uncorrelated with one another, it was possible to look for neural activity according to three main coding schemes. If activity in a brain area covaries only with the value of the unavailable option then this suggests the area is concerned with representing the value of an option held in memory on the current trial and which should not interfere with decisions taken on the current trial. (f) If instead activity covaries with the ranked value of both the unchosen available option and the option held in memory then it reflects the value of any currently counterfactual choice that might be taken in the future. It is important, however, to distinguish such a pattern from a third possibility (g) in which neural activity reflects only the currently available options without representing the counterfactual or unavailable option. In that case, the activity would be negatively related to the HV available option value and positively related to the LV option value. This third pattern indicates that the brain area’s activity reflects the difficulty or uncertainty of the current decision, because selecting an option becomes harder as the LV option increases and as the HV option decreases, but it is unaffected by the value of the choice that cannot currently be taken (see discussion by Kolling and colleagues). Note that we also analysed a fourth pattern representing the value of each option separately in supplementary figure 3.
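As a rough illustration of the behavioural regressors named in panel (d), the sketch below derives the HV and LV values, the value difference (difficulty) and the summed value (context) from the expected values of the two available options. The function name and dictionary keys are hypothetical and chosen only for this example; the authors' actual analysis code is not shown in this record.

```python
def decision_regressors(value_a, value_b):
    """Return HV, LV, value difference and context value for one trial.

    value_a, value_b: expected values (e.g. reward probabilities) of the
    two available options. Names and keys are illustrative assumptions.
    """
    hv = max(value_a, value_b)  # better (high value) available option
    lv = min(value_a, value_b)  # worse (low value) available option
    return {
        "HV": hv,
        "LV": lv,
        "value_difference": hv - lv,  # larger difference = easier decision
        "context_value": hv + lv,     # total value on offer in the trial
    }


if __name__ == "__main__":
    print(decision_regressors(0.8, 0.3))
```

Under pattern (g) described above, an area tracking difficulty would covary negatively with `HV` and positively with `LV`, which amounts to a negative relationship with `value_difference`.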
Figure 2. Future switches are explained by the expected value associated with counterfactual options.
(a) Estimated expected values associated with the unavailable option on the current trial predict whether animals switch to it when it reappears on the screen on subsequent trials (y-axis: probability of switching to the currently unavailable option; x-axis: reward probability associated with the unavailable option estimated from the Maintain model). Each bin contains 20% of averaged data across trials (individual sessions as grey dots; average across sessions as red dots). (b) A logistic regression confirms that accuracy is explained by the currently unavailable option’s value (higher accuracy for trials in which it is the best of the three options vs. when it is not), in addition to the value of the future chosen and unchosen options (each session’s beta coefficient is represented as a grey dot and the mean beta coefficient is represented as a coloured dot). (c) A similar analysis to the one shown in panel (a) is performed but on the basis of a new coding scheme where the counterfactual options (current unchosen option and current unavailable option) are ranked according to their associated reward probabilities as the better and the worse counterfactual choices. (d) A logistic regression confirms that the value of the better counterfactual option significantly influenced the frequency with which monkeys subsequently switched to it but this was not the case for the worse counterfactual option. One sample t-tests were used across sessions on the resulting beta coefficients, n=25, for all analyses.
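The binning described in panel (a), where each bin holds 20% of the trials, can be sketched as follows. This is a minimal example under stated assumptions: trials are sorted by the estimated value of the unavailable option, split into five equal-count bins, and the mean value and switch rate are reported per bin. The function name and input format are hypothetical, and the value estimates themselves (from the Maintain model) are taken as given.

```python
def quintile_bins(values, switched, n_bins=5):
    """Split trials into equal-count bins by value and summarise each bin.

    values:   estimated value of the unavailable option on each trial
    switched: 1 if the animal later switched to that option, else 0
    Returns a list of (mean value, switch rate) tuples, lowest bin first.
    """
    pairs = sorted(zip(values, switched))  # order trials by value
    n = len(pairs)
    bins = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:  # fewer trials than bins
            continue
        mean_value = sum(v for v, _ in chunk) / len(chunk)
        switch_rate = sum(s for _, s in chunk) / len(chunk)
        bins.append((mean_value, switch_rate))
    return bins


if __name__ == "__main__":
    vals = [0.1 * i for i in range(10)]
    sw = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    for mean_value, rate in quintile_bins(vals, sw):
        print(round(mean_value, 2), rate)
```

Equal-count bins keep the number of trials per point constant, so the points in such a plot are comparably reliable even when values are unevenly distributed.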
Figure 3. Unavailable option value signal in hippocampus favors accurate future planning.
(a) A whole-brain analysis tested for voxels where activity correlated with the trial-by-trial estimates of the unavailable option binned according to successful future selection. The fMRI analysis was time-locked to the decision phase on trial t and binned according to accurate vs. inaccurate selection of the unavailable option on trial t+1 (in light pink: cluster-corrected, Z > 3.1, P < 0.001; in red: uncorrected, n=25 sessions). (b) ROI analyses (multiple regression analysis on the BOLD signal of the ROI) of the right (top panels) and left (bottom panels) hippocampus illustrate the time course of the aforementioned contrast. BOLD fluctuations reflect the value of the unavailable option on the current trial when it is accurately versus inaccurately selected on the next trial (left panels illustrate the contrast shown in (a)). A leave-one-out procedure (for spatial and temporal peak selections) to assess statistical significance revealed that a similar activity change occurs when contrasting the value of the unavailable option for accurate versus inaccurate future rejections of the unavailable option (right panels). SEM across sessions is shown as the red shaded area, n=25. (c) In the left hippocampus, the beta weights for the contrast used in (a) and illustrated in (b, left panel) were predictive of how much the unavailable option’s reward probability influenced animals’ future choice accuracy (top panel) but this was not true for current choice accuracy (bottom panel). Scatter plot at the time of the peak effect, n=25 sessions, Pearson R is reported (results are normalised).
Figure 4. The anterior cingulate ranks expected reward probabilities of counterfactual options.
(a) Whole-brain analysis shows a significant negative relationship between BOLD activity and the difference between the expected value associated with the currently chosen and unchosen options in a distributed brain network, including ACC, bilateral lPFC, and vmPFC/mOFC (cluster corrected, |Z| > 3.1, P < 0.001, n=25 sessions). (b) ROI analysis of the ACC illustrates the relationship between BOLD and the fully parametric representation of the currently chosen, unchosen, and unavailable options (left panel) and shows that a distinct model in which the counterfactual options are ranked according to their associated reward probabilities explains the data better. Note that we avoid double dipping in favour of the hypothesis that we want to support (hypothesis 2) since the ROI has been defined on the basis of hypothesis 1. All shaded areas represent SEM across sessions, n=25. For hypothesis 2, the grey shading represents the Better (dark grey) and Worse (light grey) alternatives. See supplementary figure 3 for a full Bayesian Model Selection across all hypotheses. (c) The parametric representation of the better and worse counterfactual values in ACC was further explained by whether a future switch in behavior will occur as opposed to the continued maintenance of behavior (“stay”) (leave-one-out procedures for peak selection on time series analyses: top panel). This was not true in the lPFC (bottom panel). Each session is represented as a grey dot (bar represents the average beta coefficient across sessions, n=25, one sample t-tests are performed).
Figure 5. Transcranial Focused Ultrasound Stimulation (TUS) of ACC had a profound and selective effect on resting state connectivity.
(a) Whole-brain functional connectivity between the ACC and the rest of the brain. Left and right top panels show activity coupling between ACC (far-right ROI, black circle) and the rest of the brain in the no stimulation sham condition in two exemplar animals. After ACC TUS in exemplar animal 1, there are strong changes in connectivity (right bottom panel), reflected in changes in a connectivity analysis seeded from ACC with 13 other regions (ROI represented in black circle; for the full details, see supplementary fig.4; table 1) (within subject: two sample t-tests: Cohen’s d=-0.84; t(12)=-3.03; P=0.01, Cohen’s d=-1.01; t(12)=-3.65; P=0.003, n=13 ROIs, between-subject control: non-significant, n=6 ROIs). (b) However, while ACC TUS affected ACC connectivity, the effect was selective; ACC TUS did not affect connectivity seeded from lPFC (n.s: non-significant). (c) Running average choice frequency for the three options in the control/sham ACC (left) and the TUS ACC condition (middle) across sessions (the shaded areas represent SEM across sessions, n=18 sessions for each group). Predetermined reward schedules used in the sham and in the TUS ACC task for three options, similar to the task used in the fMRI experiment (right). (d) The rate of choosing option 1 on trials that followed those on which it had been a counterfactual option (trials on which it was unavailable) was significantly reduced in TUS sessions compared to sham sessions, n=18 sessions for each group. (e) Decision accuracy is plotted as a function of the difficulty of the decision, that is, the difference between the objective values of the HV and LV options. Values of HV and LV are objective values (reward probability over the last 10 trials). Data are binned by percentile of the value difference, with each point corresponding to the [0-20%], [20-40%], [40-60%], [60-80%] and [80-100%] ranges. Accuracy is the rate at which the participant picked the objectively better option. Supplementary fig.5d illustrates accuracy as a function of subjective value differences. Performance differences between TUS and sham conditions do not increase with difficulty (small HV-LV differences on the left); if anything the opposite is true. (f) The influence of the better counterfactual option value on future switching behavior (in blue, as per fig.2f) was significantly reduced after TUS ACC (in green), n=18 sessions for each group. (g) While entropy (summed entropy of reward probability for all options) is strongly and negatively predictive of a change in exploratory behavior in the sham condition (indexed by the cumulative number of “stay” choices: choices of the same option on one trial after another), this relationship is disrupted in the TUS ACC condition. Each point in the figure illustrates a running average analysis, where each bin contains the derivative of entropy over five trials (thus 30 points). The small panel on the right depicts the difference in regression coefficients (linear fit) between the TUS ACC and the sham conditions (Animals 1 [S1] and 2 [S2] are individually represented as red diamond and yellow square, respectively, in all plots, n=9 sessions per animal).
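Two quantities in this caption lend themselves to a short worked example: the objective value of an option (reward probability over the last 10 trials) and the summed entropy of reward probability across options. The sketch below is one plausible reading, not the authors' implementation. It assumes that the objective value is a trailing mean of a per-trial reward probability series, and that each option contributes the Bernoulli (binary) entropy of its reward probability, measured in bits. All function names are hypothetical.

```python
import math


def windowed_mean(series, window=10):
    """Mean of the last `window` entries (assumed form of 'objective value')."""
    recent = series[-window:]
    if not recent:
        raise ValueError("series is empty")
    return sum(recent) / len(recent)


def binary_entropy(p):
    """Entropy in bits of a Bernoulli outcome with reward probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # outcome is certain, so there is no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def summed_entropy(reward_probs):
    """Sum of per-option entropies (assumed form of 'summed entropy')."""
    return sum(binary_entropy(p) for p in reward_probs)


if __name__ == "__main__":
    print(windowed_mean([0.2] * 5 + [0.8] * 10))  # uses only the last 10 trials
    print(summed_entropy([0.5, 0.5, 0.5]))        # maximal uncertainty, 3 options
    print(summed_entropy([0.9, 0.1, 0.5]))
```

With this definition, entropy is highest when every option pays out about half the time and falls as the reward probabilities move toward 0 or 1, which matches the intuition that uncertain environments call for more exploration.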
Figure 6. Contextual modulation of value-guided choice.
(a) Average choice behavior when choosing between the Left and Right options plotted as a function of the value of the unavailable option (low: green; high: yellow). Decisions were less accurate when they were made in the context of a low value unavailable option. Curves plot logistic functions fit to the choice data, n=25 sessions. (b) ROI analysis of the vmPFC/mOFC (left panel: ROI sphere) illustrates the relationship between the BOLD value-comparison signal and the expected value associated with the unavailable option (binned in Low/Mid/High) (right panel). The greater the value of the unavailable option, the more negative the value difference; a more negative pattern is normally associated with decisions that are easier to take (see panel d). Data for individual animals are indicated by red dots (±SEM in grey, n=4 animals). (c) A partial regression plot shows the uncontaminated effect of the unavailable option’s value on accuracy (y-axis: accuracy residuals; x-axis: residuals of the unavailable option’s value). Each bin contains 20% of averaged data across sessions (±SEM). One sample t-test on betas of regression analysis, n=25 sessions. (d) ROI time course analysis of the vmPFC/mOFC illustrates the relationship between BOLD and the fully parametric representation of the currently chosen and unavailable options. The shaded areas represent SEM across sessions, n=25 sessions. (e) While there was no main effect of the unavailable option value, variation in vmPFC/mOFC activity related to the currently unavailable option’s value explains between-session variation in the currently unavailable option’s influence on decision making. Scatter plot at the time of the peak effect of unavailable option value in the vmPFC/mOFC (leave-one-out peak selection, n=25 sessions, Pearson R is reported).
Figure 7. Schematic view of brain regions hypothesized to encode counterfactual choice.
Schematic view of some of the brain regions hypothesized to be involved in encoding counterfactual choice (in yellow and dashed lines, including the anterior cingulate cortex - ACC, lateral prefrontal cortex - lPFC, and the hippocampus - Hippo), and choice updating and selection (in red and continuous lines, including the lateral and the medial orbitofrontal cortex – lOFC and mOFC/vmPFC, respectively). A blue line represents the hypothesized effect exerted by the hippocampus, via mOFC/vmPFC, on the current choice.

