Nature. 2006 Jun 15;441(7095):876-9.
doi: 10.1038/nature04766.

Cortical substrates for exploratory decisions in humans

Nathaniel D Daw et al. Nature.

Abstract

Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this 'exploration-exploitation' dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning (RL) theory, indicates that a dopaminergic, striatal and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical and ethological perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
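The classification step described above can be sketched as follows. This is a minimal illustration, not the authors' model: the abstract does not name the strategy, so a softmax choice rule with delta-rule value learning is assumed here as a stand-in, and every name and parameter value (`classify_choices`, `learning_rate`, `initial_value`, `temperature`) is hypothetical. A choice is labelled exploitative when it matches the option with the highest current value estimate, and exploratory otherwise.

```python
import math


def softmax(values, temperature):
    """Convert value estimates into choice probabilities (assumed choice rule)."""
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify_choices(choices, payoffs, n_options=4,
                     learning_rate=0.3, initial_value=50.0, temperature=5.0):
    """Label each choice as 'exploit' or 'explore'.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns a list of (label, probability the model assigned to that choice).
    """
    values = [initial_value] * n_options
    labels = []
    for choice, payoff in zip(choices, payoffs):
        probs = softmax(values, temperature)
        # Ties are broken in favour of the lowest-numbered option.
        best = max(range(n_options), key=lambda i: values[i])
        label = "exploit" if choice == best else "explore"
        labels.append((label, probs[choice]))
        # Delta-rule update of the chosen option only (assumed learning rule).
        values[choice] += learning_rate * (payoff - values[choice])
    return labels


# Hypothetical data: two choices of option 0, then a switch to option 1.
for label, prob in classify_choices([0, 0, 1], [80.0, 80.0, 10.0]):
    print(label, round(prob, 3))
```

Under these assumptions, the probability attached to each choice plays the role of the model-assigned choice probability, and the labels split trials into the two classes that the imaging analysis contrasts.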

Figures

Figure 1
Task design. a, Illustration of the timeline within a trial. Initially, four slots are presented. The subject chooses one, which then spins. Three seconds later the number of points won is revealed. After a further second the screen is cleared. The next trial is triggered after a fixed trial length of 6 s and an additional variable inter-trial interval (mean 2 s). b, Example of mean payoffs that would be received for choosing each slot machine (four coloured lines) on each trial, demonstrating their independent random diffusion. The payoff received for a particular choice is corrupted by Gaussian noise around this mean.
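The payoff structure in panel b can be sketched as a small simulation. The caption gives only the qualitative form (four slots, independently diffusing means, Gaussian noise on each payoff), so the plain Gaussian random walk and all numeric values below (`start_mean`, `diffusion_sd`, `payoff_sd`) are assumptions chosen for illustration, and `simulate_payoffs` is a hypothetical name.

```python
import random


def simulate_payoffs(n_trials, n_slots=4, start_mean=50.0,
                     diffusion_sd=3.0, payoff_sd=4.0, seed=0):
    """Simulate drifting mean payoffs and the noisy payoffs drawn around them.

    Every numeric default is an illustrative assumption, not a value from the paper.
    Returns (mean_history, payoff_history), each a list of per-trial lists.
    """
    rng = random.Random(seed)
    means = [start_mean] * n_slots
    mean_history = []
    payoff_history = []
    for _ in range(n_trials):
        mean_history.append(list(means))
        # Payoff a subject would receive for each slot on this trial:
        # the current mean corrupted by Gaussian noise.
        payoff_history.append([rng.gauss(m, payoff_sd) for m in means])
        # Each mean takes an independent Gaussian step (random diffusion).
        means = [m + rng.gauss(0.0, diffusion_sd) for m in means]
    return mean_history, payoff_history


mean_history, payoff_history = simulate_payoffs(n_trials=300)
print(len(mean_history), len(payoff_history[0]))
```

Because the means keep drifting, the slot that is best early on need not stay best, which is what gives occasional sampling of the other slots its value in this kind of task.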
Figure 2
Reward-related activations. Activation maps (yellow, P < 0.001; red, P < 0.01, to illustrate the full extent of the activations) are superimposed on a subject-averaged structural scan. a, Region of medial orbitofrontal cortex (mOFC) correlating significantly with the number of points received. The coordinates of the activated area are [3,30,−21, peak z = 3.87]. The bar plot shows the average BOLD response to outcome, binned by amount won (error bars represent s.e.m.). b, Regions of ventromedial prefrontal cortex (vmPFC; including medial and lateral orbitofrontal cortex and adjacent medial prefrontal cortex) correlating significantly with the probability assigned by the computational model to the subject's choice of slot. The coordinates of the activated areas are as follows: medial orbitofrontal, [−3,45,−18, peak z = 5.62]; lateral orbitofrontal (not illustrated), [45,36,−15, peak z = 4.6]; medial prefrontal, [−3,33,−6, peak z = 4.62]. The bar plot shows the average medial prefrontal BOLD response to decision, binned by choice probability (error bars represent s.e.m.).
Figure 3
Exploration-related activity in frontopolar cortex. a, Regions of left and right frontopolar cortex (lFP, rFP) showing significantly increased activation on exploratory compared with exploitative trials. Activation maps (yellow, P < 0.001; red, P < 0.01) are superimposed on a subject-averaged structural scan. The coordinates of activated areas are [−27,48,4, peak z = 3.49] for lFP and [27,57,6, peak z = 4.13] for rFP. b, rFP BOLD time courses averaged over 1,515 exploratory (red line) and 2,646 exploitative (blue line) decisions. Black dots indicate the sampling frequency (although, because sample alignment varied from trial to trial, time courses were upsampled). Coloured fringes show error bars (representing s.e.m.).
Figure 4
Exploration-related activity in intraparietal sulcus. a, Regions of left and right intraparietal sulcus (lIPS and rIPS) showing significantly increased activation on exploratory compared with exploitative trials. Activation maps (yellow, P < 0.001; red, P < 0.01) are superimposed on a subject-averaged structural scan. The coordinates of the activated areas are [−29,−33,45, peak z = 4.39] for lIPS and [39,−36,42, peak z = 4.16] for rIPS. b, lIPS BOLD time courses averaged over 1,515 exploratory (red line) and 2,646 exploitative (blue line) decisions. Black dots indicate the sampling frequency (although, because sample alignment varied from trial to trial, time courses were upsampled). Coloured fringes show error bars (representing s.e.m.).
