Review
2007 May 29;362(1481):933-42.
doi: 10.1098/rstb.2007.2098.

Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration

Jonathan D Cohen et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Many large and small decisions we make in our daily lives (which ice cream to choose, what research projects to pursue, which partner to marry) require an exploration of alternatives before committing to and exploiting the benefits of a particular choice. Furthermore, many decisions require re-evaluation, and further exploration of alternatives, in the face of changing needs or circumstances. That is, often our decisions depend on a higher-level choice: whether to exploit well-known but possibly suboptimal alternatives or to explore risky but potentially more profitable ones. How adaptive agents choose between exploitation and exploration remains an important and open question that has received relatively limited attention in the behavioural and brain sciences. The choice could depend on a number of factors, including the familiarity of the environment, how quickly the environment is likely to change and the relative value of exploiting known sources of reward versus the cost of reducing uncertainty through exploration. There is no known generally optimal solution to the exploration versus exploitation problem, and a solution to the general case may indeed not be possible. However, there have been formal analyses of the optimal policy under constrained circumstances. There have also been specific suggestions of how humans and animals may respond to this problem under particular experimental conditions as well as proposals about the brain mechanisms involved. Here, we provide a brief review of this work, discuss how exploration and exploitation may be mediated in the brain and highlight some promising future directions for research.

Figures

Figure 1
Daw et al. (2006) examined how subjects handle the exploration–exploitation problem in a four-armed bandit task. (a) In each trial of their task, subjects selected one of the four bandits and received a reward based on its current mean pay-off perturbed by noise. (b) The expected value of each bandit changed continuously over time.
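The task structure in the Figure 1 caption can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the procedure or model of Daw et al. (2006): the function name `simulate_bandit`, the clipped random walk used for the drifting means, the softmax choice rule, the delta-rule value update and every numeric parameter are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_bandit(n_trials=200, n_arms=4, drift_sd=2.0, noise_sd=4.0,
                    temperature=5.0, learning_rate=0.3, seed=0):
    """Simulate a simple learner on a bandit task with drifting mean pay-offs.

    All parameter values are illustrative assumptions, not values taken
    from Daw et al. (2006).
    """
    rng = random.Random(seed)
    means = [rng.uniform(20.0, 80.0) for _ in range(n_arms)]  # true mean pay-offs
    estimates = [50.0] * n_arms                               # learner's value estimates
    choices, rewards = [], []

    for _ in range(n_trials):
        # Softmax choice over current estimates (assumed choice rule).
        # Subtracting the maximum keeps the exponentials numerically stable.
        top = max(estimates)
        weights = [math.exp((q - top) / temperature) for q in estimates]
        arm = rng.choices(range(n_arms), weights=weights)[0]

        # Reward is the chosen arm's current mean pay-off perturbed by noise.
        reward = rng.gauss(means[arm], noise_sd)

        # Delta-rule update of the chosen arm's estimate (assumed learning rule).
        estimates[arm] += learning_rate * (reward - estimates[arm])

        # Every arm's mean drifts on each trial, clipped to the range 0 to 100.
        means = [min(100.0, max(0.0, m + rng.gauss(0.0, drift_sd))) for m in means]

        choices.append(arm)
        rewards.append(reward)

    return choices, rewards
```

Calling `simulate_bandit(seed=1)` returns the sequence of chosen arms and the rewards received. Because the means keep drifting, an arm that was best early on may stop being best, which is why a learner that never samples the other arms can end up exploiting an outdated choice.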
Figure 2
Aston-Jones & Cohen (2005) propose that exploration and exploitation may be mediated by separate short- and long-term measures of utility (cost and reward). Exploration and exploitation, in this model, are mediated by the firing mode of norepinephrine neurons in the locus coeruleus (LC).
Figure 3
A neural network model of how reward and cost are integrated in the locus coeruleus to adaptively change between exploration and exploitation, as proposed by McClure et al. (2006). The left side shows a simple network for decision making in the task. The right side shows evaluative and neuromodulatory mechanisms that regulate the decision-making mechanisms. The model proposes that information about cost (calculated by the anterior cingulate cortex (ACC)) and reward (calculated by the ventromedial prefrontal cortex (vmPFC) and orbitofrontal cortex (OFC)) converge on both the ventral tegmental area (VTA) and the locus coeruleus (LC). This information is used by the VTA to implement a reinforcement learning algorithm that adjusts the weights in the decision network. In the LC, evaluative information sets the mode of responding (phasic or tonic), which, through norepinephrine (NE) release and gain modulation of units in the decision network, regulates the balance between exploration and exploitation (see text for more detailed description).
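The idea that gain modulation can shift a decision process between more exploratory and more exploitative behaviour can be illustrated generically. The sketch below is not the network of McClure et al. (2006): it assumes a plain softmax choice rule with a single hypothetical `gain` parameter, uses made-up option values, and does not claim any particular mapping between the LC firing modes and specific gain values.

```python
import math


def choice_probabilities(values, gain):
    """Softmax choice probabilities with the inputs scaled by a gain factor."""
    scaled = [gain * v for v in values]
    top = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits; higher values mean more variable choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical option values, for illustration only.
values = [1.0, 0.5, 0.2, 0.1]

low_gain = choice_probabilities(values, gain=0.5)    # choices spread across options
high_gain = choice_probabilities(values, gain=10.0)  # choices concentrate on the best option

print("low gain: ", [round(p, 3) for p in low_gain], "entropy:", round(entropy(low_gain), 3))
print("high gain:", [round(p, 3) for p in high_gain], "entropy:", round(entropy(high_gain), 3))
```

With the same option values, a higher gain concentrates choice on the highest-valued option (exploitation-like behaviour), while a lower gain spreads choices across the options (exploration-like behaviour).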

References

    1. Ainslie G. Specious reward: a behavioral theory of impulsiveness and impulse control. Psychol. Bull. 1975;82:463–496. doi:10.1037/h0076860
    2. Allport A, Styles E, Hsieh S. Shifting intentional set: exploring the dynamic control of task. In: Umilta C, Moscovitch M, editors. Attention and performance XV. MIT Press; Cambridge, MA: 1994. pp. 421–452.
    3. Aston-Jones G, Cohen J.D. An integrative theory of locus coeruleus–norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 2005;28:403–450. doi:10.1146/annurev.neuro.28.061604.135709
    4. Aston-Jones G, Rajkowski J, Kubiak P, Alexinsky T. Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task. J. Neurosci. 1994;14:4467–4480.
    5. Aston-Jones G, Rajkowski J, Kubiak P. Conditioned responses in monkey locus coeruleus neurons anticipate acquisition of discriminative behavior in a vigilance task. Neuroscience. 1997;80:697–715. doi:10.1016/S0306-4522(97)00060-2
