Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration

Jonathan D Cohen¹, Samuel M McClure, Angela J Yu

PMID: 17395573
PMCID: PMC2430007
DOI: 10.1098/rstb.2007.2098

Review

Philos Trans R Soc Lond B Biol Sci. 2007 May 29;362(1481):933-42.

Affiliation

¹ Department of Psychology and Center for the Study of Brain, Mind and Behaviour, Princeton University, Princeton, NJ 08540, USA. jdc@princeton.edu

Abstract

Many large and small decisions we make in our daily lives (which ice cream to choose, what research projects to pursue, which partner to marry) require an exploration of alternatives before committing to and exploiting the benefits of a particular choice. Furthermore, many decisions require re-evaluation, and further exploration of alternatives, in the face of changing needs or circumstances. That is, often our decisions depend on a higher-level choice: whether to exploit well-known but possibly suboptimal alternatives or to explore risky but potentially more profitable ones. How adaptive agents choose between exploitation and exploration remains an important and open question that has received relatively limited attention in the behavioural and brain sciences. The choice could depend on a number of factors, including the familiarity of the environment, how quickly the environment is likely to change and the relative value of exploiting known sources of reward versus the cost of reducing uncertainty through exploration. There is no known generally optimal solution to the exploration versus exploitation problem, and a solution to the general case may indeed not be possible. However, there have been formal analyses of the optimal policy under constrained circumstances. There have also been specific suggestions of how humans and animals may respond to this problem under particular experimental conditions as well as proposals about the brain mechanisms involved. Here, we provide a brief review of this work, discuss how exploration and exploitation may be mediated in the brain and highlight some promising future directions for research.

Figures

**Figure 1**
Daw *et al*. (2006) examined how subjects handle the exploration–exploitation problem in a four-armed bandit problem. (a) In each trial of their task, subjects selected one of the four bandits and received a reward based on its current mean pay-off perturbed by noise. (b) The expected value of each bandit changed continuously over time.
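The task structure described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_restless_bandit`, the pay-off range, the noise and drift magnitudes, and the use of a plain Gaussian random walk to make the means change are all choices made here, not details taken from Daw *et al*. (2006). The random choice policy is a placeholder for whatever decision rule a subject or model would use.

```python
import random


def simulate_restless_bandit(n_trials=200, n_arms=4, drift_sd=2.0,
                             noise_sd=4.0, seed=0):
    """Simulate a bandit whose mean pay-offs change over trials.

    Every numeric default is an illustrative assumption, not a value
    reported for the original experiment.
    """
    rng = random.Random(seed)
    # Hypothetical starting means for the four bandits.
    means = [rng.uniform(20.0, 80.0) for _ in range(n_arms)]
    history = []
    for _ in range(n_trials):
        # Placeholder policy: pick a bandit uniformly at random.
        arm = rng.randrange(n_arms)
        # (a) Reward is the chosen bandit's current mean perturbed by noise.
        reward = rng.gauss(means[arm], noise_sd)
        history.append((arm, reward, list(means)))
        # (b) Every bandit's mean takes a small random-walk step, so the
        # best option can change and old estimates go stale.
        means = [m + rng.gauss(0.0, drift_sd) for m in means]
    return history
```

Because the means keep moving, an agent that always exploits its current best estimate risks missing a bandit that has since become better, which is what makes occasional exploration worthwhile in this kind of task.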

**Figure 2**
Aston-Jones & Cohen (2005) propose that exploration and exploitation may be mediated by separate short- and long-term measures of utility (cost and reward). Exploration and exploitation, in this model, are mediated by the firing mode of norepinephrine neurons in the locus coeruleus (LC).

**Figure 3**
A neural network model of how reward and cost are integrated in the locus coeruleus to adaptively change between exploration and exploitation, as proposed by McClure *et al*. (2006). The left side shows a simple network for decision making in the task. The right side shows evaluative and neuromodulatory mechanisms that regulate the decision-making mechanisms. The model proposes that information about cost (calculated by the anterior cingulate cortex (ACC)) and reward (calculated by the ventromedial prefrontal cortex (vmPFC) and orbitofrontal cortex (OFC)) converge on both the ventral tegmental area (VTA) and the locus coeruleus (LC). This information is used by the VTA to implement a reinforcement learning algorithm that adjusts the weights in the decision network. In the LC, evaluative information sets the mode of responding (phasic or tonic), which, through norepinephrine (NE) release and gain modulation of units in the decision network, regulates the balance between exploration and exploitation (see text for more detailed description).
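One way to build intuition for how gain modulation could shift the balance between exploration and exploitation is a softmax choice rule with a gain parameter. The sketch below is a common textbook stand-in, not the network proposed by McClure *et al*. (2006); the function names `softmax_choice_probs` and `choice_entropy` and the example values are assumptions introduced here.

```python
import math


def softmax_choice_probs(values, gain):
    """Return choice probabilities for option values under a given gain.

    Higher gain sharpens the distribution toward the best-valued option
    (exploitation); lower gain flattens it (exploration).
    """
    scaled = [gain * v for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def choice_entropy(probs):
    """Shannon entropy (in nats) of a choice distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical value estimates for four options.
values = [1.0, 0.5, 0.2, 0.1]
low_gain = softmax_choice_probs(values, gain=0.5)    # choices spread out
high_gain = softmax_choice_probs(values, gain=10.0)  # choices concentrated
```

With these example values, the low-gain distribution has higher entropy than the high-gain one, which is the sense in which a single gain-like parameter can move behaviour along an exploration to exploitation continuum.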

References

1. Ainslie G. Specious reward: a behavioral theory of impulsiveness and impulse control. Psychol. Bull. 1975;82:463–496. doi:10.1037/h0076860
2. Allport A, Styles E, Hsieh S. Shifting intentional set: exploring the dynamic control of task. In: Umilta C, Moscovitch M, editors. Attention and performance XV. MIT Press; Cambridge, MA: 1994. pp. 421–452.
3. Aston-Jones G, Cohen J.D. An integrative theory of locus coeruleus–norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci. 2005;28:403–450. doi:10.1146/annurev.neuro.28.061604.135709
4. Aston-Jones G, Rajkowski J, Kubiak P, Alexinsky T. Locus coeruleus neurons in monkey are selectively activated by attended cues in a vigilance task. J. Neurosci. 1994;14:4467–4480.
5. Aston-Jones G, Rajkowski J, Kubiak P. Conditioned responses in monkey locus coeruleus neurons anticipate acquisition of discriminative behavior in a vigilance task. Neuroscience. 1997;80:697–715. doi:10.1016/S0306-4522(97)00060-2
