Believing in dopamine

Samuel J Gershman¹, Naoshige Uchida²

Affiliations

¹ Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA. gershman@fas.harvard.edu.
² Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA.

PMID: 31570826
PMCID: PMC7472313
DOI: 10.1038/s41583-019-0220-7

Review

Samuel J Gershman et al. Nat Rev Neurosci. 2019 Nov;20(11):703-714.

doi: 10.1038/s41583-019-0220-7. Epub 2019 Sep 30.

Abstract

Midbrain dopamine signals are widely thought to report reward prediction errors that drive learning in the basal ganglia. However, dopamine has also been implicated in various probabilistic computations, such as encoding uncertainty and controlling exploration. Here, we show how these different facets of dopamine signalling can be brought together under a common reinforcement learning framework. The key idea is that multiple sources of uncertainty impinge on reinforcement learning computations: uncertainty about the state of the environment, the parameters of the value function and the optimal action policy. Each of these sources plays a distinct role in the prefrontal cortex-basal ganglia circuit for reinforcement learning and is ultimately reflected in dopamine activity. The view that dopamine plays a central role in the encoding and updating of beliefs brings the classical prediction error theory into alignment with more recent theories of Bayesian reinforcement learning.

Figures

**Figure 1. Schematic of the neural architecture for reinforcement learning under state uncertainty.**
Bayesian inference combines noisy sensory data with a prior over latent states to compute the posterior distribution, or belief state, hypothesized to be encoded in the medial prefrontal cortex (PFC). The belief state is mapped into a distributed state representation (basis functions) in the striatum, which is in turn mapped onto a value function. Dopamine drives updating of the value function parameters by reporting a reward prediction error (the difference between observed and expected reward, or value).
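
The pipeline in this caption can be illustrated with a minimal sketch. It assumes two latent states, a linear value function, and the belief state used directly as the basis functions; the function names, learning rate `alpha`, and discount factor `gamma` are hypothetical choices for illustration, not the authors' implementation.

```python
def posterior(prior, likelihoods):
    """Bayes' rule: combine a prior over latent states with the
    likelihood of the noisy sensory data under each state."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def value(weights, belief):
    """Linear value function; the belief state itself serves as the
    basis functions (a simplifying assumption)."""
    return sum(w * b for w, b in zip(weights, belief))


def td_update(weights, belief, reward, next_belief, alpha=0.1, gamma=0.9):
    """Compute the reward prediction error and use it to update the
    value function parameters."""
    rpe = reward + gamma * value(weights, next_belief) - value(weights, belief)
    new_weights = [w + alpha * rpe * b for w, b in zip(weights, belief)]
    return rpe, new_weights


# One learning step: an ambiguous observation favors state 0,
# a reward of 1 arrives, and the trial ends (next belief is all zeros).
belief = posterior(prior=[0.5, 0.5], likelihoods=[0.8, 0.2])
rpe, weights = td_update([0.0, 0.0], belief, reward=1.0, next_belief=[0.0, 0.0])
```

Because the weight update is scaled by the belief, credit for the reward is shared across latent states in proportion to how probable each one was judged to be.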

**Figure 2. Experimental evidence for reflections of state uncertainty in dopamine signals.**
(A) Experimental tasks and results from a previous study. Mice observed an odor followed by a water reward. Odor A was associated with a variable odor-reward interval, whereas odors B and C were associated with fixed intervals. ISI: interstimulus interval between odor and reward. ITI: intertrial interval. The middle plots show the structure of the task as a probabilistic graphical model. The bottom plots show the baseline-subtracted firing rates of optogenetically identified dopamine neurons in the ventral tegmental area. (B) Experimental task and results from a previous study. Mice observed an odor followed by a water reward whose magnitude varied across blocks. The middle plot shows the normalized calcium response from dopamine neurons in the ventral tegmental area, measured using fiber photometry. The bottom plot shows anticipatory licking and the predicted values. Animals were trained using blocks of either small or big reward trials first. In rare trials (probe trials), animals received an intermediate-size reward. The x-axis indicates the magnitudes of reward in the probe trials.

**Figure 3. Experimental evidence for uncertainty-dependent dopamine signals in a perceptual decision making task.**
(A) Experimental task from previous studies. Monkeys observe randomly moving dots and then make a judgment about their direction. The proportion of coherently moving dots is manipulated across trials. (B) Predictions from the belief state model. At stimulus onset, the reward prediction error (RPE) response increases as a function of coherence on correct trials, but decreases as a function of coherence on error trials. This pattern is inverted at feedback onset. (C) Recordings of dopamine neurons under the same conditions as in (B), confirming the theoretical predictions.
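
The qualitative predictions in panel (B) can be reproduced with a small signal-detection sketch. It rests on assumptions that are not stated in the caption: the percept is the true signed coherence plus unit-variance Gaussian noise, the choice follows the sign of the percept, confidence is the posterior probability that the choice is correct (marginalizing over the unknown coherence), and the stimulus-onset RPE is confidence minus a fixed baseline expectation. The coherence values, the `baseline`, and all function names are hypothetical.

```python
import math

# Hypothetical signal strengths, in units of the sensory noise.
COHERENCES = [0.5, 1.0, 2.0]


def confidence(a):
    """P(choice correct | percept magnitude a), marginalizing over the
    unknown coherence with equal prior weight on each level."""
    same = sum(math.exp(-0.5 * (a - c) ** 2) for c in COHERENCES)
    opposite = sum(math.exp(-0.5 * (a + c) ** 2) for c in COHERENCES)
    return same / (same + opposite)


def mean_confidence(c, correct, step=0.01, max_a=10.0):
    """Average confidence at true coherence c, restricted to correct or
    error trials, computed by midpoint-rule integration over percepts."""
    sign = 1.0 if correct else -1.0
    numerator = 0.0
    denominator = 0.0
    for i in range(int(max_a / step)):
        a = (i + 0.5) * step
        # Unnormalized density of a percept of magnitude a on the
        # chosen side, given that the true signal has magnitude c.
        weight = math.exp(-0.5 * (a - sign * c) ** 2)
        numerator += weight * confidence(a)
        denominator += weight
    return numerator / denominator


def stimulus_rpe(c, correct, baseline=0.75):
    """RPE at stimulus onset: expected reward (confidence) minus an
    assumed pre-stimulus expectation."""
    return mean_confidence(c, correct) - baseline


def feedback_rpe(c, correct):
    """RPE at feedback: delivered reward minus expected reward."""
    reward = 1.0 if correct else 0.0
    return reward - mean_confidence(c, correct)
```

Evaluating `stimulus_rpe` and `feedback_rpe` across `COHERENCES` for correct and error trials gives the opposite-signed coherence dependence described in the caption; the absolute values depend on the assumed noise level and baseline.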

**Figure 4. Two forms of uncertainty have distinct effects on exploratory choice, and are governed by distinct dopamine afferents.**
(A) A two-armed bandit task in which each arm is either “safe” (deterministic) or “risky” (stochastic). (B) Schematic of how different trial types affect the probability of choosing the left option, plotted as a function of the estimated value difference between the options. The left plot illustrates the manipulation of relative uncertainty: when the left option is safe and the right option is risky, the choice probability function is shifted to the right, reflecting a change in choice bias (indifference point) caused by an uncertainty bonus for the risky option. This corresponds to a form of directed exploration; evidence suggests that the magnitude of the uncertainty bonus is controlled by prefrontal dopamine (DA) levels. The right plot illustrates the manipulation of total uncertainty: when both options are safe, the choice probability function becomes steeper relative to when both options are risky, reflecting a reduction in choice stochasticity. This corresponds to a form of random exploration, putatively controlled by striatal DA levels.
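
The two effects in panel (B) can be sketched with a simple probit choice rule. The functional form below is an illustrative assumption rather than the model used in the paper: relative uncertainty enters as an additive bonus (shifting the indifference point), and total uncertainty scales the decision noise (changing the slope). The parameters `bonus` and `base_noise` and the function name are hypothetical.

```python
import math


def p_choose_left(value_diff, sigma_left, sigma_right, bonus=1.0, base_noise=1.0):
    """Probability of choosing the left arm.

    value_diff: estimated value of left minus right.
    sigma_left, sigma_right: uncertainty (standard deviation) about each
    arm's value; 0 for a safe arm, positive for a risky arm.
    """
    # Directed exploration: the more uncertain arm receives a bonus.
    relative_uncertainty = sigma_left - sigma_right
    # Random exploration: more total uncertainty means noisier choices.
    total_uncertainty = math.sqrt(base_noise ** 2 + sigma_left ** 2 + sigma_right ** 2)
    z = (value_diff + bonus * relative_uncertainty) / total_uncertainty
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Safe left vs. risky right at equal estimated values: the bonus for the
# risky arm pulls choice probability for the left arm below one half.
p_safe_vs_risky = p_choose_left(0.0, sigma_left=0.0, sigma_right=1.0)

# Same value difference, both safe vs. both risky: choices are more
# deterministic when total uncertainty is low.
p_both_safe = p_choose_left(1.0, sigma_left=0.0, sigma_right=0.0)
p_both_risky = p_choose_left(1.0, sigma_left=1.0, sigma_right=1.0)
```

In this sketch, raising `bonus` moves the indifference point without changing the slope when total uncertainty is held fixed, whereas raising both `sigma` values equally flattens the curve without shifting it.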

Grants and funding

R01 NS108740/NS/NINDS NIH HHS/United States
