Review
Annu Rev Neurosci. 2017 Jul 25;40:373-394.
doi: 10.1146/annurev-neuro-072116-031109. Epub 2017 Apr 24.

Neural Circuitry of Reward Prediction Error

Mitsuko Watabe-Uchida et al. Annu Rev Neurosci. 2017.

Abstract

Dopamine neurons facilitate learning by calculating reward prediction error, or the difference between expected and actual reward. Despite two decades of research, it remains unclear how dopamine neurons make this calculation. Here we review studies that tackle this problem from a diverse set of approaches, from anatomy to electrophysiology to computational modeling and behavior. Several patterns emerge from this synthesis: that dopamine neurons themselves calculate reward prediction error, rather than inherit it passively from upstream regions; that they combine multiple separate and redundant inputs, which are themselves interconnected in a dense recurrent network; and that despite the complexity of inputs, the output from dopamine neurons is remarkably homogeneous and robust. The more we study this simple arithmetic computation, the knottier it appears to be, suggesting a daunting (but stimulating) path ahead for neuroscience more generally.

Keywords: arithmetic; circuitry; dopamine; learning; prediction error; reward.
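The abstract defines reward prediction error as the difference between actual and expected reward. The sketch below shows only that arithmetic, paired with a standard delta-rule (Rescorla-Wagner-style) update of the expectation. The update rule, the function names, and the learning rate `alpha` are illustrative assumptions, not a model proposed in this review, and the sketch says nothing about how dopamine neurons actually carry out the subtraction.

```python
def reward_prediction_error(actual_reward: float, expected_reward: float) -> float:
    """Reward prediction error: actual minus expected reward."""
    return actual_reward - expected_reward


def learn_expectation(rewards, alpha=0.2, initial_expectation=0.0):
    """Track a reward expectation with a delta rule (an assumed learning rule).

    Returns the prediction error on every trial and the final expectation.
    """
    expectation = initial_expectation
    errors = []
    for reward in rewards:
        error = reward_prediction_error(reward, expectation)
        errors.append(error)
        # Move the expectation a fraction alpha of the way toward the outcome.
        expectation += alpha * error
    return errors, expectation


# A reward of 1.0 delivered on 20 consecutive trials: the error starts at 1.0
# and shrinks as the expectation approaches the actual reward.
errors, final_expectation = learn_expectation([1.0] * 20)
print(errors[0], errors[-1], final_expectation)
```

An unexpected reward gives a positive error, a fully predicted reward gives an error near zero, and omitting a predicted reward gives a negative error; for example, `reward_prediction_error(0.0, 0.5)` returns `-0.5`.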

Figures

Figure 1. Firing patterns of identified dopamine and GABA neurons in VTA.
A. VTA neurons were recorded while mice performed an odor-outcome association task in which different odors predicted different outcomes (see legend on right). Odors were presented for 1 second, and outcomes were presented after a 1-second delay. Neuron types were identified by their responses to optogenetic stimulation. Dopamine neurons (left, n = 26) showed phasic excitations to reward-predictive cues and reward. GABA neurons (right, n = 20) showed sustained activation during the delay. Data from Cohen et al. (2012). B. Reward expectation modulates dopamine neuron firing. Left, outcome delivered; right, outcome omitted. Different odors predicted reward with different probabilities. Higher reward probability increased cue responses but suppressed reward responses. Data from Tian and Uchida (2015). Also see Fiorillo et al. (2003) and Matsumoto and Hikosaka (2009). C. Reward-context-dependent modulation of dopamine responses to air-puff-predictive cues. The task conditions during recording differed only in the probability of reward. Dopamine neurons showed both excitation and inhibition in high-reward contexts (left) but only inhibition in low-reward contexts (right). The response in reward trials (black line) is omitted. Data from Matsumoto et al. (2016).
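The probability effect in panel B follows from the RPE definition if the cue response is read as the jump in expectation at cue onset and the outcome response as outcome minus expectation. The sketch below is an idealization under stated assumptions: the expectation equals reward probability times a hypothetical reward size, the expectation before the cue is zero, and responses are linear in the error. It ignores firing-rate nonlinearities and the context effects shown in panel C.

```python
def idealized_rpe(p_reward: float, reward_size: float = 1.0) -> dict:
    """Idealized prediction errors at cue, reward delivery, and reward omission.

    Assumes the expectation before the cue is 0 and that, after learning, the
    cue raises it to p_reward * reward_size.
    """
    if not 0.0 <= p_reward <= 1.0:
        raise ValueError("p_reward must lie in [0, 1]")
    expectation = p_reward * reward_size
    return {
        "cue": expectation - 0.0,             # cue response grows with probability
        "reward": reward_size - expectation,  # delivered reward: smaller when expected
        "omission": 0.0 - expectation,        # omitted reward: dip deepens with probability
    }


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, idealized_rpe(p))
```

In this toy version, raising the probability moves the error from the outcome to the cue, the same qualitative trade-off the caption describes.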
Figure 2. Subtractive computation in dopamine neurons.
A. In one task condition (No odor, black), different amounts of reward were presented without any predictive cue. In another condition (Odor A, orange), the timing of reward was predicted by an odor. B. Model predictions. Division should change the slope of the curve, whereas subtraction should shift it downward. C. Average response of 40 optogenetically identified dopamine neurons. Prediction caused a subtractive shift. Data from Eshel et al. (2015). D. Three example neurons. Although individual neurons varied in response magnitude, their response functions were scaled versions of one another. Data from Eshel et al. (2016).
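Panels B and C contrast two ways an expectation could act on the reward-size response curve. The sketch below makes that contrast concrete with a hypothetical saturating response function; its form and the parameters `gain` and `half_sat` are assumptions, not fits from Eshel et al. Subtraction preserves the difference between responses to two reward sizes, whereas division shrinks it. Panel D's observation is modeled, again as an assumption, by each neuron multiplying a shared curve by its own scale factor.

```python
def response(reward: float, gain: float = 10.0, half_sat: float = 2.0) -> float:
    """Hypothetical saturating response (above baseline) to a reward of given size."""
    return gain * reward / (reward + half_sat)


def subtractive(reward: float, expectation: float) -> float:
    """Expectation shifts the whole curve down by a constant amount."""
    return response(reward) - expectation


def divisive(reward: float, expectation: float) -> float:
    """Expectation divides the curve, flattening its slope."""
    return response(reward) / (1.0 + expectation)


def example_neuron(reward: float, expectation: float, scale: float) -> float:
    """One neuron as a scaled copy of the shared subtractive curve (cf. panel D)."""
    return scale * subtractive(reward, expectation)


small, large, expectation = 1.0, 5.0, 2.0
print("no expectation:", response(large) - response(small))
print("subtraction:   ", subtractive(large, expectation) - subtractive(small, expectation))
print("division:      ", divisive(large, expectation) - divisive(small, expectation))
```

Here an expectation of 2 lowers every point on the subtractive curve by 2 but leaves its slope intact, while the divisive curve keeps only a third of the original response difference.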
Figure 3. Models of RPE computations.
A. Temporal difference (TD) error model as implemented in Schultz et al. (1997). The computation of TD errors, $\delta(t) = r(t) + V(t+1) - V(t)$, can be seen as combining three inputs, one for each term. The traces show how each term changes as a function of time in a classical conditioning paradigm. The gray trace, $V(t+1) - V(t)$, can be seen as the temporal derivative of the value function, $V(t)$. The dopamine response during reward omission can be approximated by $V(t+1) - V(t)$ (gray). $r(t)$: reward. B, C. Alternative models assuming that reward-predictive cues and reward elicit phasic excitation. Reward expectation modulates dopamine reward responses either at the dopamine neuron itself (B) or upstream (C).
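The TD error in panel A can be computed step by step on a discretized trial. The sketch below assumes a fully learned, hypothetical value trace (value 1 from cue onset through the reward step, 0 elsewhere), a reward of size 1, and no temporal discounting, matching the undiscounted form of the equation in the caption. Because $\delta(t)$ looks one step ahead, the positive error falls on the step that transitions into the cue.

```python
def td_errors(rewards, values):
    """TD error delta(t) = r(t) + V(t+1) - V(t) for every time step (no discounting).

    `values` needs one more entry than `rewards` so that V(t+1) exists at the last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + values[t + 1] - values[t] for t, r in enumerate(rewards)]


def learned_values(n_steps, cue_t, reward_t, reward_size=1.0):
    """Hypothetical fully learned value trace: reward_size from cue to reward, else 0."""
    return [reward_size if cue_t <= t <= reward_t else 0.0 for t in range(n_steps + 1)]


n_steps, cue_t, reward_t = 10, 2, 6
values = learned_values(n_steps, cue_t, reward_t)

delivered = [1.0 if t == reward_t else 0.0 for t in range(n_steps)]
omitted = [0.0] * n_steps

delta_delivered = td_errors(delivered, values)  # positive at cue, ~0 at predicted reward
delta_omitted = td_errors(omitted, values)      # positive at cue, negative at omission
print(delta_delivered)
print(delta_omitted)
```

On omission trials $r(t) = 0$ throughout, so $\delta(t)$ reduces to $V(t+1) - V(t)$, which is why the caption says the omission response can be approximated by the gray trace.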
Figure 4. Monosynaptic input to dopamine neurons.
A. Monosynaptic inputs to VTA and SNc dopamine neurons (blue and red, respectively). Inputs were labeled through transsynaptic retrograde tracing using rabies virus. Data from Watabe-Uchida et al. (2012). B. Schematic summary of A. The thickness of each line indicates the extent of inputs from each area (% of total inputs). C. Firing patterns of monosynaptic inputs in a classical conditioning paradigm. Monosynaptic inputs to dopamine neurons were labeled with channelrhodopsin-2 delivered by rabies virus. Optogenetics was used to identify these inputs in 7 brain areas while mice performed a task. Data from Tian et al. (2016). LO: lateral orbitofrontal cortex; M1, primary motor cortex; M2, secondary motor cortex; S1, primary somatosensory cortex; Tu, olfactory tubercle; Acb, nucleus accumbens; DS, dorsal striatum; VP, ventral pallidum; EA, extended amygdala; BNST, bed nucleus of stria terminalis; IPAC, interstitial nucleus of the posterior limb of the anterior commissure; GP, globus pallidus (external segment of the globus pallidus); EP, entopeduncular nucleus (internal segment of the globus pallidus); MPA, medial preoptic area; LPO, lateral preoptic area; Pa, paraventricular hypothalamic nucleus; DB, diagonal band of Broca; Ce, central amygdala; LH, lateral hypothalamus; ZI, zona incerta; STh, subthalamic nucleus; PSTh, parasubthalamic nucleus; SC, superior colliculus; PPTg, pedunculopontine tegmental nucleus; LDTg, laterodorsal tegmental nucleus; PAG, periaqueductal gray; DR, dorsal raphe; mRt, reticular formation; PB, parabrachial nucleus.

References

Atallah BV, Bruns W, Carandini M, and Scanziani M (2012). Parvalbumin-expressing interneurons linearly transform cortical responses to visual stimuli. Neuron 73, 159–170.
Ayaz A, and Chance FS (2009). Gain Modulation of Neuronal Responses by Subtractive and Divisive Mechanisms of Inhibition. J. Neurophysiol 101, 958–968.
Bayer HM, and Glimcher PW (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141.
Bayer HM, Lau B, and Glimcher PW (2007). Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol 98, 1428–1439.
Beier KT, Steinberg EE, DeLoach KE, Xie S, Miyamichi K, Schwarz L, Gao XJ, Kremer EJ, Malenka RC, and Luo L (2015). Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell 162, 622–634.