Nature. 2015 Sep 10;525(7568):243-6.
doi: 10.1038/nature14855. Epub 2015 Aug 31.

Arithmetic and local circuitry underlying dopamine prediction errors

Neir Eshel et al. Nature. 2015.

Erratum in

Abstract

Dopamine neurons are thought to facilitate learning by comparing actual and expected reward. Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.
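The subtraction at the core of a reward prediction error can be illustrated with the delta rule of the Bush–Mosteller and Rescorla–Wagner models listed in the references. The sketch below is a textbook illustration with a single cue and a fixed learning rate. The function name `rescorla_wagner` and the parameter `alpha` are hypothetical choices for this example, and it is not the analysis used in the paper.

```python
from typing import List, Sequence, Tuple


def rescorla_wagner(rewards: Sequence[float], alpha: float = 0.1,
                    v0: float = 0.0) -> Tuple[List[float], List[float]]:
    """Delta-rule learning for a single cue (illustrative, not the paper's model).

    Returns the expectation after each trial and the prediction error on each trial.
    """
    v = v0
    values: List[float] = []
    errors: List[float] = []
    for r in rewards:
        delta = r - v          # prediction error: actual minus expected reward
        v += alpha * delta     # move the expectation a fraction alpha toward the outcome
        errors.append(delta)
        values.append(v)
    return values, errors


# A cue repeatedly followed by one unit of reward.
values, errors = rescorla_wagner([1.0] * 50, alpha=0.2)
print(f"first error {errors[0]:.2f}, last error {errors[-1]:.4f}")
```

Because the error is a difference rather than a ratio, a reward that is fully predicted drives it toward zero regardless of its size, which is one reason subtraction suits reinforcement learning.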

Figures

Extended Data Fig. 1. Recording sites and ArchT expression
a–d, Schematic of recording locations for mice used in the dopamine identification task (a, n = 5), the GABA stimulation task (b, n = 7), the GABA inhibition task (c, n = 9), and the behavioural task (d, n = 12). b, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 5). Blue, control mice expressing GFP in VTA GABA neurons (n = 2). c, Red, mice in which laser was delivered at continuous intensity (n = 7). Blue, mice in which laser was delivered with ramping intensity (n = 2). d, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 6). Blue, control mice expressing GFP in VTA GABA neurons (n = 6). e–g, Selectivity and efficiency of ArchT expression. e, Representative merged image (one of 30 Z-stacks). Magenta, Vgat-tdTomato; green, ArchT-GFP. Open arrow, neuron expressing Vgat-tdTomato but not ArchT-GFP. Closed arrow, neuron expressing both Vgat-tdTomato and ArchT-GFP. Scale bar is 10 µm. f, Selectivity of infection to GABA neurons: percentage of ArchT-GFP-expressing neurons (n = 131 neurons for AAV1 and 165 neurons for AAV8) that were positive for Vgat-tdTomato. Filled bars, Vgat-tdTomato mouse injected with AAV1-FLEX-ArchT-GFP. Empty bars, Vgat-tdTomato mouse injected with AAV8-FLEX-ArchT-GFP. g, Efficiency of infection: percentage of Vgat-tdTomato-expressing neurons (n = 278 neurons for AAV1 and 283 neurons for AAV8) that were positive for ArchT-GFP.
Extended Data Fig. 2. Neuron classification for dopamine identification and GABA stimulation experiments
a–c, Dopamine identification experiment. d–f, ChR2-expressing animals in GABA stimulation experiment. g–i, GFP-expressing control animals in GABA stimulation experiment. a, d, g, Responses of all VTA neurons recorded in the tasks. Each row reflects the auROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as one second before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. Light-identified neurons are denoted by an * to the left of each column. b, e, h, The first three principal components of the auROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, i, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.
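The auROC values in panels a, d and g compare a neuron's activity in each time bin with its baseline activity across trials. Here is a minimal sketch of that comparison. It assumes per-trial spike counts in each bin (the bin width and windows are not given here) and relies on the equivalence between the auROC and the normalized Mann–Whitney statistic. The principal-component and clustering steps are omitted, and all names and numbers are hypothetical.

```python
from typing import List, Sequence


def auroc(baseline: Sequence[float], test: Sequence[float]) -> float:
    """Area under the ROC curve separating test-bin counts from baseline counts.

    Equals the probability that a random test value exceeds a random baseline
    value, counting ties as one half (Mann-Whitney U divided by n1 * n2).
    """
    wins = 0.0
    for t in test:
        for b in baseline:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(test) * len(baseline))


def auroc_timecourse(baseline_counts: Sequence[float],
                     binned_counts: Sequence[Sequence[float]]) -> List[float]:
    """auROC for each time bin; binned_counts[k] holds per-trial counts in bin k."""
    return [auroc(baseline_counts, bin_k) for bin_k in binned_counts]


baseline = [2, 3, 2, 4, 3, 2]            # hypothetical counts, pre-odour window
bins = [[2, 3, 3, 2, 4, 2],              # baseline-like bin
        [6, 7, 5, 8, 6, 7],              # clear increase (would plot yellow)
        [0, 1, 0, 1, 0, 0]]              # clear decrease (would plot cyan)
result = auroc_timecourse(baseline, bins)
print([round(a, 2) for a in result])
```

Values above 0.5 mark an increase from baseline and values below 0.5 mark a decrease, matching the yellow and cyan colour scale described in the legend.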
Extended Data Fig. 3. Light identification of dopamine and GABA neurons
a, Raw signal from one example light-identified dopamine neuron. Blue bars, light pulses. b, For the same neuron, mean waveforms for spontaneous (black) and light-evoked (blue) action potentials. c, For the same neuron, raster plots for 20 Hz (left) and 50 Hz (right) laser stimulation. Each row is one trial of laser stimulation. d, Histogram of log P values for each neuron recorded in the dopamine identification experiment (n = 170). The P values were derived from SALT (see Methods). Neurons with P < 0.001 and waveform correlations > 0.9 were considered identified (filled bars). e, f, For light-identified neurons, probability of spiking (e) and latency to first spike (f) after laser pulses at different frequencies. Orange circles, mean across neurons. g, Histogram of mean latencies (left) and latency standard deviations (right) in response to laser stimulation for all light-identified dopamine neurons in the variable-reward task. h–n, Same conventions as a–g, but for neurons recorded in the GABA stimulation task (n = 102).
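The identification rule in panel d combines two criteria. The sketch below applies them to a SALT P value computed elsewhere (SALT itself is not reimplemented) and to the mean spontaneous and light-evoked waveforms. It assumes the waveform correlation is a Pearson correlation of equal-length, time-aligned waveforms. The function names and example waveforms are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_light_identified(salt_p: float, spont_waveform: Sequence[float],
                        evoked_waveform: Sequence[float],
                        p_thresh: float = 0.001, r_thresh: float = 0.9) -> bool:
    """Both criteria from the legend: SALT P < 0.001 and waveform correlation > 0.9."""
    return (salt_p < p_thresh
            and pearson(spont_waveform, evoked_waveform) > r_thresh)


# Hypothetical mean waveforms (arbitrary units, same sample times).
spont = [0.0, -1.0, -5.0, -2.0, 1.0, 2.0, 1.0, 0.0]
evoked = [0.0, -1.1, -4.8, -2.1, 0.9, 2.1, 0.9, 0.1]
print(is_light_identified(1e-4, spont, evoked))   # passes both criteria
print(is_light_identified(0.01, spont, evoked))   # fails the SALT criterion
```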
Extended Data Fig. 4. Individual neuron analysis from all recording experiments
a–c, Results from dopamine identification experiment (Fig. 1). d, e, Results from GABA stimulation experiment (Fig. 2). f–i, Results from GABA inhibition experiment (Fig. 3). a, Raster plots (top and middle) and firing rate (bottom) of representative dopamine neuron in response to unexpected (orange) or expected (black) reward. ***, P < 0.001, t-test. b, For the same neuron, responses (mean ± s.e.m. across trials) to each reward size. Orange line, fit for unexpected reward. Dotted black line, divisive transformation. Solid black line, subtractive transformation. c, Individual neuron regression slopes for the analysis in Fig. 1d. Empty bars, slope not different from zero (P > 0.05). Filled bars, P < 0.05. Triangle, mean slope. d, e, Firing rate of example VTA GABA (d) and putative dopamine (e) neuron with (blue) and without (black) ChR2 stimulation. Light blue box, laser delivery. f, g, Firing rate of example VTA GABA (f) and putative dopamine (g) neuron during odour B trials with (green) or without (black) laser delivery. h, i, Histogram of putative GABA (h) and dopamine (i) neuron responses to laser delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean.
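Panels h and i classify each neuron with a Wilcoxon rank-sum test on its responses with and without laser. The self-contained sketch below uses the large-sample normal approximation without a tie correction. The per-trial response windows and example rates are hypothetical. If SciPy is available, `scipy.stats.ranksums` implements the same approximation.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                               # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical per-trial firing rates (spikes/s) for one neuron.
laser_on = [4, 5, 3, 6, 5, 4, 5, 3]
laser_off = [10, 12, 11, 13, 12, 11, 10, 12]
z, p = rank_sum_test(laser_on, laser_off)
print(f"z = {z:.2f}, P = {p:.4f}, significant: {p < 0.05}")
```

The normal approximation is reasonable for moderate trial counts. For very small samples, an exact test would be more appropriate.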
Extended Data Fig. 5. VTA GABA activity does not vary consistently with reward size
a–c, Putative GABA neurons in the dopamine identification experiment (Fig. 1). d–f, Putative GABA neurons in the GABA stimulation experiment (Fig. 2). a, b, Average firing rate of putative GABA neurons to unexpected (a) or expected (b) rewards of various sizes. c, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Orange, unexpected reward. Black, expected reward. Responses were averaged over a 600 ms window after reward delivery. d, e, Average firing rate of putative GABA neurons to rewards of various sizes, delivered with (e) or without (d) optogenetic GABA stimulation. f, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Blue, reward with laser stimulation. Black, reward without laser stimulation. Responses were averaged over a 600 ms window after reward delivery.
Extended Data Fig. 6. Statistical test for subtraction versus division
a, To understand how dopamine neurons compute reward prediction error, we first determined how dopamine neurons respond to various sizes of unexpected reward (schematized as orange curves). We then taught the mice to expect reward and observed how expectation shifted this dose-response (black curves). We modelled four types of shift: output subtraction (top left), input subtraction (bottom left), output division (top right), and input division (bottom right). Output subtraction was consistently the best fit. For equations, see Methods. Analysis adapted from a previous study. b–e, Results from dopamine identification experiment. f–i, Results from GABA stimulation experiment. b, c, Results from all putative dopamine neurons (n = 84). ***, P < 0.001, bootstrap. d, e, Results from light-identified dopamine neurons (n = 40). ***, P < 0.001, bootstrap. f, g, Results from putative dopamine neurons in the GABA stimulation experiment (n = 45). *, P < 0.05, bootstrap. h, i, Results from putative dopamine neurons in the GABA stimulation experiment, subtracting the 500 ms period immediately prior to reward delivery. This takes into account the laser-induced baseline shift in dopamine responses. *, P < 0.05, bootstrap. b, d, f, h, Average responses (mean ± s.e.m. across neurons) to different sizes of reward, with fits for output subtraction (solid line) and output division (dotted line). c, e, g, i, Results of bootstrapping analysis. For each resample, we compared the mean squared error for the subtractive fit with the mean squared error for the divisive fit. Negative values favour subtraction. P values were calculated as the proportion of resamples in which division was a better fit than subtraction.
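The bootstrap in panels c, e, g and i can be sketched as follows. Because the Methods equations are not reproduced here, this version makes several simplifying assumptions. Only the two output models are fitted. The resampled population mean expected-reward responses are compared with the resampled mean unexpected-reward responses at the same reward sizes, rather than with a fitted dose-response curve. Neurons are resampled with replacement. The function names, reward sizes and synthetic data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_mse(unexp: Sequence[float], exp: Sequence[float]) -> Tuple[float, float]:
    """Mean squared error of the best output-subtraction and output-division fits."""
    n = len(unexp)
    # Output subtraction: exp ~ unexp - c; the least-squares c is the mean difference.
    c = sum(u - e for u, e in zip(unexp, exp)) / n
    mse_sub = sum((e - (u - c)) ** 2 for u, e in zip(unexp, exp)) / n
    # Output division: exp ~ unexp / d, fitted as exp ~ k * unexp with k = 1 / d.
    k = sum(u * e for u, e in zip(unexp, exp)) / sum(u * u for u in unexp)
    mse_div = sum((e - k * u) ** 2 for u, e in zip(unexp, exp)) / n
    return mse_sub, mse_div


def mean_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average response at each reward size across neurons."""
    return [sum(col) / len(col) for col in zip(*curves)]


def bootstrap_sub_vs_div(unexp_by_neuron, exp_by_neuron, n_boot=1000, seed=0):
    """Resample neurons; return MSE differences and the fraction favouring division."""
    rng = random.Random(seed)
    n = len(unexp_by_neuron)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        mse_sub, mse_div = fit_mse(mean_curve([unexp_by_neuron[i] for i in idx]),
                                   mean_curve([exp_by_neuron[i] for i in idx]))
        diffs.append(mse_sub - mse_div)       # negative values favour subtraction
    p = sum(d > 0 for d in diffs) / n_boot    # resamples where division fits better
    return diffs, p


# Synthetic neurons whose expected-reward responses are shifted down by a constant.
rng = random.Random(1)
sizes = [1, 2, 4, 8, 16]                      # hypothetical reward sizes
unexp = [[3 + 2 * s ** 0.5 + rng.gauss(0, 0.5) for s in sizes] for _ in range(30)]
exp = [[u - 3 + rng.gauss(0, 0.5) for u in row] for row in unexp]
diffs, p = bootstrap_sub_vs_div(unexp, exp, n_boot=500)
print(f"mean MSE difference {sum(diffs) / len(diffs):.3f}, P = {p:.3f}")
```

The input-subtraction and input-division models, which shift or scale the reward axis before the dose-response nonlinearity, would need that nonlinearity to be fitted first. They are left out here.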
Extended Data Fig. 7. Laser effect is more than a baseline shift
a–d, Results from GABA stimulation experiment. e–h, Results from GABA inhibition experiment. a, Firing rate (mean ± s.e.m.) of putative dopamine neurons that did not show a significant baseline shift. ***, P < 0.001, t-test. b, To visualize whether GABA stimulation preferentially affected phasic dopamine responses in addition to baseline firing rates, we took the activity in Fig. 2c and subtracted the trials when laser was delivered alone. Any remaining change at the time of reward could not be due to a baseline shift. **, P = 0.01, t-test. c, Firing rate (mean ± s.e.m.) of putative dopamine (left) and GABA (right) neurons on trials where laser was delivered in the absence of reward. This dopamine response was subtracted to calculate the firing rates in b. d, Histogram of the phasic effect of GABA stimulation. The values were calculated by subtracting the black line from the blue line in b. Empty bars, slope not different from zero (P > 0.05, Wilcoxon rank-sum). Filled bars, slope different from zero (P < 0.05). Triangle, mean (P < 0.001, t-test). e–h, Same conventions as a–d, but for the GABA inhibition experiment. ***, P < 0.001, t-test.
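Panels b and d separate the phasic reward response from the laser-induced baseline shift by subtracting laser-alone trials. A closely related correction is the pre-reward baseline subtraction described for Extended Data Fig. 6h, i, in which each condition's response is its mean rate in a post-reward window minus its mean rate in the 500 ms before reward. That correction can be sketched in a few lines. The bin width, the post-reward window length and all traces are hypothetical, and this is not the exact calculation behind Extended Data Fig. 7.

```python
from typing import Sequence


def baseline_corrected(trace: Sequence[float], bin_ms: int, reward_bin: int,
                       pre_ms: int = 500, post_ms: int = 500) -> float:
    """Mean rate after reward minus mean rate in the pre_ms window before reward."""
    n_pre, n_post = pre_ms // bin_ms, post_ms // bin_ms
    pre = trace[reward_bin - n_pre:reward_bin]
    post = trace[reward_bin:reward_bin + n_post]
    return sum(post) / len(post) - sum(pre) / len(pre)


def phasic_effect(reward_laser: Sequence[float], reward_alone: Sequence[float],
                  bin_ms: int, reward_bin: int) -> float:
    """Change in the baseline-corrected reward response caused by the laser."""
    return (baseline_corrected(reward_laser, bin_ms, reward_bin)
            - baseline_corrected(reward_alone, bin_ms, reward_bin))


# Hypothetical 100 ms bins; reward is delivered at the start of bin 10.
reward_alone = [5] * 10 + [25] * 5 + [5] * 5
pure_shift = [r - 2 for r in reward_alone]          # laser only lowers the baseline
shift_and_phasic = [3] * 10 + [18] * 5 + [3] * 5    # laser also shrinks the burst
print(phasic_effect(pure_shift, reward_alone, 100, 10))        # 0.0
print(phasic_effect(shift_and_phasic, reward_alone, 100, 10))  # -5.0
```

A uniform baseline shift cancels out, and only a change in the reward-evoked burst remains. This is the distinction the legend draws between a baseline shift and a phasic effect.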
Extended Data Fig. 8. Behavioural performance in all four experiments
a, In the dopamine identification task (Fig. 1), lick rates (mean ± s.e.m. across sessions) for odours predicting reward (black) or nothing (grey). b, In the GABA stimulation task (Fig. 2), lick rates (mean ± s.e.m. across sessions) for reward alone (black), reward + GABA stimulation (blue), and GABA stimulation alone (orange). c, In the GABA inhibition task (Fig. 3), lick rates (mean ± s.e.m. across sessions) for the odours predicting reward with 90% probability (black) and 10% probability (grey). Green laser was delivered to inhibit VTA GABA neurons on 25% of reward (green) and nothing (orange) trials. d, e, In the bilateral stimulation experiment (Fig. 4), anticipatory licks (mean ± s.e.m. across mice) for mice injected with ChR2 (d) and GFP (e). Grey bars, odour B; blue or yellow bars, odour D. Left, last three training sessions before odour D was paired with laser; Middle, last three sessions with laser delivery (excluding probe trials); Right, last three sessions after laser was turned off. **, P < 0.01; ***, P < 0.001; paired t-test.
Extended Data Fig. 9. Neuron classification for GABA inhibition experiment
a–c, Mice in which laser was delivered with continuous intensity. d–f, Mice in which laser was delivered with ramping intensity. a, d, Responses of all VTA neurons recorded in the tasks. Each row reflects the auROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as one second before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. b, e, The first three principal components of the auROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.
Extended Data Fig. 10. Ramping laser stimulation eliminates baseline shift
a, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during odour B trials with (green) or without (black) ramping laser delivery. ***, P < 0.001, t-test. b, Histogram of putative GABA neuron responses to laser delivery. Responses were averaged over the entire duration of the laser. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test). c, Firing rate (mean ± s.e.m.) of putative dopamine neurons with (green) or without (black) ramping GABA inhibition. ***, P < 0.001, t-test. d, Histogram of putative dopamine neuron responses to laser delivery. Responses were averaged over the 0.5 s window after reward delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test).
Figure 1. Expectation triggers subtraction of dopamine neuron responses
a, Dopamine identification recording paradigm (left) and task (right). b, Dopamine neuron firing rates (mean ± s.e.m. across neurons) for unexpected (orange) or temporally expected (black) reward. ***, P < 0.001, t-test. c, Dopamine neuron responses (mean ± s.e.m.) to different reward sizes. Orange line, fit for unexpected reward. Dotted black line, divisive transformation. Solid black line, subtractive transformation. Subtraction was a better fit (***, P < 0.001, bootstrap; see Methods and Extended Data Fig. 6e). d, Difference between unexpected and expected reward responses (mean ± s.e.m.) as a function of reward size.
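Panel d, together with the per-neuron regression slopes in Extended Data Fig. 4c, rests on a simple consequence of the two models. Under subtraction, the unexpected-minus-expected difference is the same at every reward size. Under division, the difference grows with the unexpected response. The hypothetical illustration below uses made-up response values and an ordinary least-squares slope.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


sizes = [1, 2, 4, 8, 16]                         # hypothetical reward sizes
unexpected = [5.0, 6.0, 7.0, 8.0, 9.0]           # hypothetical unexpected responses
subtractive = [u - 3.0 for u in unexpected]      # expectation subtracts a constant
divisive = [u / 2.0 for u in unexpected]         # expectation halves the response

diff_sub = [u - e for u, e in zip(unexpected, subtractive)]
diff_div = [u - e for u, e in zip(unexpected, divisive)]
print(ols_slope(sizes, diff_sub), ols_slope(sizes, diff_div) > 0)
```

With real data, a subtractive neuron's slope would scatter around zero rather than equal it exactly. This is why Extended Data Fig. 4c tests each slope against zero.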
Figure 2. Selective excitation of VTA GABA neurons mimics the effect of expectation
a, GABA stimulation recording paradigm (left) and task (right). b, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons with (blue) and without (black) ChR2 stimulation. Light blue box, laser delivery. c, Firing rate (mean ± s.e.m.) of putative dopamine neurons. ***, P < 0.001, t-test. d, Dopamine neuron responses (mean ± s.e.m.) to different reward sizes. Black line, fit for unexpected reward. Dotted blue line, divisive transformation. Solid blue line, subtractive transformation. Subtraction was a better fit (*, P < 0.05, bootstrap; see Extended Data Fig. 6g). e, f, Same as c and d except in GFP-expressing control animals.
Figure 3. Selective inhibition of VTA GABA neurons modulates prediction errors
a, GABA inhibition recording paradigm (left) and task (right). b, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during odour B trials with (green) or without (black) laser delivery. ***, P < 0.001, paired t-test. c, Firing rate (mean ± s.e.m.) of putative dopamine neurons when reward was delivered after odour A (orange) or odour B (black). ***, P < 0.001, paired t-test. d, Same as b except for putative dopamine neurons. ***, P < 0.001, paired t-test.
Figure 4. Bilateral excitation of VTA GABA neurons disrupts learned association
a, Schematic of optogenetic paradigm (left) and behavioural task (right). b, For a representative mouse (one of six mice injected with ChR2), anticipatory licks during each session (mean ± s.e.m. across trials) for odours A (black), B (dark grey), C (light grey), and D (blue). For sessions 12–17 (pale yellow), odour D was paired with laser. ***, P < 0.001, laser × odour interaction, mixed effects model. c, Ratio of anticipatory licks for odour D vs. odour B during laser sessions. Circles, mice injected with ChR2 (blue) or GFP (yellow). Open circles, probe trials, where laser was omitted after odour D. *, P < 0.05; ***, P < 0.001; Wilcoxon rank-sum.

Comment in

References

- Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275:1593–1599.
- Bayer HM, Glimcher PW. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron. 2005;47:129–141.
- Bush RR, Mosteller F. A mathematical model for simple learning. Psychol Rev. 1951;58:313–323.
- Rescorla RA, Wagner AR. In: Black A, Prokasy W, editors. Classical conditioning II: current research and theory. 1972. pp. 64–99.
- Carandini M, Heeger DJ. Normalization as a canonical neural computation. Nat Rev Neurosci. 2012;13:51–62.

Additional references

- Bäckman CM, et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis. 2006;44:383–390.
- Vong L, et al. Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron. 2011;71:142–154.
- Boyden ES, Zhang F, Bamberg E, Nagel G, Deisseroth K. Millisecond-timescale, genetically targeted optical control of neural activity. Nat Neurosci. 2005;8:1263–1268.
- Atasoy D, Aponte Y, Su HH, Sternson SM. A FLEX switch targets Channelrhodopsin-2 to multiple cell types for imaging and long-range circuit mapping. J Neurosci. 2008;28:7025–7030.
- Uchida N, Mainen ZF. Speed and accuracy of olfactory discrimination in the rat. Nat Neurosci. 2003;6:1224–1229.
