Curr Biol. 2019 Jan 7;29(1):93-103.e3.
doi: 10.1016/j.cub.2018.11.050. Epub 2018 Dec 20.

Ventral Tegmental Dopamine Neurons Participate in Reward Identity Predictions

Ronald Keiflin et al. Curr Biol.

Abstract

Dopamine (DA) neurons in the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) encode reward prediction errors (RPEs) and are proposed to mediate error-driven learning. However, the learning strategy engaged by DA-RPEs remains controversial. RPEs might imbue predictive cues with pure value, independently of representations of their associated outcome. Alternatively, RPEs might promote learning about the sensory features (the identity) of the rewarding outcome. Here, we show that, although activation of both VTA and SNc DA neurons reinforces instrumental responding, only VTA DA neuron activation during consumption of an expected sucrose reward restores error-driven learning and promotes formation of a new cue→sucrose association. Critically, expression of VTA DA-dependent Pavlovian associations is abolished following sucrose devaluation, a signature of identity-based learning. These findings reveal that activation of VTA- or SNc-DA neurons engages largely dissociable learning processes: VTA-DA neurons are capable of participating in outcome-specific predictive learning, whereas the role of SNc-DA neurons appears limited to reinforcement of instrumental responses.
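Error-driven learning of the kind described above is conventionally formalized by the Rescorla-Wagner rule (Rescorla and Wagner, 1972, in the References). The block below gives that rule in a common textbook notation, offered only as background; the symbols are assumptions of this sketch and are not taken from the paper. Here $\mathcal{C}_t$ is the set of cues present on trial $t$, $V_j$ the associative strength of cue $j$, $\lambda_t$ the maximum strength the outcome on trial $t$ can support, and $\alpha_i$, $\beta$ learning-rate parameters for the cue and the outcome.

```latex
\delta_t = \lambda_t - \sum_{j \in \mathcal{C}_t} V_j ,
\qquad
\Delta V_i = \alpha_i \, \beta \, \delta_t \quad \text{for each } i \in \mathcal{C}_t
```

Blocking follows directly: once $V_A \approx \lambda$, a compound trial AX with the same outcome gives $\delta_t \approx 0$, so $\Delta V_X \approx 0$. Raising the outcome on BY trials to some $\lambda' > \lambda$ makes $\delta_t \approx \lambda' - V_B > 0$, so cue Y acquires strength (unblocking).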

Keywords: blocking; conditioning; dopamine; learning; model-based; model-free; optogenetics; reward-prediction error; substantia nigra; ventral tegmental area.

Conflict of interest statement

DECLARATION OF INTERESTS

The authors declare no competing interests.

Figures

Figure 1. Behavioral task and histology.
(A) Three groups of rats were trained in the blocking/unblocking task. During the Individual Cue phase, two visual cues (A and B) were paired with sucrose reward. In the Compound Cue phase, two new trial types were introduced, in which a visual cue was presented simultaneously with an auditory cue (X or Y), yielding two compound stimuli (AX and BY). The absence of an RPE during compound AX is predicted to block learning about cue X. During compound BY, an RPE was produced either by increasing reward magnitude (Reward Upshift group) or by photostimulating DA neurons during sucrose consumption (VTA-DA Stim. and SNc-DA Stim. groups). A 1-day probe test assessed the associative strength acquired by each individual cue. (B) Reconstruction of ChR2-YFP expression and fiber placement in VTA (left) and SNc (right). Light and dark shading indicate maximal and minimal spread of ChR2-YFP, respectively. Square symbols mark the ventral extremity of fiber implants. (C) Representative ChR2-YFP expression in VTA (left) or SNc (right). (D) Laser power from the fiber tip, estimated from [31]. Full laser power = 120 mW/mm² (corresponding to 34 mW at the tip of 300-µm fibers; http://www.optogenetics.org/calc).
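The logic of the design in panel A can be illustrated with a small Rescorla-Wagner simulation. This is a sketch under stated assumptions, not the authors' analysis: the trial counts, the learning rate, the reward magnitudes, and the `boost` term (a hypothetical stand-in for a prediction error injected without changing the reward, loosely mimicking photostimulation) are all illustrative choices. The helper names `rescorla_wagner` and `blocking_design` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One trial: (cues present, reward magnitude lambda, extra error added to delta).
Trial = Tuple[Sequence[str], float, float]


def rescorla_wagner(trials: List[Trial], alpha: float = 0.2) -> Dict[str, float]:
    """Run the trials in order and return each cue's associative strength V."""
    v: Dict[str, float] = {}
    for cues, reward, boost in trials:
        # Summed prediction from every cue present on this trial.
        prediction = sum(v.get(c, 0.0) for c in cues)
        # Prediction error; `boost` is a hypothetical stand-in for an error
        # injected without changing the reward (assumption, not from the paper).
        delta = reward - prediction + boost
        for c in cues:
            v[c] = v.get(c, 0.0) + alpha * delta
    return v


def blocking_design(by_reward: float = 1.0, by_boost: float = 0.0,
                    n_single: int = 40, n_compound: int = 20) -> List[Trial]:
    """Hypothetical trial schedule; trial counts are illustrative only."""
    trials: List[Trial] = []
    for _ in range(n_single):  # Individual Cue phase: A and B each predict reward
        trials.append((("A",), 1.0, 0.0))
        trials.append((("B",), 1.0, 0.0))
    for _ in range(n_compound):  # Compound Cue phase
        trials.append((("A", "X"), 1.0, 0.0))  # reward already predicted by A
        trials.append((("B", "Y"), by_reward, by_boost))
    return trials


if __name__ == "__main__":
    conditions = {
        "blocking (no extra error)": {},
        "reward upshift": {"by_reward": 2.0},
        "injected error": {"by_boost": 1.0},
    }
    for label, kwargs in conditions.items():
        v = rescorla_wagner(blocking_design(**kwargs))
        print(f"{label}: V_X={v['X']:.3f}, V_Y={v['Y']:.3f}")
```

Under these assumptions V_X stays near zero in every condition, while V_Y grows only when the BY trials carry a positive error, whether it comes from a larger reward or from the added `boost`.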
Figure 2. Performance during Individual Cue and Compound Cue training.
(A-C) Time spent in the reward port during cue presentation over 10 days of Individual Cue conditioning and 4 days of Compound Cue conditioning for the Reward Upshift (A), VTA-DA stimulation (B), and SNc-DA stimulation (C) groups. Values include only the first 9 s after cue onset, prior to sucrose delivery, to avoid contamination by the consumption period. Insets depict average performance over the 4 days of Compound Cue conditioning. For all groups, introduction of the auditory stimulus increased performance (A vs. AX and B vs. BY, all Ps < 0.001, Bonferroni-corrected paired t-tests), but responding did not differ between the compound cues (AX vs. BY, Ps > 0.967, Bonferroni-corrected paired t-tests). (D-F) Probability of presence in the port throughout cue presentation during the last 4 days of Individual Cue conditioning (upper graphs) and the 4 days of Compound Cue conditioning (lower graphs), for the Reward Upshift (D), VTA-DA stimulation (E), and SNc-DA stimulation (F) groups. Note that photostimulation during compound cue BY did not disrupt ongoing behavior. See also Figure S1.
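The windowed measure in panels A-C (time in port during the first 9 s after cue onset) could be computed from port entry/exit timestamps roughly as below. This is a minimal sketch, assuming port occupancy is logged as (entry, exit) bouts in seconds from session start; that data format and the function names are hypothetical, not the authors' pipeline.

```python
from typing import List, Sequence, Tuple

# One port-occupancy bout: (entry time, exit time) in seconds from session start.
Bout = Tuple[float, float]


def time_in_port(bouts: Sequence[Bout], start: float, end: float) -> float:
    """Seconds spent in the port within the window from `start` to `end`."""
    total = 0.0
    for entry, exit_time in bouts:
        # Length of the overlap between this bout and the window (skip if none).
        overlap = min(exit_time, end) - max(entry, start)
        if overlap > 0:
            total += overlap
    return total


def cue_scores(bouts: Sequence[Bout], cue_onsets: Sequence[float],
               window_s: float = 9.0) -> List[float]:
    """Per-trial time in port during the first `window_s` seconds after each cue onset."""
    return [time_in_port(bouts, onset, onset + window_s) for onset in cue_onsets]


if __name__ == "__main__":
    bouts = [(2.0, 5.0), (9.5, 12.0)]
    print(cue_scores(bouts, [0.0, 8.0]))  # [3.0, 2.5]
```

Clipping each bout to the window, rather than counting whole bouts, is what keeps responding after the 9-s cutoff (and thus any sucrose consumption) out of the score.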
Figure 3. Photoactivation of VTA-DA but not SNc-DA neurons mimics endogenous RPEs and unblocks learning.
Conditioned responding was measured as time spent in the reward port during cue presentation. (A-C) Whole-session performance in the Reward Upshift (A), VTA-DA stimulation (B), and SNc-DA stimulation (C) groups. Scatterplot insets show individual data distributions for responding to A and B (top insets) and X and Y (bottom insets). Histograms along the diagonal are frequency distributions (subject counts) of the difference scores (A - B, or X - Y); off-center distributions reveal higher responding to one of the cues. (D-F) Trial-by-trial test performance after Reward Upshift (D), VTA-DA stimulation (E), and SNc-DA stimulation (F). A 3-way mixed ANOVA (Group x Cue x Trial) analyzing the evolution of responding over the session found an interaction among all factors (F(30, 855) = 2.603, P < 0.001, after Greenhouse-Geisser correction). (G-I) Second-by-second tracking of presence in the port during the first presentation of each cue (A, B: upper graph; X, Y: lower graph) for the Reward Upshift (G), VTA-DA stimulation (H), and SNc-DA stimulation (I) groups. *P < 0.05 (A vs. B, or X vs. Y; post-hoc Bonferroni-corrected t-test). Error bars = s.e.m. See also Figures S1-S3.
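The Greenhouse-Geisser correction mentioned above scales the F-test degrees of freedom by an estimate ε of how far the repeated measures depart from sphericity. The sketch below computes ε for a simple one-way repeated-measures layout (one row per subject, one column per level); it is an illustration under that assumption only. For a mixed design such as the Group x Cue x Trial ANOVA here, ε would typically be computed from the pooled within-group covariance matrix, which this sketch does not do. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def covariance_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) between the k repeated-measures columns."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(k)] for a in range(k)]


def greenhouse_geisser_epsilon(data: Sequence[Sequence[float]]) -> float:
    """Greenhouse-Geisser epsilon for a one-way repeated-measures design."""
    s = covariance_matrix(data)
    k = len(s)
    row_means = [sum(r) / k for r in s]
    grand_mean = sum(row_means) / k
    # Double-centre S: its nonzero eigenvalues equal those of the covariance
    # of an orthonormal set of k - 1 contrasts.
    sc = [[s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
          for i in range(k)]
    trace = sum(sc[i][i] for i in range(k))
    # For a symmetric matrix, trace(A @ A) is the sum of its squared entries.
    trace_sq = sum(x * x for r in sc for x in r)
    if trace_sq == 0:
        raise ValueError("no within-subject variance in the contrasts")
    # epsilon = (sum of eigenvalues)^2 / ((k - 1) * sum of squared eigenvalues)
    return trace ** 2 / ((k - 1) * trace_sq)


def corrected_df(eps: float, k: int, n: int) -> Tuple[float, float]:
    """Epsilon-scaled (numerator, denominator) df for the one-way within-subject F."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)
```

By construction ε lies between 1/(k - 1) (maximal violation of sphericity) and 1 (sphericity holds), and with only two levels it is always 1.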
Figure 4. Photoactivation of VTA-DA or SNc-DA neurons serves as an equally potent reinforcer of intracranial self-stimulation (ICSS) behavior.
(A) Rats could respond at one of two nosepokes to obtain optical stimulation of VTA- or SNc-DA neurons. (B) Responses at the active and inactive nosepokes during daily 1-h sessions. (C) Cumulative active nosepoke responses during the last ICSS session. *P < 0.05, Active vs. Inactive Nosepoke; #P < 0.05, Session 1 vs. Session 2 (active nosepoke). Error bars and error bands = s.e.m.
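A cumulative record like the one in panel C can be built from response timestamps by counting, at the end of each time bin, how many responses have occurred so far. The sketch below assumes active-nosepoke times in seconds from session start, a 1-h (3600-s) session as stated in the caption, and 60-s bins; the bin width and the function name are hypothetical choices.

```python
from bisect import bisect_right
from typing import List, Sequence


def cumulative_record(timestamps: Sequence[float], session_s: float = 3600.0,
                      bin_s: float = 60.0) -> List[int]:
    """Cumulative response count at the end of each bin of a session.

    A response falling exactly on a bin edge is counted in the bin ending there.
    """
    ts = sorted(timestamps)
    n_bins = int(session_s // bin_s)
    # bisect_right gives the number of sorted timestamps <= each bin edge.
    return [bisect_right(ts, (i + 1) * bin_s) for i in range(n_bins)]
```

Sorting once and bisecting at each bin edge keeps the cost at O(n log n) for n responses, regardless of how many bins are plotted.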
Figure 5. Devaluation of the sucrose outcome abolishes conditioned responding to the unblocked cue Y in Reward Upshift and VTA-DA groups.
Learning about target cue Y was unblocked by reward upshift (top graphs) or by activation of VTA-DA neurons (bottom graphs). Following unblocking, sucrose was devalued for half of the subjects in the Reward Upshift and VTA-DA groups by pairing sucrose consumption with LiCl (Devalued condition). The remaining subjects were exposed to sucrose or LiCl-induced illness on alternate days, preserving the value of sucrose (Valued condition). Conditioned responding to Y (the unblocked cue) and A (the cue paired with large reward) was assessed at Test. (A, B) Time spent in the reward port during cue presentation in the Reward Upshift (A) and VTA-DA (B) groups. Sucrose devaluation reduced responding to Y in both groups. Insets represent intertrial interval (ITI) responding outside cue presentation. (C, D) Trial-by-trial performance in the Reward Upshift (C) and VTA-DA stimulation (D) groups. Three-way ANOVAs (Cue x Devaluation x Trial) found an interaction among these factors for VTA-DA (F(2, 20) = 3.901, P = 0.037) but not Reward Upshift (F(2, 21) = 1.276, P = 0.300) subjects. (E, F) Second-by-second tracking of presence in the port during the first presentation of each cue. *P < 0.05 (Valued vs. Devalued; Bonferroni-corrected t-test). Error bars and error bands = s.e.m. See also Figures S4-S5.
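Because Valued and Devalued rats are different subjects, a Valued vs. Devalued comparison is an independent-samples test, and the Bonferroni correction then multiplies each p-value by the number of comparisons made. The sketch below shows both steps; it assumes a pooled-variance (Student) t-test, which the caption does not specify, and the helper names are hypothetical. Two-sided p-values would come from the t distribution with the returned df (for example, `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available), which this stdlib-only sketch leaves out.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples Student t (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

An adjusted p below 0.05 then corresponds to a raw p below 0.05/m, which is why the correction grows stricter as more trials or time bins are compared.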

References

    1. Eshel N, Tian J, Bukwich M, and Uchida N (2016). Dopamine neurons share common response function for reward prediction error. Nat Neurosci 19, 479–486.
    2. Schultz W, Dayan P, and Montague PR (1997). A neural substrate of prediction and reward. Science 275, 1593–1599.
    3. Waelti P, Dickinson A, and Schultz W (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48.
    4. Glimcher PW (2011). Understanding dopamine and reinforcement learning: the dopamine reward prediction error hypothesis. Proc Natl Acad Sci U S A 108 Suppl 3, 15647–15654.
    5. Rescorla RA, and Wagner AR (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In Classical Conditioning II: Current Research and Theory, Black AH and Prokasy WF, eds. (New York: Appleton-Century-Crofts), pp. 64–99.