Curr Biol. 2017 Nov 20;27(22):3480-3486.e3.
doi: 10.1016/j.cub.2017.09.049. Epub 2017 Nov 2.

Optogenetic Blockade of Dopamine Transients Prevents Learning Induced by Changes in Reward Features

Chun Yun Chang et al. Curr Biol. 2017.

Abstract

Prediction errors are critical for associative learning [1, 2]. Transient changes in dopamine neuron activity correlate with positive and negative reward prediction errors and can mimic their effects [3-15]. However, although causal studies show that dopamine transients of 1-2 s are sufficient to drive learning about reward, these studies do not address whether they are necessary (but see [11]). Further, the precise nature of this signal is not yet fully established. Although it has been equated with the cached-value error signal proposed to support model-free reinforcement learning, cached-value errors are typically confounded with errors in the prediction of reward features [16]. Here, we used optogenetic and transgenic approaches to prevent transient changes in midbrain dopamine neuron activity during the critical error-signaling period of two unblocking tasks. In one, learning was unblocked by increasing the number of rewards, a manipulation that induces errors in predicting both value and reward features. In the other, learning was unblocked by switching from one reward to another of equal value, a manipulation that induces errors only in reward feature prediction. Preventing dopamine neurons in the ventral tegmental area from firing for 5 s, beginning before and continuing until after the changes in reward, prevented unblocking of learning in both tasks. A suppression of similar duration did not induce extinction when delivered during an expected reward, indicating that it did not act independently as a negative prediction error. This result suggests that dopamine transients play a general role in error signaling rather than being restricted to signaling errors in value.
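
The logic of blocking and of unblocking by reward number can be illustrated with the Rescorla-Wagner rule [1], in which all cues present on a trial share a single prediction error. The following is a minimal sketch and not the authors' analysis: the function name, learning rate, trial counts, cue indices, and reward magnitudes are all assumptions chosen for illustration.

```python
def rescorla_wagner(trials, alpha=0.3, n_cues=2):
    """Apply Rescorla-Wagner updates over a list of trials.

    Each trial is (present_cues, reward_magnitude). All values here are
    illustrative assumptions, not parameters from the study.
    """
    V = [0.0] * n_cues  # associative strength of each cue
    for present, lam in trials:
        # Shared prediction error: obtained reward minus summed prediction.
        delta = lam - sum(V[i] for i in present)
        for i in present:
            V[i] += alpha * delta
    return V


# Hypothetical labels: cue 0 = pretrained visual cue, cue 1 = added auditory cue.
conditioning = [((0,), 1.0)] * 50

# Compound training with the same reward: no error, so the added cue is blocked.
blocked = rescorla_wagner(conditioning + [((0, 1), 1.0)] * 20)

# Compound training with a larger reward: positive error, so learning is unblocked.
unblocked = rescorla_wagner(conditioning + [((0, 1), 2.0)] * 20)

print(f"added cue, same reward:   {blocked[1]:.4f}")
print(f"added cue, larger reward: {unblocked[1]:.4f}")
```

Because this sketch tracks only a scalar value, switching to a different but equally valued reward produces no error in it and the added cue stays blocked. That is why the identity unblocking task separates errors in reward features from errors in value.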

Keywords: associative learning; blocking; dopamine; rat; reward prediction error.

Figures

Figure 1. Histological verification, task designs, and pellet preference test
A) Fiber implants were localized in the vicinity of NpHR expression in VTA. The light orange shading represents the maximal spread of expression at each level, whereas the dark orange shading represents the minimal spread. B) Expression of NpHR showed a high degree of colocalization (~90%) with TH in VTA neurons. Green represents NpHR-eYFP, red represents TH. Scale bar is 500 μm. C) Design of number (top) and identity (bottom) unblocking tasks. All rats were trained in both tasks; order of training was counterbalanced. D) Preference test comparing consumption of banana and chocolate pellets used in the identity unblocking task. During the test, the rats were given access to both banana and chocolate pellets (200 pellets each). The number of remaining pellets was assessed every 2.5 min, 5 min, and 10 min as the test progressed. There was no discernible difference in consumption rate between the two flavors over the course of the 60 min test (p > 0.32).
Figure 2. Optogenetic blockade of dopamine transients prevents learning induced by changes in reward number
Design is illustrated in Figure 1C. Conditioned responding is shown to VB and VUB during conditioning and reconditioning (left), to VB/AB and VUB/AUB during compound training (middle), and to AB and AUB during the probe test (right). Conditioned responding is represented as the percentage of time the rats spent in the food cup during cue presentation. Top panels for compound training and probe test show data from the experimental run (Exp), when neurons were suppressed during delivery of the second pellet, and bottom panels show data from the ITI run (ITI), when neurons were suppressed during the intertrial interval. Insets show the percentage of time rats spent in the food cup during the reward period after termination of the cues. (See also Figure S1.)
Figure 3. Optogenetic blockade of dopamine transients prevents learning induced by changes in reward flavor
Design is illustrated in Figure 1C. Conditioned responding is shown to VB and VUB during conditioning and reconditioning (left), to VB/AB and VUB/AUB during compound training (middle), and to AB and AUB during the probe test (right). Conditioned responding is represented as the percentage of time the rats spent in the food cup during cue presentation. Top panels for compound training and probe test show data from the experimental run (Exp), when neurons were suppressed during delivery of the second pellet, and bottom panels show data from the ITI run (ITI), when neurons were suppressed during the intertrial interval. Insets show the percentage of time rats spent in the food cup during the reward period after termination of the cues.

References

    1. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical Conditioning II: Current Research and Theory. Appleton-Century-Crofts; New York: 1972. pp. 64–99.
    2. Sutton RS. Learning to predict by the method of temporal difference. Machine Learning. 1988;3:9–44.
    3. Mirenowicz J, Schultz W. Importance of unpredictability for reward responses in primate dopamine neurons. Journal of Neurophysiology. 1994;72:1024–1027.
    4. Roesch MR, Calu DJ, Schoenbaum G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience. 2007;10:1615–1624.
    5. Waelti P, Dickinson A, Schultz W. Dopamine responses comply with basic assumptions of formal learning theory. Nature. 2001;412:43–48.