Review
Neuron. 2015 Oct 21;88(2):247-63.
doi: 10.1016/j.neuron.2015.08.037.

Dopamine Prediction Errors in Reward Learning and Addiction: From Theory to Neural Circuitry

Ronald Keiflin et al. Neuron. 2015.

Abstract

Midbrain dopamine (DA) neurons are proposed to signal reward prediction error (RPE), a fundamental parameter in associative learning models. This RPE hypothesis provides a compelling theoretical framework for understanding DA function in reward learning and addiction. New studies support a causal role for DA-mediated RPE activity in promoting learning about natural reward; however, this question has not been explicitly tested in the context of drug addiction. In this review, we integrate theoretical models with experimental findings on the activity of DA systems, and on the causal role of specific neuronal projections and cell types, to provide a circuit-based framework for probing DA-RPE function in addiction. By examining error-encoding DA neurons in the neural network in which they are embedded, hypotheses regarding circuit-level adaptations that possibly contribute to pathological error signaling and addiction can be formulated and tested.

Figures

Figure 1. Activity of DA neurons complies with formal models of reinforcement
The temporal difference model of reinforcement defines a reward prediction error (RPE) as the discrepancy between the most recent reward prediction [V(S_{t-1})] and any new information regarding reward, be it the reward itself [R_t] or a signal that causes a change in the prospect of reward [V(S_t)]. A. Early in Pavlovian training, the surprising delivery of reward produces a positive RPE paralleled by phasic activation of DA neurons. B. With sufficient training, the presentation of a cue signaling reward produces a positive RPE, while the reward itself no longer results in an RPE. This shift is paralleled by a similar shift in the DA neuron response. C. Omission of an expected reward results in a negative RPE, paralleled by a transient reduction of DA activity below baseline. ITI: intertrial interval. (Adapted from Morita et al., 2012.)
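The three panels can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes the standard temporal difference form RPE = R_t + gamma * V(S_t) - V(S_{t-1}), a discount factor gamma of 1, and hand-picked state values of 0 or 1. The function name `td_error` and all numbers are assumptions for illustration, not values from the review.

```python
def td_error(reward, value_now, value_prev, gamma=1.0):
    """Temporal difference RPE: R_t + gamma * V(S_t) - V(S_{t-1})."""
    return reward + gamma * value_now - value_prev


# Panel A: early in training, nothing predicts the reward (all values are 0),
# so reward delivery produces a positive RPE.
early_reward = td_error(reward=1.0, value_now=0.0, value_prev=0.0)

# Panel B: after training, the cue carries the prediction (assumed V = 1).
# Cue onset moves the animal from the ITI (V = 0) to the cue state (V = 1).
trained_cue = td_error(reward=0.0, value_now=1.0, value_prev=0.0)
# The reward is now fully predicted, so its delivery produces no RPE.
trained_reward = td_error(reward=1.0, value_now=0.0, value_prev=1.0)

# Panel C: the predicted reward is omitted, giving a negative RPE.
omission = td_error(reward=0.0, value_now=0.0, value_prev=1.0)

print(early_reward, trained_cue, trained_reward, omission)
```

Under these assumed values, the RPE at reward time falls from positive to zero across training while the RPE at cue onset becomes positive, which is the shift described for the DA response.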
Figure 2. Optogenetic stimulation of DA neurons mimics RPE and drives reward learning
A. Behavioral protocol. Rats were trained to associate an auditory stimulus (cue A) with sucrose. Once this association was learned, as attested by stable levels of conditioned approach to the sucrose delivery port, a visual stimulus (cue X) was added to the auditory stimulus and this compound stimulus (cue AX) was paired with the sucrose reward. During compound stimulus conditioning sessions, DA neurons were photoactivated during reward consumption (Paired Stim.) to artificially create a normally absent RPE, the sucrose reward being perfectly predicted at this stage by cue A. Control (Unpaired Stim.) rats received DA neuron activation during the intertrial interval. At test, conditioned responding to cue X alone was assessed in the absence of sucrose or optical stimulation. B. Optical stimulation of DA neurons during a compound cue trial. Optical stimulation of DA neurons was synchronized with the delivery of the anticipated reward in Paired Stimulation rats; Unpaired Stimulation rats received optical stimulation of DA neurons at a variable time after cue and reward delivery. C. In rats that had received stimulation of DA neurons during reward, the presentation of cue X elicited approach to the location of previous sucrose delivery. In contrast, rats that had received DA neuron stimulation unpaired with reward showed low, or blocked, responding to cue X (Steinberg et al., 2013).
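The logic of this blocking design can be sketched with a Rescorla-Wagner style update, in which all cues present on a trial share a single error term. This is a minimal sketch under stated assumptions, not the analysis used in the study: paired optogenetic stimulation is modeled as a constant added to the error term (`stim_bonus`), unpaired stimulation is assumed to leave the trial error unchanged, and the learning rate, trial counts, and function name `train_blocking` are hypothetical.

```python
def train_blocking(stim_bonus, alpha=0.2, lam=1.0, n_single=50, n_compound=20):
    """Return (V_A, V_X) after cue A training followed by compound AX training.

    stim_bonus is an assumed additive boost to the prediction error during
    compound trials, standing in for paired DA neuron stimulation.
    """
    v_a, v_x = 0.0, 0.0

    # Phase 1: cue A alone is paired with sucrose until it predicts it well.
    for _ in range(n_single):
        v_a += alpha * (lam - v_a)

    # Phase 2: compound AX is paired with the same sucrose reward.
    # Both cues are updated by the same shared error term.
    for _ in range(n_compound):
        error = lam + stim_bonus - (v_a + v_x)
        v_a += alpha * error
        v_x += alpha * error

    return v_a, v_x


# Unpaired control: no added error during the trial, so cue X is blocked.
_, v_x_unpaired = train_blocking(stim_bonus=0.0)
# Paired stimulation: an artificial error at reward time unblocks cue X.
_, v_x_paired = train_blocking(stim_bonus=1.0)

print(f"V_X unpaired: {v_x_unpaired:.4f}, V_X paired: {v_x_paired:.4f}")
```

In this sketch, cue X acquires almost no value when cue A already predicts the reward, and acquires substantial value once an extra error is injected at reward time, which parallels the behavioral contrast in panel C.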
Figure 3. Simplified neural circuit diagram for the computation of the RPE by DA neurons
The coordinated activation of this circuit by rewards and their predictors could result in DA neuron firing in accordance with the RPE model. GPe: external globus pallidus; LHb: lateral habenula; LH: lateral hypothalamus; MSN: medium spiny neurons; PPTN: pedunculopontine tegmental nucleus; RMTg: rostromedial tegmental nucleus; SN: substantia nigra; STN: subthalamic nucleus; VP: ventral pallidum; VTA: ventral tegmental area.
Figure 4. Differences between food-related and cocaine-related phasic DA signals in the NAc and potential consequences for learning
A. Prior to learning, unexpected delivery of food reward results in phasic DA signals. As subjects learn that cue presentation signals food delivery, DA responses are transferred from the reward to the cue. B. When cocaine is the reward, each drug injection produces, with some delay, a burst of phasic DA events as a consequence of the pharmacological actions of the drug. As with natural reward, phasic DA responses to the cue progressively emerge. Unlike food-induced DA signals, drug-induced DA signals are not modulated by expectations and persist throughout learning. C. Proposed consequences of these DA signals for learning. Food-evoked DA signals, which are modulated by reward expectations, promote learning until the prediction matches the actual outcome, resulting in stable cue value after a few trials. In contrast, persistent cocaine-evoked DA signals continue to increase the value of cocaine cues with every trial. Eventually, the value of cocaine cues surpasses the value of food cues and can bias decision-making towards cocaine.
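The proposal in panel C can be illustrated with a simple value update. The sketch below is one possible formalization, not the authors' model: it assumes that the pharmacological DA signal places a floor on the error term, so the error can never fall below `drug_da` however well the drug is predicted. The function name `simulate_cue_value`, the reward magnitudes, the learning rate, and the trial count are all illustrative assumptions.

```python
def simulate_cue_value(n_trials, reward, alpha=0.1, drug_da=0.0):
    """Return the cue value after each trial.

    With drug_da == 0 the error is reward - V, which shrinks as V grows.
    With drug_da > 0 the error is floored at drug_da (an assumption that
    stands in for an expectation-independent, drug-evoked DA signal).
    """
    value = 0.0
    history = []
    for _ in range(n_trials):
        error = reward - value
        if drug_da > 0.0:
            error = max(error + drug_da, drug_da)
        value += alpha * error
        history.append(value)
    return history


# Food: the larger nominal reward, with an expectation-modulated error.
food = simulate_cue_value(200, reward=1.0)
# Cocaine: a smaller nominal reward plus a persistent drug-evoked signal.
cocaine = simulate_cue_value(200, reward=0.5, drug_da=0.1)

print(f"food cue value after 200 trials:    {food[-1]:.3f}")
print(f"cocaine cue value after 200 trials: {cocaine[-1]:.3f}")
```

With these assumed parameters the food cue value levels off near the reward magnitude, whereas the cocaine cue value starts lower but keeps growing on every trial and eventually overtakes it.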
Figure 5. Proposed mechanism for the accelerated propagation of RPE-DA signals in different striatal domains following drug exposure and its consequence for learning
A. In drug-naïve animals, the activity of midbrain DA neurons is tightly regulated by local inhibitory GABA neurons. Unexpected rewards activate DA neurons in the VTA; the resulting DA release in the NAc promotes Pavlovian (S-O) learning. B. Repeated exposure to cocaine increases the excitability of DA neurons by potentiating striatal inhibitory inputs on midbrain local GABA neurons. Striatal feedback on DA neurons progressively recruits more lateral DA neurons, from VTA to SN, for encoding of RPE. The resulting emergence of DA signals in the sensorimotor DLS reinforces S-R associations that contribute to rigid and possibly compulsive drug-seeking behavior. ACC: anterior cingulate cortex; DLS: dorsolateral striatum; DMS: dorsomedial striatum; NAc: nucleus accumbens; OFC: orbitofrontal cortex; SM: sensorimotor cortex; SN: substantia nigra; VTA: ventral tegmental area; S-O: stimulus-outcome; A-O: action-outcome; S-R: stimulus-response.
