Review

Function (Oxf). 2023 Oct 3;4(6):zqad056.

doi: 10.1093/function/zqad056. eCollection 2023.

Striatal Dopamine Signals and Reward Learning

Affiliation

¹ Laboratory of Sensory Processing, Brain Mind Institute, Faculty of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland.

PMID: 37841525
PMCID: PMC10572094
DOI: 10.1093/function/zqad056

Pol Bech et al. Function (Oxf). 2023.

Abstract

We are constantly bombarded by sensory information and constantly making decisions about how to act. To adapt behavior optimally, we must judge which sequences of sensory inputs and actions lead to successful outcomes in specific circumstances. Neuronal circuits of the basal ganglia have been strongly implicated in action selection and in the learning and execution of goal-directed behaviors, and accumulating evidence supports the hypothesis that midbrain dopamine neurons might encode a reward signal useful for learning. Here, we review evidence suggesting that midbrain dopaminergic neurons signal reward prediction error (RPE), driving the synaptic plasticity in the striatum that underlies learning. We focus on phasic increases in action potential firing of midbrain dopamine neurons in response to unexpected rewards. These dopamine neurons prominently innervate the dorsal and ventral striatum, where the released dopamine binds to dopamine receptors and regulates the plasticity of glutamatergic synapses. The increase in striatal dopamine accompanying an unexpected reward activates dopamine type 1 receptors (D1Rs), initiating a signaling cascade that promotes long-term potentiation (LTP) of recently active glutamatergic input onto striatonigral neurons. Sensorimotor-evoked glutamatergic input that is active immediately before reward delivery will thus be strengthened onto D1R-expressing neurons in the striatum. In turn, these neurons disinhibit brainstem motor centers and the motor thalamus, thus promoting motor output that reinforces rewarded stimulus-action outcomes. Although many details of the hypothesis need further investigation, altogether, it seems likely that dopamine signals in the striatum might underlie important aspects of goal-directed, reward-based learning.
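To make the idea of a reward prediction error concrete, the sketch below implements the Rescorla-Wagner (delta) rule listed among the references, reduced to a single cue. It is a minimal illustration under stated assumptions, not the authors' model: a reward of 1 on every trial, a hypothetical learning rate `alpha`, and an initial prediction of zero. The hypothesis reviewed here is that phasic dopamine reports something like `delta`.

```python
def rescorla_wagner(rewards, alpha=0.3, v0=0.0):
    """Return per-trial RPEs and updated values for a single cue (Rescorla-Wagner / delta rule)."""
    v = v0
    deltas, values = [], []
    for r in rewards:
        delta = r - v          # RPE: positive if better than expected, negative if worse
        v += alpha * delta     # the prediction moves a fraction alpha toward the outcome
        deltas.append(delta)
        values.append(v)
    return deltas, values

# 30 rewarded trials, then one trial on which the expected reward is omitted
deltas, values = rescorla_wagner([1.0] * 30 + [0.0])
print(f"trial 1 RPE:  {deltas[0]:+.2f}")    # reward fully unexpected
print(f"trial 30 RPE: {deltas[29]:+.3f}")   # reward now almost fully predicted
print(f"omission RPE: {deltas[30]:+.2f}")   # outcome worse than expected
```

The RPE is large when the reward is new, shrinks toward zero as the reward becomes predicted, and turns negative when a predicted reward fails to arrive.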

Keywords: Reward-based learning; dopamine; goal-directed behavior; licking; motor control; neuronal circuits; sensory processing; striatum; synaptic plasticity; whisker sensory perception.

Conflict of interest statement

C.C.H.P. holds the position of Editorial Board Member for FUNCTION and is blinded from reviewing or making decisions for the manuscript.

Figures

**Figure 1.**
Learning sensory-to-motor transformations from rewards. Animal behavior is determined by incoming sensory information, innate neuronal circuits, short- and long-term memories, and internal states. In part, actions are tuned to maximize reward. Animals can learn to obtain rewards by responding with appropriate goal-directed motor output to relevant reward-predicting sensory input in specific contexts through trial-and-error reward-based learning. Reward signals (blue) are thought to drive synaptic plasticity in neuronal circuits, such that relevant sensory signals in sensory neurons (S, orange) drive appropriate motor output controlled by motor neurons (M, green) in order to receive reward.

**Figure 2.**
Optogenetically identified dopamine neurons in the midbrain transiently increase firing in response to unexpected rewards. (A) To study their activity, extracellular electrophysiological recordings can be targeted to the midbrain dopaminergic neurons (dark blue) located in the substantia nigra pars compacta and the ventral tegmental area (VTA), which prominently innervate the dorsal striatum and the ventral striatum (also known as the nucleus accumbens, light blue), respectively. (B) The work of Wolfram Schultz and colleagues revealed that delivery of an unexpected reward transiently increases action potential (AP) firing in putative midbrain dopamine neurons of monkeys. (C) Opto-tagging can be used to record AP firing of genetically identified classes of neurons. For example, the light-gated ion channel channelrhodopsin-2 (ChR2) can be expressed specifically in midbrain dopaminergic neurons through mouse genetics and viral transduction. Blue light flashes then evoke precisely timed AP firing specifically in ChR2-expressing neurons, which can be recorded with an optrode, a device consisting of an optical fiber coupled to an extracellular recording electrode. Using this approach, Naoshige Uchida and colleagues found that genetically defined dopaminergic neurons in mice transiently increased AP firing in response to rewards, similar to the earlier work in monkeys shown in panel B. (D) Substantial evidence supports the hypothesis that dopamine neurons do not simply respond to unexpected rewards but, more precisely, encode reward prediction errors (RPEs). Animals can learn that specific sensory stimuli reliably predict future rewards. After learning, the reward-predicting sensory stimulus evokes a rapid transient increase in dopamine neuron AP firing, but there is no dopamine signal upon reward delivery because the reward is now entirely expected (top). However, if the reward is omitted, dopamine neuron firing rates drop because the outcome was worse than expected (negative RPE) (middle). Conversely, if the reward-predicting sensory stimulus is omitted, reward delivery is unexpected and is again accompanied by increased dopamine neuron firing (bottom).
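The transfer of the dopamine response from reward to cue in panel D is the classic signature of temporal-difference (TD) learning (Sutton, listed among the references). The sketch below is a minimal TD(0) model of the four conditions, under assumptions that are ours rather than the authors': a hypothetical 10-step trial with the cue at step 2 and reward at step 6, a "complete serial compound" representation in which each time step after the cue is its own state, no discounting (`gamma = 1`), and an illustrative learning rate. Before the cue, and on uncued trials, no state carries a prediction.

```python
import numpy as np

# Hypothetical trial structure (time steps in arbitrary units)
T = 10          # steps per trial
CUE_T = 2       # cue onset
REWARD_T = 6    # reward delivery

def features(t, cue):
    """Complete serial compound: one-hot 'time since cue'; all zeros before the cue or without a cue."""
    x = np.zeros(T)
    if cue and t >= CUE_T:
        x[t - CUE_T] = 1.0
    return x

def run_trial(w, cue=True, reward=True, alpha=0.2, gamma=1.0):
    """Run one TD(0) trial, updating w in place; deltas[t] is the RPE on arriving at step t."""
    deltas = np.zeros(T)
    for t in range(1, T):
        r = 1.0 if (reward and t == REWARD_T) else 0.0
        x_prev, x_now = features(t - 1, cue), features(t, cue)
        delta = r + gamma * (w @ x_now) - (w @ x_prev)   # TD error
        w += alpha * delta * x_prev                       # credit the state that made the prediction
        deltas[t] = delta
    return deltas

w = np.zeros(T)
naive = run_trial(w.copy())                   # before learning: RPE only at reward
for _ in range(200):                          # training: cue always followed by reward
    run_trial(w)
expected = run_trial(w.copy())                # RPE at the cue, none at the predicted reward
omitted = run_trial(w.copy(), reward=False)   # dip at the usual reward time
uncued = run_trial(w.copy(), cue=False)       # unpredicted reward: RPE returns

for name, d in [("naive", naive), ("expected", expected), ("omitted", omitted), ("uncued", uncued)]:
    print(f"{name:9s} cue RPE {d[CUE_T]:+.2f}  reward RPE {d[REWARD_T]:+.2f}")
```

Probe trials run on copies of `w`, so they read out the trained predictions without changing them. The cue-evoked RPE appears because cue timing itself is unpredictable in this model: the pre-cue states never acquire value.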

**Figure 3.**
Dopamine modulates synaptic plasticity in the striatum. (A) The midbrain dopaminergic neurons prominently innervate the striatum, which is dominated by two types of GABAergic medium spiny neurons (MSNs) that express different dopamine receptors and project to different downstream brain areas. Striatonigral MSNs express D1Rs (green) and project to the substantia nigra pars reticulata (SNr). Striatopallidal MSNs express dopamine type 2 receptors (D2Rs, red) and project to the external segment of the globus pallidus (GPe). These MSNs also receive glutamatergic input from the cortex and thalamus, and a major role of dopamine is thought to be controlling the plasticity of these glutamatergic inputs to the MSNs. (B) The amplitude of excitatory postsynaptic potentials (EPSPs) in D1R-expressing MSNs can be increased through long-term potentiation (LTP), induced by pairing presynaptic glutamate release and postsynaptic depolarization with an increase in dopamine. (C) The mechanisms underlying LTP of glutamatergic synapses on the spines of D1R-expressing MSNs have been studied in detail in brain slice experiments by Haruo Kasai and colleagues. Three time points are schematically indicated: before, during, and after LTP induction. The upper part of each schematic shows a glutamatergic synaptic bouton filled with synaptic vesicles (gray). The lower part shows a dendritic spine (green) of a D1R-expressing MSN with AMPA (red) and NMDA (blue) subtypes of ionotropic glutamate receptors in the postsynaptic density. In the baseline period (left), AP firing of the glutamatergic afferent releases glutamate, which evokes a small EPSP in the postsynaptic MSN by opening AMPA receptors. At resting membrane potential, NMDA receptors are blocked by Mg²⁺. During LTP induction (middle), presynaptic glutamate release is paired with postsynaptic depolarization, opening NMDA receptors as well as AMPA receptors. NMDA receptor activation allows Ca²⁺ to enter the spine and activate Ca²⁺/calmodulin-dependent protein kinase II (CaMKII), an essential trigger for many forms of LTP. Under baseline conditions, high activity of protein phosphatase 1 (PP1) would normally inactivate CaMKII, preventing LTP induction, but PP1 is inhibited by elevated cAMP signaling driven by dopamine-activated D1Rs. Dopamine can thus gate the induction of LTP, resulting in an increased number of AMPA receptors in the postsynaptic density of D1R-expressing MSNs (right).
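The gating logic of panel C amounts to a three-factor rule: potentiation needs presynaptic glutamate, postsynaptic depolarization, and dopamine at the same time. The toy sketch below collapses the biochemistry into boolean steps to make that coincidence requirement explicit. The `Spine` class, the starting AMPA receptor count, and the increment are hypothetical, and EPSP amplitude is simply assumed to scale with the number of synaptic AMPA receptors.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Spine:
    """Toy spine of a D1R-expressing MSN; `ampa` counts synaptic AMPA receptors (arbitrary units)."""
    ampa: float = 10.0

def ltp_induced(glutamate: bool, depolarized: bool, dopamine: bool) -> bool:
    """Cartoon of the panel C cascade as a chain of boolean conditions."""
    nmda_open = glutamate and depolarized   # Mg2+ block relieved only when glutamate meets depolarization
    camkii_triggered = nmda_open            # Ca2+ entry through NMDA receptors activates CaMKII
    pp1_inhibited = dopamine                # D1R -> cAMP signaling -> PP1 inhibition (collapsed to one step)
    return camkii_triggered and pp1_inhibited  # CaMKII activity persists only if PP1 is held off

def pair(spine: Spine, glutamate: bool, depolarized: bool, dopamine: bool,
         delta_ampa: float = 5.0) -> float:
    """Apply one induction episode; return the AMPA count, used here as a proxy for EPSP size."""
    if ltp_induced(glutamate, depolarized, dopamine):
        spine.ampa += delta_ampa            # AMPA receptors added to the postsynaptic density
    return spine.ampa

for glu, dep, da in product([False, True], repeat=3):
    after = pair(Spine(), glu, dep, da)
    print(f"glutamate={glu!s:5} depolarized={dep!s:5} dopamine={da!s:5} -> AMPA {after:.0f}")
```

Only the row with all three factors set shows potentiation; removing dopamine leaves the NMDA-dependent Ca²⁺ signal in place but, in this cartoon, PP1 wins and no LTP is expressed.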

**Figure 4.**
Striatal MSNs expressing D1Rs can drive goal-directed motor output and show enhanced fast sensory responses across learning. (A) Head-restrained thirsty mice can learn to lick a spout for a water reward in response to a whisker deflection (orange). The deflection serves as a sensory cue signaling that reward is available for 1 s, and licking within this window is the goal-directed motor output required to trigger reward delivery in Hit trials. (B) Whole-cell membrane potential (Vm) recordings averaged across Hit trials for post hoc identified D1R-expressing and D2R-expressing MSNs in the dorsolateral striatum (DLS) of expert (blue) or naïve (green) mice performing the whisker detection task. Whisker deflection evoked a larger depolarization in expert mice than in naïve mice for both D1R-expressing and D2R-expressing MSNs. However, a fast sensory response (20–50 ms after the whisker stimulus) appeared to increase specifically in D1R-expressing MSNs across learning. (C) Sagittal sections through mouse brains counterstained with 4′,6-diamidino-2-phenylindole (DAPI, green). An adeno-associated virus (AAV) was injected into the DLS to express fluorescent proteins, allowing imaging of the cell bodies in the striatum and their axonal projections (magenta). Distinct classes of MSNs were distinguished by using two transgenic mouse lines in which Cre-recombinase was specifically expressed in either D1R- or D2R-expressing MSNs, and injecting the DLS with a Cre-dependent AAV. D1R-expressing MSNs strongly innervate the SNr, whereas D2R-expressing MSNs strongly innervate the GPe. (D) ChR2 was expressed in either D1R- or D2R-expressing MSNs in the DLS of different mice, which were subsequently trained in the whisker detection task. Once the mice were experts, whisker (orange) and catch (black) trials were randomly interleaved with trials containing a brief blue light pulse (blue) delivered to the DLS. Optogenetic stimulation of D1R-expressing MSNs, but not of D2R-expressing MSNs, evoked licking. Brief activation of D1R-expressing MSNs thus appears sufficient to substitute for whisker stimulation in this behavior. (E) A schematic circuit diagram that could account for some aspects of the learning and execution of goal-directed motor output in response to a sensory stimulus, as exemplified above by the transformation of a whisker deflection into goal-directed licking in the whisker detection task. Sensory input drives thalamic and cortical neurons, which in turn signal to the striatum. If the sensory input is paired with reward, the sensory-evoked glutamatergic input from the thalamus and cortex will be accompanied by a dopaminergic reward signal, strengthening the excitation of D1R-expressing MSNs through LTP during reward-based learning. Enhanced sensory-evoked activity of D1R-expressing MSNs will inhibit neurons in the SNr, in turn disinhibiting the thalamus and brainstem motor nuclei and thus contributing to the initiation of movements such as licking for reward, which further reinforces the sensorimotor transformation. Panel B is modified from, published under a Creative Commons License. Panels C and D are modified from, published under a Creative Commons License.
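The circuit in panel E can be sketched as a small rate model with explicit signs: sensory drive excites D1R-expressing MSNs, these inhibit the tonically active SNr, and the SNr in turn inhibits motor thalamus and brainstem, so D1R-MSN activity disinhibits licking. Learning is a three-factor update (presynaptic × postsynaptic × dopamine) applied on rewarded Hit trials. This is a hypothetical illustration, not a model from the review: all parameters (initial weight, learning rate, readout gain and midpoint, trial count, random seed) are invented, and dopamine is simplified to a fixed burst on every Hit instead of an RPE that would shrink as reward becomes expected.

```python
import math
import random

WHISKER = 1.0   # presynaptic (thalamocortical) activity on whisker trials, arbitrary units

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def forward(w: float) -> tuple[float, float]:
    """Propagate one whisker trial through the circuit; return (D1R-MSN rate, lick probability)."""
    d1 = min(1.0, max(0.0, w * WHISKER))    # glutamatergic drive excites D1R-expressing MSNs
    snr = max(0.0, 1.0 - d1)                # tonically active SNr is inhibited by D1R-MSNs
    motor = max(0.0, 1.0 - snr)             # SNr inhibits motor thalamus/brainstem: less SNr, more drive
    p_lick = sigmoid(4.0 * (motor - 0.5))   # hypothetical readout gain (4) and midpoint (0.5)
    return d1, p_lick

def train(n_trials: int = 300, w: float = 0.1, eta: float = 0.2,
          w_max: float = 1.0, seed: int = 0) -> tuple[float, list[float]]:
    """Trial-and-error learning: Hits are rewarded, and reward gates potentiation of w."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        d1, p_lick = forward(w)
        history.append(p_lick)
        if rng.random() < p_lick:           # lick within the window: Hit, water delivered
            dopamine = 1.0                  # simplification: same dopamine burst on every Hit
            # three-factor update: presynaptic x postsynaptic x dopamine, capped at w_max
            w = min(w_max, w + eta * WHISKER * d1 * dopamine)
    return w, history

w_final, hist = train()
print(f"lick probability on whisker trials: naive {hist[0]:.2f}, trained {forward(w_final)[1]:.2f}")
```

Because the weight only changes on rewarded trials, learning depends on the animal occasionally licking by chance, the trial-and-error aspect sketched in Figure 1. Replacing the fixed burst with an RPE (as in the TD example for Figure 2) would make learning slow down as Hits become expected.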

References

1. Sutton RS. Learning to predict by the methods of temporal differences. Mach Learn. 1988;3(1):9–44.
2. Sutton RS, Barto AG. Time-derivative models of Pavlovian reinforcement. In: Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge, MA: The MIT Press; 1990:497–537.
3. Bush RR, Mosteller F. A mathematical model for simple learning. Psychol Rev. 1951;58(5):313–323.
4. Bush RR, Mosteller F. A model for stimulus generalization and discrimination. Psychol Rev. 1951;58(6):413–423.
5. Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory. Vol 2. New York, NY: Appleton-Century-Crofts; 1972:64–99.
