Eur J Neurosci. 2019 Mar;49(5):726-736.
doi: 10.1111/ejn.13921. Epub 2018 Apr 14.

A silent eligibility trace enables dopamine-dependent synaptic plasticity for reinforcement learning in the mouse striatum

Tomomi Shindou et al. Eur J Neurosci. 2019 Mar.

Abstract

Dopamine-dependent synaptic plasticity is a candidate mechanism for reinforcement learning. A silent eligibility trace - initiated by synaptic activity and transformed into synaptic strengthening by later action of dopamine - has been hypothesized to explain the retroactive effect of dopamine in reinforcing past behaviour. We tested this hypothesis by measuring time-dependent modulation of synaptic plasticity by dopamine in adult mouse striatum, using whole-cell recordings. Presynaptic activity followed by postsynaptic action potentials (pre-post) caused spike-timing-dependent long-term depression in D1-expressing neurons, but not in D2 neurons, and not if postsynaptic activity preceded presynaptic activity. Subsequent experiments focused on D1 neurons. Applying a dopamine D1 receptor agonist during induction of pre-post plasticity caused long-term potentiation. This long-term potentiation was hidden by long-term depression occurring concurrently and was unmasked when long-term depression was blocked by an L-type calcium channel antagonist. Long-term potentiation was blocked by a Ca2+-permeable AMPA receptor antagonist but not by an NMDA antagonist or an L-type calcium channel antagonist. Pre-post stimulation caused transient elevation of rectification - a marker for expression of Ca2+-permeable AMPA receptors - for 2-4 s after stimulation. To test for an eligibility trace, dopamine was uncaged at specific time points before and after pre- and postsynaptic conjunction of activity. Dopamine caused potentiation selectively at synapses that were active 2 s before dopamine release, but not at earlier or later times. Our results provide direct evidence for a silent eligibility trace in the synapses of striatal neurons. This dopamine-timing-dependent plasticity may play a central role in reinforcement learning.

Keywords: dopamine; learning; reinforcement; temporal difference.
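
The timing dependence described in the abstract can be illustrated with a toy model. The sketch below is not the authors' model: the bump-shaped trace, its 2 s centre, its 0.6 s width and the function names are all assumptions chosen only to reproduce the qualitative pattern reported (potentiation when dopamine arrives about 2 s after pre-post pairing, none at -2, 0 or +4 s).

```python
import math

# Hypothetical parameters, not taken from the paper.
TRACE_CENTRE_S = 2.0   # assumed time of maximal eligibility after pairing
TRACE_WIDTH_S = 0.6    # assumed width of the eligibility window


def eligibility(t_after_pairing):
    """Toy silent eligibility trace, in arbitrary units (0 to 1).

    The trace is zero at or before the pre-post pairing (t <= 0) and is a
    Gaussian-shaped bump centred on TRACE_CENTRE_S afterwards.
    """
    if t_after_pairing <= 0:
        return 0.0
    z = (t_after_pairing - TRACE_CENTRE_S) / TRACE_WIDTH_S
    return math.exp(-0.5 * z * z)


def weight_change(dopamine_time, gain=1.0):
    """Synaptic change when dopamine arrives dopamine_time seconds after pairing.

    On its own the trace changes nothing; it is only converted into
    potentiation when dopamine coincides with it.
    """
    return gain * eligibility(dopamine_time)


if __name__ == "__main__":
    # Dopamine timings tested in the study, relative to pre-post pairing.
    for t in (-2.0, 0.0, 2.0, 4.0):
        print(f"dopamine at {t:+.0f} s -> toy weight change {weight_change(t):.3f}")
```

In this sketch the trace is "silent" in the sense that `eligibility` never alters the weight by itself; only `weight_change`, which requires a dopamine time, produces potentiation.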

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Spike‐timing‐dependent plasticity in MSNs. (a) Identification of D1 (top row) and D2 (bottom row) MSNs. Left, EGFP‐labelled neurons; centre, recorded cell labelled with Alexa Fluor 488; right, combined. (b) Pairing protocols. Left, pre–post pairing; right, post–pre pairing. Paired stimulation was repeated 60 times at 10‐s intervals. (c) Pre–post stimulation caused t‐LTD in D1 MSNs but not in D2 MSNs. Top, pre–post pairing compared with post–pre pairing in D1 MSNs. Bottom, same comparison in D2 MSNs.
Figure 2
Supralinear calcium influx associated with t‐LTD. (a) Two‐photon imaging of dendritic spines (S) and dendrites (D) in D1 and D2 MSNs showed [Ca2+]i increase (red traces) in identified MSNs in response to pairing protocols (pre–post, post–pre), glutamate uncaging alone (uEPSP) and postsynaptic action potentials alone (3APs). (b) Supralinear dendritic spine [Ca2+]i increase in D1 MSNs. Pre–post [Ca2+]i increase exceeded the algebraic sum of pre and post alone, in D1 MSNs but not in D2 MSNs. (c) Plot shows significantly greater nonlinearity of pre–post [Ca2+]i increase in D1 MSNs, relative to post–pre in D1 or D2 MSNs, or pre–post in D2 MSNs (F(1,20) = 20.17, P = 0.00022, two‐way ANOVA).
Figure 3
Requirements for t‐LTD and t‐LTP. (a) Blocking L‐type calcium channels with nimodipine (nimo) blocked t‐LTD, but blocking NMDA channels with APV did not. (b) Pre–post stimulation plus dopamine D1 receptor agonist SKF 81297 (SKF) caused t‐LTP when t‐LTD was blocked by nimodipine. (c) Dopamine D1 receptor‐dependent t‐LTP was blocked by 1‐naphthylacetyl spermine (NAS) but not by APV. (d) Summary data, all group averages: control vs. SKF: t(15) = 3.81, P = 0.002; control vs. nimo: t(17) = 4.448, P = 0.00035; control vs. SKF + nimo: t(15) = 5.23, P = 0.0001; SKF vs. SKF + nimo: t(12) = 3.34, P = 0.006; nimo vs. SKF + nimo: t(14) = 2.96, P = 0.01; SKF + nimo vs. NAS + SKF + nimo: t(12) = 2.23, P = 0.046; all independent samples t‐tests. (e) Two‐photon imaging of calcium shows that supralinear calcium influx in D1 MSNs (nonlinearity) was blocked by nimodipine. *P < 0.05, **P < 0.01.
Figure 4
Time‐dependent modulation of spike‐timing‐dependent plasticity by dopamine supports eligibility trace hypothesis. (a) The rectification index (ratio of excitatory synaptic current at −80 mV to that at +40 mV under voltage clamp, normalized to baseline) indicates a transient increase in Ca‐permeable AMPA receptors after pre–post pairing in D1 MSNs. Filled circles connected by dashed lines show individual neurons. Red bar shows mean. (b) Ultraviolet flash (arrowhead) causes phasic dopamine release by photolysis of caged dopamine (Lee et al., 1996). Insets show background‐subtracted fast‐scan cyclic voltammograms at baseline (left) and peak response (right) with oxidation and reduction peaks for dopamine. Slope of trace is due to UV light effect on carbon fibre electrode. (c) Uncaged dopamine has functional effects. Dopamine release in response to UV light flash causes hyperpolarization of dopamine neuron (black trace, inset shows typical electrophysiological response to depolarizing current) that is blocked by a dopamine antagonist (red trace). (d) Timing diagram for phasic dopamine release evoked by UV uncaging (coloured triangles) in relation to presynaptic (pre) and postsynaptic (post) activity. Lower traces show predicted timecourse of phasic dopamine release evoked by flashes based on panel b. (e) Group average data showing that phasic dopamine release by uncaging 2 s after pre–post stimulation causes t‐LTP in D1 MSNs, in the presence of nimodipine. No change is seen in control condition (no uncaging). (f, g) In support of the eligibility trace hypothesis, t‐LTP depends on timing of phasic dopamine release. (f) In the absence of nimodipine, group averages for dopamine release at −2, 0, +2 and +4 s relative to pre–post conjunction of activity show t‐LTP relative to no‐dopamine, no‐nimodipine control when uncaging occurs 2 s after pre–post pairing, but no change relative to no‐nimodipine control group at other time points.
(g) In the presence of nimodipine, group averages for dopamine release at −2, 0, +2 and +4 s relative to pre–post conjunction of activity show t‐LTP relative to no‐dopamine, nimodipine control group, when uncaging occurs 2 s after pre–post pairing, but no change at other time points. *P < 0.05, **P < 0.01.
Figure 5
Dopamine release in response to spike‐timing‐dependent plasticity protocols. (a) Theta stimulation protocol used to induce NMDA receptor‐dependent t‐LTD by Shen et al. (2008). (b) Voltammetric records show dopamine release evoked by theta burst stimulation at different current intensities. (c) Diagram showing intrastriatal location of theta glass stimulating electrode and carbon fibre voltammetry electrode. (d) Group average effects of theta stimulation showing dopamine release over the range of current intensities (5–30 μA). (e) Diagram showing location of cortical bipolar stimulating electrode and carbon fibre voltammetry electrode. (f) Peak dopamine concentration measured under control, bicuculline (BIC) and bicuculline plus APV conditions used by Pawlak & Kerr (2008), showing dopamine release in bicuculline is reduced by APV.
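
Two of the quantities named in the captions above, the nonlinearity of the pre–post [Ca2+]i increase (Figure 2) and the baseline-normalized rectification index (Figure 4a), can be written out as simple ratios. The sketch below is only one plausible reading of those captions: expressing nonlinearity as the paired response divided by the algebraic sum of the pre-alone and post-alone responses, using absolute current amplitudes, and all function and argument names are assumptions rather than the authors' stated procedure.

```python
def nonlinearity(paired, pre_alone, post_alone):
    """Paired response relative to the algebraic sum of its components.

    Values above 1 indicate a supralinear response (assumed definition).
    """
    linear_sum = pre_alone + post_alone
    if linear_sum == 0:
        raise ValueError("sum of pre-alone and post-alone responses is zero")
    return paired / linear_sum


def rectification_index(current_at_minus80, current_at_plus40):
    """Ratio of synaptic current amplitude at -80 mV to that at +40 mV.

    Absolute amplitudes are used (an assumption), since the current is
    inward at -80 mV and outward at +40 mV.
    """
    if current_at_plus40 == 0:
        raise ValueError("current at +40 mV is zero")
    return abs(current_at_minus80) / abs(current_at_plus40)


def normalized_rectification(after, baseline):
    """Rectification index after pairing, normalized to the baseline index.

    Each argument is a (current at -80 mV, current at +40 mV) pair.
    """
    return rectification_index(*after) / rectification_index(*baseline)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the paper.
    print(nonlinearity(paired=3.0, pre_alone=1.0, post_alone=1.0))
    print(normalized_rectification(after=(-120.0, 40.0), baseline=(-80.0, 40.0)))
```

A normalized rectification index above 1 after pairing would correspond to the transient increase in Ca‐permeable AMPA receptors that the Figure 4a caption describes.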
