Front Neural Circuits. 2019 Feb 21;13:10.
doi: 10.3389/fncir.2019.00010. eCollection 2019.

The Functional Role of Striatal Cholinergic Interneurons in Reinforcement Learning From Computational Perspective

Taegyo Kim et al. Front Neural Circuits. 2019.

Abstract

In this study, we use computational modeling to explore the functional role of striatal cholinergic interneurons, hereinafter referred to as tonically active neurons (TANs). Specifically, we investigate the mechanistic relationship between TAN activity and dopamine variations, and how changes in this relationship affect reinforcement learning in the striatum. TANs pause their tonic firing after excitatory input from thalamic and cortical neurons that signals a sensory event or reward information. During the pause, excursions in striatal dopamine concentration are observed. However, the functional interactions between the TAN pause and striatal dopamine release are poorly understood. Here we propose a model of the relationship between TAN activity and dopamine. The model shows that the TAN pause likely serves as a time window that gates phasic dopamine release, and that dopamine variations reciprocally modulate the TAN pause duration. We then integrate this model into our previously published model of reward-based motor adaptation. This shows how the TAN pause gates phasic dopamine release so that reward information reaches reinforcement learning in a timely manner. We also show how striatal dopamine deficiency alters TAN-dopamine interactions and thereby degrades motor adaptation performance.
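The gating idea in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical rule in which a corticostriatal weight changes in proportion to RPE × eligibility only while the TAN pause is open. Under this rule, a dopamine signal arriving outside the pause has no effect on learning. The function name, the pause window and the learning rate are illustrative assumptions, not the authors' model.

```python
def gated_weight_update(weight, rpe, eligibility, t_ms, pause_window_ms, learning_rate=0.1):
    """Return the weight after one RPE-driven update (hypothetical gating rule).

    The update is applied only if the phasic dopamine signal arrives at t_ms
    inside the TAN pause window [start, end); otherwise the weight is unchanged.
    """
    start_ms, end_ms = pause_window_ms
    if not (start_ms <= t_ms < end_ms):
        return weight  # outside the pause: dopamine is not "gated through"
    return weight + learning_rate * rpe * eligibility


if __name__ == "__main__":
    window = (300.0, 600.0)  # hypothetical TAN pause, ms after stimulus onset
    w = 0.5
    # Reward signal inside the pause changes the weight; outside it does not.
    print(gated_weight_update(w, rpe=1.0, eligibility=1.0, t_ms=400.0, pause_window_ms=window))
    print(gated_weight_update(w, rpe=1.0, eligibility=1.0, t_ms=700.0, pause_window_ms=window))
```

In the authors' framing the pause duration itself depends on dopamine, so the window would not be fixed. A constant window is used here only to isolate the gating step.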

Keywords: acetylcholine; reinforcement learning; striatal cholinergic interneurons; striatum; tonically active neurons.

Figures

Figure 1
Diagram of the mechanisms involved in TAN-dopamine release interactions. Thalamic or cortical excitation depolarizes the TAN membrane. In response to depolarization, calcium ions enter through voltage-dependent calcium channels. This activates the slow after-hyperpolarization current (IsAHP), carried by the efflux of potassium ions through calcium-dependent potassium channels. Once the cortical/thalamic excitatory input ends, the efflux of potassium ions hyperpolarizes the membrane. The hyperpolarization in turn activates the inward, dopamine-dependent h-current (Ih), which depolarizes the membrane. Furthermore, dopamine (DA) from dopaminergic neurons (DANs) in the substantia nigra pars compacta (SNc) binds to D2 receptors on TANs, downregulating the h-current. In addition, TANs release acetylcholine (ACh), which binds to nicotinic acetylcholine (nACh) receptors on DAN axon terminals. This cholinergic pathway enables TANs to modulate the release of dopamine into the synaptic cleft. Importantly, because the h-current is downregulated via activation of dopamine D2 receptors, the DA concentration affects the refractory period of TANs.
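The mechanism in Figure 1 can be caricatured as a race between two currents. The outward IsAHP decays after excitation ends, while the inward Ih activates at a rate that D2 receptor activation slows. The sketch below is a hypothetical toy model under stated assumptions, not the authors' conductance-based model:

- IsAHP decays exponentially once the excitatory input ends.
- The Ih gate relaxes toward 1 at a rate k0 / (1 + DA / K_D2).
- Firing is assumed to resume as soon as Ih exceeds IsAHP.

All parameter names and values are illustrative.

```python
import math


def tan_pause_ms(da, i_sahp0=1.0, tau_sahp_ms=200.0, g_h=1.0,
                 k0_per_ms=0.01, k_d2=1.0, dt_ms=0.1, t_max_ms=2000.0):
    """Toy TAN pause duration (ms) for a constant dopamine level `da`.

    Hypothetical model, integrated with forward Euler from the moment the
    thalamic/cortical excitation ends (t = 0).
    """
    k_h = k0_per_ms / (1.0 + da / k_d2)  # D2 activation slows Ih activation
    h = 0.0                               # Ih activation gate, between 0 and 1
    t = 0.0
    while t < t_max_ms:
        i_sahp = i_sahp0 * math.exp(-t / tau_sahp_ms)  # outward current, decaying
        i_h = g_h * h                                   # inward current, growing
        if i_h >= i_sahp:
            return t                  # net drive turns depolarizing: firing resumes
        h += dt_ms * k_h * (1.0 - h)  # Euler step of dh/dt = k_h * (1 - h)
        t += dt_ms
    return t_max_ms


if __name__ == "__main__":
    for da in (0.0, 1.0, 2.0):
        print(f"DA = {da:.1f} -> toy pause of about {tan_pause_ms(da):.1f} ms")
```

In this toy, raising DA slows Ih activation and lengthens the pause. This is the qualitative dependence of the refractory period on DA that the caption describes.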
Figure 2
The TAN pause duration correlates positively with the reward prediction error (RPE). Thalamic stimulation induces an initial burst of TAN activity, followed by a TAN pause. The blue curve is TAN activity; the orange curve is dopamine (DA) concentration; the purple curve is the slow after-hyperpolarization current IsAHP; and the green curve is the h-current Ih. (A) For RPE = 1, the dopamine concentration increases during the TAN pause as a result of the positive RPE, which slows Ih activation and thus prolongs the pause. (B) For RPE = 0, the TAN pause is shorter because there is no phasic change in dopamine release, so the dopamine concentration remains at baseline during the pause. (C) For RPE = −1, the TAN pause is even shorter than for RPE = 0. There is a net decrease in dopamine concentration during the pause, which produces the fastest Ih activation and hence the shortest pause in TAN activity. Thalamic stimulation duration was 300 ms. TP denotes TAN pause duration in milliseconds.
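The ordering of pause durations in Figure 2 (RPE = 1 > RPE = 0 > RPE = −1) follows if the dopamine level during the pause is the baseline plus a phasic term proportional to the RPE. The sketch below makes two hypothetical assumptions. First, DA_pause = max(0, baseline + gain × RPE), with baseline = gain = 1. Second, a toy race between a decaying IsAHP and a D2-slowed Ih gate, solved on a 1 ms grid with the closed-form gate h(t) = 1 − exp(−k t). Function names and parameters are illustrative, not fitted to data.

```python
import math


def dopamine_during_pause(rpe, baseline=1.0, phasic_gain=1.0):
    """Hypothetical DA level during the pause: baseline plus an RPE-scaled phasic term."""
    return max(0.0, baseline + phasic_gain * rpe)


def pause_from_dopamine(da, tau_sahp_ms=200.0, k0_per_ms=0.01, k_d2=1.0, t_max_ms=2000):
    """Toy pause (ms): first 1 ms step at which the Ih gate catches the decaying IsAHP."""
    k_h = k0_per_ms / (1.0 + da / k_d2)  # hypothetical D2 slowing of Ih activation
    for t in range(t_max_ms + 1):
        if 1.0 - math.exp(-k_h * t) >= math.exp(-t / tau_sahp_ms):
            return t
    return t_max_ms


for rpe in (1, 0, -1):
    da = dopamine_during_pause(rpe)
    print(f"RPE={rpe:+d}  DA={da:.1f}  TP={pause_from_dopamine(da)} ms")
```

The absolute durations are arbitrary. Only the monotonic link RPE → DA → slower Ih → longer pause mirrors the caption.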
Figure 3
TAN activity simulated by the model, compared with experimental data. (A–C) Peristimulus time histogram (PSTH) and raster plot from striatal cholinergic interneurons in response to a train of thalamic stimulation (50 Hz, ten pulses). The background figures were reproduced from Ding et al. (2010) with permission. For easier comparison, all simulation results (blue lines) were scaled down by the same ratio and overlaid on the experimental results. (A) Simulation (blue) and data (gray bars) for the control condition. (B) Simulation and data for the sulpiride (D2 receptor blockade) condition. (C) Simulation and data for the cocaine (dopamine reuptake blockade) condition. (D) Simulation of a hypothetical blockade of the h-current. TP denotes TAN pause duration.
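Figure 3 compares simulated TAN activity with a peristimulus time histogram (PSTH) recorded around a 50 Hz, ten-pulse thalamic train. A PSTH counts spikes in fixed time bins across trials and converts the counts to a trial-averaged firing rate. The sketch below shows that computation and the pulse timing of such a train. The function names, the 10 ms bin width and the spike times in the usage lines are hypothetical placeholders, not data from Ding et al. (2010).

```python
def stimulus_train_ms(rate_hz=50.0, n_pulses=10, onset_ms=0.0):
    """Pulse onset times (ms) of a regular train, e.g. 50 Hz x 10 pulses (period 20 ms)."""
    period_ms = 1000.0 / rate_hz
    return [onset_ms + i * period_ms for i in range(n_pulses)]


def psth_hz(trials, t_start_ms, t_end_ms, bin_ms):
    """Trial-averaged firing rate (spikes/s) in each bin of [t_start_ms, t_end_ms).

    `trials` is a list of spike-time lists (ms, aligned to stimulus onset).
    """
    if not trials:
        raise ValueError("need at least one trial")
    n_bins = int(round((t_end_ms - t_start_ms) / bin_ms))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start_ms <= t < t_end_ms:
                idx = min(int((t - t_start_ms) // bin_ms), n_bins - 1)
                counts[idx] += 1
    # Convert spikes per bin (summed over trials) to spikes per second per trial.
    scale = 1000.0 / (bin_ms * len(trials))
    return [c * scale for c in counts]


if __name__ == "__main__":
    print(stimulus_train_ms())  # onsets of a 50 Hz, ten-pulse train
    fake_trials = [[-35.0, 12.0, 15.0, 480.0], [-20.0, 14.0, 520.0]]  # hypothetical spikes
    print(psth_hz(fake_trials, t_start_ms=-100.0, t_end_ms=600.0, bin_ms=10.0))
```

A pause shows up in such a histogram as a run of empty or near-empty bins after the stimulus train.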
Figure 4
Effects of dopamine deficiency on TAN pause duration (TP, the region between the two dotted blue lines) and on changes in dopamine concentration (orange), with and without levodopa (L-DOPA). In these simulations, a 50% dopamine deficiency (DA Def) decreases both the baseline dopamine concentration and the phasic dopamine release. (A1–A2) RPE = 1 and −1 without dopamine deficiency, for reference. (B1) RPE = 1 with 50% dopamine deficiency. Normally, the baseline dopamine concentration is 1.0. With a 50% loss of dopaminergic input, the baseline dopamine concentration is exactly halved. In addition, the magnitude of phasic dopamine release decreases by 50%, and therefore the TAN pause duration also decreases. (B2) RPE = −1. Both tonic and phasic dopamine release are reduced by 50% due to the dopamine deficiency. During the pause, the dopamine concentration converges to zero, so the pause is similar to (slightly shorter than) that in (A2). (C1) RPE = 1. When levodopa (0.5) is applied, the baseline dopamine concentration returns to normal (1.0) and the TAN pause duration increases. However, it remains shorter than without DA deficiency (A1), because the magnitude of phasic dopamine release is unaffected by levodopa. (C2) RPE = −1. When levodopa (0.5) is applied, the baseline dopamine concentration returns to normal (1.0), as for RPE = 1, but the TAN pause duration exceeds that without DA deficiency (A2). This is due to the increased (non-zero) dopamine concentration during the pause.
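The dopamine levels described for Figure 4 can be reproduced with simple arithmetic. The first two assumptions come from the caption:

- A deficiency fraction d scales both the tonic baseline and the phasic release by (1 − d).
- Levodopa adds to the baseline only.

A third assumption is added for this sketch: the concentration cannot fall below zero. The function name and the unit-free values are hypothetical.

```python
def pause_dopamine(rpe, deficiency=0.0, levodopa=0.0, normal_baseline=1.0, normal_phasic=1.0):
    """Hypothetical DA concentration during the TAN pause (arbitrary units).

    deficiency: fraction of dopaminergic input lost (0.5 means 50%).
    levodopa:   amount added to the tonic baseline only; phasic release is unchanged.
    """
    baseline = normal_baseline * (1.0 - deficiency) + levodopa
    phasic = normal_phasic * (1.0 - deficiency) * rpe
    return max(0.0, baseline + phasic)  # assumed floor at zero


conditions = {
    "A (no deficiency)": dict(deficiency=0.0, levodopa=0.0),
    "B (50% DA Def)": dict(deficiency=0.5, levodopa=0.0),
    "C (50% DA Def + L-DOPA 0.5)": dict(deficiency=0.5, levodopa=0.5),
}
for label, kwargs in conditions.items():
    print(label, [pause_dopamine(rpe, **kwargs) for rpe in (1, -1)])
```

This gives pause DA levels of 2.0/0.0 for A1/A2, 1.0/0.0 for B1/B2 and 1.5/0.5 for C1/C2. With levodopa, the RPE = 1 level stays below the healthy 2.0, while the RPE = −1 level rises above zero. This matches the caption's account of why C1 is shorter than A1 and C2 is longer than A2.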
Figure 5
Non-error-based motor adaptation under 50% dopamine (DA) deficiency, with and without levodopa medication. (A) Results of ball-throwing tasks performed by healthy people and Parkinson's disease (PD) patients. During the experiment, a dove prism was used to flip subjects' vision horizontally as a perturbation. This figure was adapted from Gutierrez-Garralda et al. (2013) with permission. (B) Simulation results with levodopa medication. "Levodopa" denotes the 50% dopamine deficiency condition with levodopa medication ([LDOPA] = 1.0). Colored center markers (triangles or circles) are average error values over 8 sessions, and error bars represent standard errors. 1 session = 75 trials (Baseline = 25 trials, Prism (visual perturbation) = 25 trials, and Aftereffects = 25 trials).
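Figure 5(B) summarizes 8 simulated sessions of 75 trials each (25 Baseline, 25 Prism, 25 Aftereffects). The sketch below reproduces that kind of summary under two assumptions. Each marker is taken to be a per-trial mean across sessions. Each error bar is taken to be the standard error of the mean (sample SD / √n). The function names and the synthetic input in the test are hypothetical, not the paper's data.

```python
import math
import statistics

TRIALS_PER_PHASE = 25
PHASES = ("Baseline", "Prism", "Aftereffects")
TRIALS_PER_SESSION = TRIALS_PER_PHASE * len(PHASES)  # 75 trials per session


def phase_of(trial_index):
    """Name of the experimental phase that a 0-based trial index falls into."""
    return PHASES[trial_index // TRIALS_PER_PHASE]


def per_trial_summary(sessions):
    """(phase, mean, SEM) for every trial index, computed across sessions.

    `sessions` is a list of per-trial error lists, one list of 75 values per session.
    """
    if any(len(s) != TRIALS_PER_SESSION for s in sessions):
        raise ValueError(f"each session must have {TRIALS_PER_SESSION} trials")
    n = len(sessions)
    summary = []
    for t in range(TRIALS_PER_SESSION):
        values = [s[t] for s in sessions]
        mean = statistics.fmean(values)
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
        summary.append((phase_of(t), mean, sem))
    return summary
```

The summary yields one (phase, mean, SEM) triple per trial, ready to plot as markers with error bars grouped by phase.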
Figure 6
(A–C) Changes in TAN pause (TP) duration driven by three factors, for RPE (reward prediction error) = 1 (phasic, reward), 0 (tonic baseline) and −1 (phasic, aversive). The factors are (A) the duration of thalamic stimulation, (B) the percentage of dopamine (DA) deficiency, and (C) the L-DOPA level in the 50% DA deficiency condition. (A) Changes in TP duration with the duration of thalamic stimulation. Longer thalamic stimulation increases TP duration for all RPE values. The difference in TP duration between RPE = 1 and RPE = −1 grows nonlinearly as the stimulation duration increases. (B) Changes in TP duration with the percentage of DA deficiency. A larger DA deficiency decreases TP duration for RPE = 1 and 0. For RPE = −1, TP duration is nearly independent of the amount of DA deficiency, because RPE = −1 corresponds to the minimum possible DA concentration during the pause. Therefore, the TP duration for RPE = −1 is unaffected by the degradation of dopaminergic inputs. For RPE = 1 and RPE = −1, the deviations of TP duration from its RPE = 0 value shrink nonlinearly as the percentage of DA deficiency increases. This reduces the timing difference between reward and aversive conditions for reinforcement learning and, in turn, degrades learning performance. (C) Changes in TP duration with the L-DOPA level in the 50% DA deficiency condition. In response to L-DOPA administration, TP duration increases similarly for all RPE values. This is because L-DOPA alters the baseline dopamine concentration but does not affect phasic dopamine release.
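The trend in Figure 6(B) can be checked qualitatively with a toy calculation. Suppose the dopamine level during the pause is (1 − d)(1 + RPE), floored at zero, where d is the deficiency fraction. Then RPE = −1 always gives zero dopamine and a fixed pause. Meanwhile, the RPE = 1 pause shortens as d grows, so the reward-versus-aversive gap shrinks. The pause model, a decaying IsAHP racing a D2-slowed Ih gate, and all of its parameters are hypothetical assumptions rather than the authors' equations.

```python
import math


def pause_ms(da, tau_sahp_ms=200.0, k0_per_ms=0.01, k_d2=1.0, t_max_ms=2000):
    """Hypothetical toy pause: first 1 ms step where the Ih gate exceeds the decaying IsAHP."""
    k_h = k0_per_ms / (1.0 + da / k_d2)  # higher DA -> slower Ih activation (assumed form)
    for t in range(t_max_ms + 1):
        if 1.0 - math.exp(-k_h * t) >= math.exp(-t / tau_sahp_ms):
            return t
    return t_max_ms


def pause_da(rpe, deficiency):
    """DA during the pause when tonic and phasic release both scale by (1 - deficiency)."""
    return max(0.0, (1.0 - deficiency) * (1.0 + rpe))


def sweep(deficiencies=(0.0, 0.25, 0.5, 0.75)):
    """Rows of (deficiency, TP(+1), TP(0), TP(-1), TP(+1) - TP(-1)) in ms."""
    rows = []
    for d in deficiencies:
        tp = {rpe: pause_ms(pause_da(rpe, d)) for rpe in (1, 0, -1)}
        rows.append((d, tp[1], tp[0], tp[-1], tp[1] - tp[-1]))
    return rows


for d, tp_pos, tp_zero, tp_neg, gap in sweep():
    print(f"DA Def {d:.0%}: TP(+1)={tp_pos}  TP(0)={tp_zero}  TP(-1)={tp_neg}  gap={gap} ms")
```

Only the direction of the effects is meaningful here: TP(−1) is constant and the gap narrows as the deficiency grows.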
Figure 7
Schematic diagram of the two-pathway basal ganglia model integrated with the TAN model. The dopaminergic substantia nigra pars compacta (SNc) signal represents the reward prediction error (RPE). PFC, PreFrontal Cortex; M1, Primary Motor Cortex; PMC, PreMotor Cortex; MSN, Medium Spiny Neuron; SNr, Substantia Nigra pars Reticulata; GPi, Globus Pallidus internal; GPe, Globus Pallidus external; SNc, Substantia Nigra pars Compacta; STN, SubThalamic Nucleus.
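As a rough illustration of the two-pathway architecture in Figure 7, the sketch below propagates rectified firing rates along the classic basal ganglia wiring:

- Direct-pathway MSNs inhibit GPi/SNr.
- Indirect-pathway MSNs inhibit GPe.
- GPe inhibits STN.
- STN excites GPi/SNr.
- GPi/SNr inhibits thalamocortical output.

It then applies an opponent RPE update that strengthens the direct and weakens the indirect corticostriatal weight after a positive RPE. The opponent learning rule, the class and method names, and all weights and tonic rates are assumptions for illustration. The caption does not specify the authors' equations.

```python
from dataclasses import dataclass


def relu(x: float) -> float:
    """Rectify a rate so that it cannot go negative."""
    return max(0.0, x)


@dataclass
class TwoPathwayBG:
    """Hypothetical rate sketch of the direct/indirect basal ganglia loop."""
    w_direct: float = 0.5    # cortex -> direct-pathway MSN weight (assumed)
    w_indirect: float = 0.5  # cortex -> indirect-pathway MSN weight (assumed)
    gpe_tonic: float = 1.0
    stn_tonic: float = 1.0
    gpi_tonic: float = 0.5
    thal_drive: float = 1.0

    def thalamic_output(self, cortex: float) -> float:
        msn_direct = relu(self.w_direct * cortex)
        msn_indirect = relu(self.w_indirect * cortex)
        gpe = relu(self.gpe_tonic - msn_indirect)      # indirect MSNs inhibit GPe
        stn = relu(self.stn_tonic - gpe)               # GPe inhibits STN
        gpi = relu(self.gpi_tonic - msn_direct + stn)  # direct MSNs inhibit, STN excites GPi/SNr
        return relu(self.thal_drive - gpi)             # GPi/SNr inhibits the thalamus

    def learn(self, cortex: float, rpe: float, lr: float = 0.1) -> None:
        """Assumed opponent rule: positive RPE favours the direct pathway."""
        self.w_direct = relu(self.w_direct + lr * rpe * cortex)
        self.w_indirect = relu(self.w_indirect - lr * rpe * cortex)


if __name__ == "__main__":
    bg = TwoPathwayBG()
    before = bg.thalamic_output(1.0)
    bg.learn(cortex=1.0, rpe=1.0)
    print(before, bg.thalamic_output(1.0))
```

After a positive RPE, the same cortical input releases more thalamic output, and a negative RPE does the opposite. In the integrated model, such an update would additionally be gated by the TAN pause.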

References

1. Aosaki T., Miura M., Suzuki T., Nishimura K., Masuda M. (2010). Acetylcholine-dopamine balance hypothesis in the striatum: an update. Geriatr. Gerontol. Int. 10(Suppl. 1), S148–S157. doi: 10.1111/j.1447-0594.2010.00588.x
2. Aosaki T., Tsubokawa H., Ishida A., Watanabe K., Graybiel A. M., Kimura M. (1994). Responses of tonically active neurons in the primate's striatum undergo systematic changes during behavioral sensorimotor conditioning. J. Neurosci. 14, 3969–3984. doi: 10.1523/JNEUROSCI.14-06-03969.1994
3. Apicella P., Ravel S., Deffains M., Legallet E. (2011). The role of striatal tonically active neurons in reward prediction error signaling during instrumental task performance. J. Neurosci. 31, 1507–1515. doi: 10.1523/JNEUROSCI.4880-10.2011
4. Ashby F. G., Crossley M. J. (2011). A computational model of how cholinergic interneurons protect striatal-dependent learning. J. Cogn. Neurosci. 23, 1549–1566. doi: 10.1162/jocn.2010.21523
5. Beigi M., Wilkinson L., Gobet F., Parton A., Jahanshahi M. (2016). Levodopa medication improves incidental sequence learning in Parkinson's disease. Neuropsychologia 93, 53–60. doi: 10.1016/j.neuropsychologia.2016.09.019
