How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues

J Brown¹, D Bullock, S Grossberg

PMID: 10575046
PMCID: PMC6782432
DOI: 10.1523/JNEUROSCI.19-23-10502.1999

J Neurosci. 1999 Dec 1;19(23):10502-11.

Affiliation

¹ Department of Cognitive and Neural Systems and Center for Adaptive Systems, Boston University, Boston, Massachusetts 02215, USA.

Abstract

After classically conditioned learning, dopaminergic cells in the substantia nigra pars compacta (SNc) respond immediately to unexpected conditioned stimuli (CS) but omit formerly seen responses to expected unconditioned stimuli, notably rewards. These cells play an important role in reinforcement learning. A neural model explains the key neurophysiological properties of these cells before, during, and after conditioning, as well as related anatomical and neurophysiological data about the pedunculopontine tegmental nucleus (PPTN), lateral hypothalamus, ventral striatum, and striosomes. The model proposes how two parallel learning pathways from limbic cortex to the SNc, one devoted to excitatory conditioning (through the ventral striatum, ventral pallidum, and PPTN) and the other to adaptively timed inhibitory conditioning (through the striosomes), control SNc responses. The excitatory pathway generates CS-induced excitatory SNc dopamine bursts. The inhibitory pathway prevents dopamine bursts in response to predictable reward-related signals. When expected rewards are not received, striosomal inhibition of SNc that is unopposed by excitation results in a phasic drop in dopamine cell activity. The adaptively timed inhibitory learning uses an intracellular spectrum of timed responses that is proposed to be similar to adaptively timed cellular mechanisms in the hippocampus and cerebellum. These mechanisms are proposed to include metabotropic glutamate receptor-mediated Ca²⁺ spikes that occur with different delays in striosomal cells. A dopaminergic burst in concert with a Ca²⁺ spike is proposed to potentiate inhibitory learning. The model provides a biologically predictive alternative to temporal difference conditioning models and explains substantially more data than alternative models.
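The interplay of the two pathways can be illustrated with a toy discrete-time sketch. It is not the model's actual equations: the function name `snc_response`, the time indices, and the unit-sized signals are all assumptions chosen for illustration. The net drive to SNc is taken as excitation (a learned CS burst plus the primary reward signal) minus the adaptively timed striosomal inhibition at the expected reward time.

```python
def snc_response(cs_learned, reward_delivered, n_steps=10, cs_t=2, reward_t=7):
    """Toy net drive to SNc at each time step.

    Positive values stand for a dopamine burst, negative values for a dip.
    All magnitudes and time indices are hypothetical.
    """
    response = []
    for t in range(n_steps):
        excite = 0.0
        inhibit = 0.0
        if t == cs_t and cs_learned:
            excite += 1.0   # excitatory pathway: CS-induced burst after conditioning
        if t == reward_t and reward_delivered:
            excite += 1.0   # primary reward signal reaching SNc
        if t == reward_t and cs_learned:
            inhibit += 1.0  # timed striosomal inhibition at the expected reward time
        response.append(excite - inhibit)
    return response


naive = snc_response(cs_learned=False, reward_delivered=True)
trained = snc_response(cs_learned=True, reward_delivered=True)
omitted = snc_response(cs_learned=True, reward_delivered=False)
```

Under these assumptions, `naive` has its only burst at the reward step, `trained` has its only burst at the CS step, and `omitted` shows a negative value at the expected reward step, where inhibition is unopposed by excitation.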

Figures

**Fig. 1.**
Dopamine cell firing patterns. *Left*, Data. *Right*, Model simulation, showing model spikes and underlying membrane potential. A, In naive monkeys, the dopamine cells fire a phasic burst when unpredicted primary reward R occurs (e.g., if the monkey receives a burst of apple juice unexpectedly). B, As the animal learns to expect the apple juice that reliably follows a sensory cue [conditioned stimulus (CS)] that precedes it by a fixed time interval, then the phasic dopamine burst disappears at the expected time of reward, and a new burst appears at the time of the reward-predicting CS. C, After learning, if the animal fails to receive reward at the expected time, a phasic depression in dopamine cell firing occurs. Thus, these cells reflect an adaptively timed expectation of reward that cancels the expected reward at the expected time. [The data in Figure 1 (*column 1*) are reprinted with permission from Schultz et al. (1997).]

**Fig. 2.**
Dopamine cell firing patterns. *Left*, Data. *Right*, Model simulation, showing model spikes and underlying membrane potential. A, The dopamine cells learn to fire in response to the earliest consistent predictor of reward. When CS2 (*Instruction*) consistently precedes the original CS (*Trigger*) by a fixed interval, the dopamine cells learn to fire only in response to CS2. [Data reprinted with permission from Schultz et al. (1993).] B, During training, the cell fires weakly in response to both the CS and reward. [Data reprinted with permission from Ljungberg et al. (1992).] C, Temporal variability in reward occurrence. When reward is received later than predicted, a depression occurs at the time of predicted reward, followed by a phasic burst at the time of actual reward. D, Likewise, if reward occurs earlier than predicted, a phasic burst occurs at the time of actual reward. No depression follows because the CS is released from working memory. [Data in C and D reprinted with permission from Hollerman and Schultz (1998).] E, When there is random variability in the timing of primary reward across trials (e.g., when the reward depends on an operant response to the CS), the striosomal cells produce a “Mexican hat” depression on either side of the dopamine spike. [Data reprinted with permission from Schultz et al. (1993).]

**Fig. 3.**
Trained firing patterns in PPTN, ventral striatum, striosomes, and lateral hypothalamus. *Left*, Data. *Right*, Model simulations, showing model spikes and underlying membrane potential. A, PPTN cell (cat), showing phasic responses to both CS and primary reward. [Data reprinted with permission from Dormont et al. (1998).] In the model, phasic signaling is caused by accommodation or habituation (Takakusaki et al., 1997), which causes the cell to fire in response to the earliest reward-predicting CS and US reward, but not to subsequent CSs before reward. B, Ventral striatal cells show sustained working memory-like response between trigger and a US reward, and a phasic response to the US reward. [Data reprinted with permission from Schultz et al. (1992).] C, A ventral striatal cell, predicted here to be a striosomal cell, shows buildup to phasic primary reward response. For the model cell, j = 39. [Data reprinted with permission from Schultz et al. (1992).] D, A lateral hypothalamic neuron with a strong, phasic response to glucose reward. [Data reprinted with permission from Nakamura and Ono (1986).] The majority of these neurons fired in response to primary reward but not to a reward-predicting CS. The model lateral hypothalamic input is a rectangular pulse.

**Fig. 4.**
Model circuit. Cortical inputs (I_i) excited by conditioned stimuli learn to excite the SNc (D) via the ventral striatal (S)-to-ventral pallidal-to-PPTN (P)-to-SNc path. The inputs I_i excite the ventral striatum via adaptive weights W_iS, and the ventral striatum excites the PPTN, via double inhibition through the ventral pallidum, with strength W_SP. When the PPTN activity exceeds a threshold Γ_P, it excites the dopamine cell with strength W_PD. The striosomes, which contain an adaptive spectral timing mechanism (x_ij, G_ij, Y_ij, Z_ij), learn to generate lagged, adaptively timed signals that inhibit reward-related activation of SNc. Primary reward signals (I_R) from the lateral hypothalamus both excite the PPTN directly (with strength W_RP) and act as training signals to the ventral striatum S (with strength W_RS). *Arrowheads* denote excitatory pathways, *circles* denote inhibitory pathways, and *hemidisks* denote synapses at which learning occurs. *Thick pathways* denote dopaminergic signals.
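The signal flow in this caption can be traced with a minimal algebraic sketch. It collapses the circuit into a single feedforward pass and is only an illustration, not the model's dynamical equations. The symbol names follow the caption, but every numerical value, the function names, and the scalar `striosomal_inhibition` argument are assumptions. The training role of W_RS is left out because no learning is simulated here.

```python
def rectify(x):
    """Half-wave rectification [x]+."""
    return max(x, 0.0)


def snc_drive(I, W_iS, I_R, striosomal_inhibition=0.0,
              W_SP=1.0, W_RP=1.0, W_PD=1.0, gamma_P=0.5):
    """Feedforward pass from cortical inputs and reward to net SNc drive.

    I      : cortical CS inputs I_i
    W_iS   : adaptive weights from each I_i to ventral striatum S
    I_R    : primary reward input from lateral hypothalamus
    All default strengths and the threshold gamma_P are hypothetical values.
    """
    S = sum(w * i for w, i in zip(W_iS, I))   # ventral striatal activity
    P = W_SP * S + W_RP * I_R                 # PPTN activity (net excitation)
    # PPTN excites the dopamine cell only above threshold; striosomes inhibit it.
    return W_PD * rectify(P - gamma_P) - striosomal_inhibition


untrained_cs = snc_drive(I=[1.0], W_iS=[0.0], I_R=0.0)
trained_cs = snc_drive(I=[1.0], W_iS=[1.0], I_R=0.0)
cancelled_reward = snc_drive(I=[0.0], W_iS=[1.0], I_R=1.0,
                             striosomal_inhibition=0.5)
```

With these illustrative values, a CS with zero weight produces no SNc drive, the same CS with a learned weight drives SNc through the thresholded PPTN path, and a reward input is cancelled when the striosomal inhibition matches the suprathreshold excitation.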

**Fig. 5.**
Striosomal spectral timing model and closeup (*inset*), showing individual timing pulses. Each curve represents the suprathreshold intracellular Ca²⁺ concentration [G_ij Y_ij − Γ_s]⁺ of one striosomal cell. The peaks are spread out in time so that reward can be predicted at various times after CS onset, by strengthening the inhibitory effect of the striosomal cell with the appropriate delay. The model uses 40 peaks, spanning ∼2 sec and beginning 100 msec after the CSs (Grossberg and Schmajuk, 1989). Model properties are robust when different numbers of peaks are used. It is important that the peaks be sufficiently narrow and tightly spaced to permit fine temporal resolution in the reward-canceling signal. However, a trade-off ensues in that more timed signals must be used as the time between peaks is reduced. The timed signals must not begin too early after the CS, or they will erroneously cancel the CS-induced dopamine burst. The 100 msec post-CS onset delay prevents this from happening.
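The idea of a spectrum of timed pulses can be sketched as follows. The 40 peaks, the roughly 2 sec span, and the 100 msec onset come from the caption. Everything else is an assumption for illustration: Gaussian bumps stand in for the suprathreshold Ca²⁺ signal [G_ij Y_ij − Γ_s]⁺, the peaks are spaced evenly from 0.1 to 2.1 sec, the pulse width is arbitrary, and the one-step weight assignment is a stand-in for the model's dopamine-gated learning.

```python
import math

N_CELLS = 40   # number of timed peaks (from the caption)
ONSET = 0.1    # sec after CS onset at which the first peak occurs (from the caption)
SPAN = 2.0     # sec covered by the spectrum (approximate, from the caption)
WIDTH = 0.05   # sec, hypothetical pulse width


def peak_time(j):
    """Time after CS onset at which cell j peaks (even spacing is assumed)."""
    return ONSET + SPAN * j / (N_CELLS - 1)


def timed_pulse(j, t):
    """Gaussian stand-in for the suprathreshold Ca2+ signal of cell j at time t."""
    return math.exp(-((t - peak_time(j)) ** 2) / (2 * WIDTH ** 2))


def learn_weights(reward_time):
    """Strengthen each cell in proportion to its activity when reward arrives."""
    return [timed_pulse(j, reward_time) for j in range(N_CELLS)]


def inhibition(z, t):
    """Weighted sum of timed pulses: the reward-canceling signal at time t."""
    return sum(zj * timed_pulse(j, t) for j, zj in enumerate(z))


z = learn_weights(reward_time=1.0)
at_reward = inhibition(z, 1.0)
off_reward = inhibition(z, 1.3)
at_cs_onset = inhibition(z, 0.0)
```

In this sketch the learned inhibition is concentrated around the trained reward time and is negligible at CS onset, which mirrors the caption's point that the post-CS delay keeps the timed signals from canceling the CS-induced burst. Narrower pulses give finer temporal resolution but require more cells to cover the same span.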

