Proc Natl Acad Sci U S A. 2016 Oct 11;113(41):E6281-E6289.
doi: 10.1073/pnas.1612392113. Epub 2016 Sep 26.

Pallidal spiking activity reflects learning dynamics and predicts performance


Eitan Schechtman et al. Proc Natl Acad Sci U S A. 2016.

Abstract

The basal ganglia (BG) network has been divided into interacting actor and critic components, modulating the probabilities of different state-action combinations through learning. Most models of learning and decision making in the BG focus on the roles of the striatum and its dopaminergic inputs, commonly overlooking the complexities and interactions of BG downstream nuclei. In this study, we aimed to reveal the learning-related activity of the external segment of the globus pallidus (GPe), a downstream structure whose computational role has remained relatively unexplored. Recording from monkeys engaged in a deterministic three-choice reversal learning task, we found that changes in GPe discharge rates predicted subsequent behavioral shifts on a trial-by-trial basis. Furthermore, the activity following the shift encoded whether it resulted in reward or not. The frequent changes in stimulus-outcome contingencies (i.e., reversals) allowed us to examine the learning-related neural activity and show that GPe discharge rates closely matched across-trial learning dynamics. Additionally, firing rates exhibited a linear decrease in sequences of correct responses, possibly reflecting a gradual shift from goal-directed execution to automaticity. Thus, modulations in GPe spiking activity are highest for attention-demanding aspects of behavior (i.e., switching choices) and decrease as attentional demands decline (i.e., as performance becomes automatic). These findings are contrasted with results from striatal tonically active neurons, which show none of these task-related modulations. Our results demonstrate that GPe, commonly studied in motor contexts, takes part in cognitive functions, in which movement plays a marginal role.

Keywords: actor–critic model; attention; basal ganglia; globus pallidus; learning.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Experimental paradigm and behavior. (A) Coronal MRI scans showing the locations of the recording chambers. (Left) Monkey C. (Right) Monkey V. (B) The 300- to 6,000-Hz filtered traces from a single electrode showing spontaneous GPe spiking. Note the “pause” in the high-frequency firing of the recorded unit, which identifies it as a GPe unit (14, 54). (Scale bar, 100 ms.) (C) Behavioral paradigm. Trials were preceded by the presentation of an initiation cue (black rectangle). Two seconds after the monkeys touched the cue, three square fractal stimuli appeared, one of which was deterministically associated with reward. Two seconds after a stimulus was chosen, an outcome cue (red rectangle) appeared. When the cue was pressed, the monkeys received either a liquid reward or no reward, depending on the selected stimulus. Once a predefined learning criterion was reached, a different stimulus was associated with reward (reversal). There was no explicit cue for the reversal, and the monkeys had to learn by trial and error. IC, initiation cue; OC, outcome cue; OR, outcome revealed; SA, stimuli appear; SC, stimulus chosen; TS, trial start. (D) Behavioral patterns in pairs of consecutive trials. If the first trial (Tn) was rewarded (Left), it could be followed in Tn+1 by either the same, rewarded choice (R⇏R; solid blue) or a different, unrewarded choice (R⇒U; solid red). If Tn was unrewarded (Right), it could be followed in Tn+1 by an identical choice (U⇏U; dotted red), a different unrewarded choice (U⇒U; dashed red), or a different rewarded choice (U⇒R; dashed blue). (E) Success rates and response times per trial before and after reversal. (Upper) The blue (red) line signifies the probability of choosing the currently (previously) rewarded stimulus. (Lower) Response times. Values represent the mean of 314–1,411 repetitions per trial position. Shaded areas indicate SEM. (F) Mean performance metrics after reversals for both monkeys. From left to right: the number of trials until first choosing a stimulus other than the previously rewarded one, until first choosing the newly rewarded stimulus, until first choosing the newly rewarded stimulus for at least three consecutive trials, and until reaching the criterion for reversal. Values represent the mean results of 662/750 blocks for monkeys V/C, respectively. Error bars indicate SEMs.
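The trial-pair labels of panel D and the post-reversal metrics of panel F can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names (`classify_pair`, `trials_to_first`, `trials_to_run`), the integer coding of stimuli, and counting a three-trial run from its first trial are all hypothetical choices.

```python
from typing import Callable, Optional, Sequence

# The five trial-pair patterns of Fig. 1D: "⇒" = switch, "⇏" = repeated choice.
VALID_PATTERNS = {"R⇏R", "R⇒U", "U⇏U", "U⇒U", "U⇒R"}


def classify_pair(choice_n: int, rewarded_n: bool,
                  choice_n1: int, rewarded_n1: bool) -> Optional[str]:
    """Label trials (Tn, Tn+1) as in Fig. 1D; return None for any other
    combination (e.g., a repeated rewarded choice left unrewarded by a reversal)."""
    first = "R" if rewarded_n else "U"
    second = "R" if rewarded_n1 else "U"
    arrow = "⇒" if choice_n != choice_n1 else "⇏"
    label = f"{first}{arrow}{second}"
    return label if label in VALID_PATTERNS else None


def trials_to_first(choices: Sequence[int],
                    predicate: Callable[[int], bool]) -> Optional[int]:
    """1-based index of the first post-reversal trial whose choice satisfies predicate."""
    for i, choice in enumerate(choices, start=1):
        if predicate(choice):
            return i
    return None


def trials_to_run(choices: Sequence[int], target: int,
                  run_len: int = 3) -> Optional[int]:
    """1-based index of the first trial of the first run of run_len consecutive
    choices of target (counting from the run's first trial is an assumption)."""
    run = 0
    for i, choice in enumerate(choices, start=1):
        run = run + 1 if choice == target else 0
        if run == run_len:
            return i - run_len + 1
    return None


# Hypothetical block: stimulus 0 was previously rewarded, stimulus 2 is newly rewarded.
choices = [0, 0, 1, 2, 1, 2, 2, 2]
print(trials_to_first(choices, lambda c: c != 0))  # 3
print(trials_to_first(choices, lambda c: c == 2))  # 4
print(trials_to_run(choices, target=2))            # 6
```

Averaging these per-block counts over all blocks of a monkey would give values of the kind shown in panel F.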
Fig. 2.
Subsequent switching of the chosen stimuli is encoded by absolute modulation of GPe firing rates (FRs). (A) To examine the predictive nature of modulations in GPe discharge rates, we considered different epochs throughout pairs of consecutive trials (i.e., Tn and Tn+1). Top to bottom: the first second after the outcome of Tn was revealed (OR); the last second of the intertrial interval (ITI), before the initiation cue (IC); the last second before a stimulus was chosen in Tn+1 (SC); and the first second after the OR of Tn+1. (B) Per epoch, we examined the mean absolute Z-scores of the FR for different response patterns: no reward in Tn, switch to rewarded choice in Tn+1 (U⇒R; dashed blue; n = 3,161–3,384 trials); no reward in Tn, switch to different unrewarded choice in Tn+1 (U⇒U; dashed red; n = 3,090–3,227 trials); no reward in Tn, identical unrewarded choice in Tn+1 (U⇏U; dotted red; n = 3,231–3,349 trials); reward in Tn, switch to unrewarded choice in Tn+1 (R⇒U; solid red; n = 2,261–2,367 trials); and reward in Tn, identical rewarded choice in Tn+1 (R⇏R; solid blue; n = 13,979–14,666 trials). Gray areas signify the examined epochs for C and D. (C) Mean values for each epoch and condition. Bar widths reflect the proportion of trials that corresponded to each response pattern (e.g., most trials belonged to the R⇏R group). (D) β coefficient estimates for a generalized linear model predicting the absolute Z-scores from the previous outcome and from whether the choice in Tn+1 differed from that in Tn. n.s., P > 0.05; ***P < 0.001. Shaded areas/error bars in all panels represent SEMs across trials. Data for this figure were extracted from all 306 GPe units that passed the inclusion criteria.
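A minimal sketch of the computations behind panels B to D follows. It assumes that each unit's FR in an epoch is Z-scored across all of that unit's trials (the caption does not state the normalization baseline) and that the generalized linear model uses a Gaussian family with an identity link, whose coefficient estimates coincide with ordinary least squares. The helper names are hypothetical.

```python
import numpy as np


def abs_zscores(rates) -> np.ndarray:
    """Absolute Z-scores of one unit's epoch FRs across its trials
    (population SD; the normalization baseline is an assumption)."""
    rates = np.asarray(rates, dtype=float)
    sd = rates.std()
    if sd == 0:
        return np.zeros_like(rates)
    return np.abs((rates - rates.mean()) / sd)


def mean_by_pattern(abs_z, labels) -> dict:
    """Mean |Z| per response pattern (e.g., "U⇒R", "R⇏R"), as in panel C."""
    abs_z = np.asarray(abs_z, dtype=float)
    labels = np.asarray(labels)
    return {str(lab): float(abs_z[labels == lab].mean()) for lab in np.unique(labels)}


def fit_outcome_switch_model(abs_z, prev_rewarded, switched) -> np.ndarray:
    """Fit |Z| ~ b0 + b1*prev_rewarded + b2*switched by least squares.

    A Gaussian-family GLM with identity link yields the same coefficients
    as ordinary least squares. Returns [b0, b1, b2].
    """
    X = np.column_stack([
        np.ones(len(abs_z)),                     # intercept
        np.asarray(prev_rewarded, dtype=float),  # 1 if Tn was rewarded
        np.asarray(switched, dtype=float),       # 1 if the choice in Tn+1 differs from Tn
    ])
    beta, *_ = np.linalg.lstsq(X, np.asarray(abs_z, dtype=float), rcond=None)
    return beta
```

In this formulation, the second and third coefficients play the roles of the "previous outcome" and "switch" β estimates plotted in panel D.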
Fig. 3.
GPe FR correlates with the slope of the learning curve. (A) Population mean of unit discharge rates for the time segment around stimulus choice (SC) for rewarded trials following reversal. Color codes from yellow to blue indicate the indices of the trial positions following reversal (e.g., yellow signifies the first trial after reversal). The number of trials averaged ranged between 475 and 1,019. The gray area signifies the examined epoch for B and C. (B) Purple squares denote the normalized mean GPe discharge rates during the last second before SC for all rewarded trials that follow reversal (e.g., the leftmost value signifies the normalized mean FR of all rewarded trials around the first SC following reversal). The solid green line depicts the normalized probability of success following reversal (i.e., the learning curve). The dashed green line depicts the normalized derivative of the learning curve (i.e., the learning slope). For clarity, all three plots were normalized so that the y axis ranges from 0 to 1. (C) Same as B, except that only blocks in which the first sequence of at least three correct trials began more than three trials after reversal were considered (i.e., blocks with nonoptimal reversal). This selection dissociates the hypothesis that FR correlates with the learning curve from the hypothesis that it correlates with the learning slope. The number of trials for which the FR was averaged ranged between 159 and 749. Error bars in B and C represent SEMs across trials. Although some bars are truncated, they are all symmetric around the means, so no data are obscured. Data for this figure were extracted from all 306 GPe units that passed the inclusion criteria.
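The learning curve, its derivative (the learning slope), and the 0-to-1 normalization described for panels B and C can be sketched as below. This assumes a blocks × trial-positions matrix of correct/incorrect outcomes after reversal and uses `numpy.gradient` as the discrete derivative; the caption does not say how the derivative was computed, and the helper names are hypothetical.

```python
import numpy as np


def minmax(x) -> np.ndarray:
    """Rescale values to the range [0, 1] for display."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def learning_curve_and_slope(correct):
    """correct: (n_blocks, n_positions) array, 1 where the newly rewarded
    stimulus was chosen at that post-reversal trial position."""
    curve = np.asarray(correct, dtype=float).mean(axis=0)  # P(success) per position
    slope = np.gradient(curve)                             # discrete derivative of the curve
    return minmax(curve), minmax(slope)


def pearson_r(x, y) -> float:
    """Pearson correlation; min-max rescaling does not change r."""
    return float(np.corrcoef(x, y)[0, 1])
```

Comparing `pearson_r(fr, curve)` with `pearson_r(fr, slope)` is one way to quantify the curve-versus-slope distinction that panel C is designed to expose.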
Fig. 4.
GPe FR decreased linearly during sequences of consecutive correct trials. (A) Population mean of unit discharge rates for the time segment around stimulus choice (SC) for the first sequence of 10 consecutive correct choices of each block. Color codes from yellow to blue indicate the indices of the trial positions along the consecutive run (e.g., yellow signifies the first trial in the sequence). The number of trials averaged ranged between 862 and 883. The gray area signifies the examined epoch for B and C. (B) Normalized mean GPe discharge rates during the last second before SC for all trials in sequences of at least 10 rewarded trials (Pearson r = −0.93; P < 0.001). Error bars represent SEMs across trials. (C) Normalized mean GPe FR during the last second before SC for all trials in sequences of four, five, or six (yellow, red, and blue lines, respectively) unrewarded trials in which the same stimulus was repeatedly chosen (Pearson r = −0.81, −0.15, and 0.28; P = 0.19, 0.81, and 0.58; n = 240–242, 87–89, and 36–40, respectively). Unlike in B, no decrease was observed throughout the sequence. Although some bars are truncated, they are all symmetric around the means, so no data are obscured. Data for this figure were extracted from all 306 GPe units that passed the inclusion criteria.
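The linear decrease in panel B amounts to fitting a line to the mean FR as a function of position in the correct sequence and reporting Pearson's r. A minimal NumPy sketch follows; the function name and the example values are hypothetical. For a P-value, `scipy.stats.pearsonr` returns the correlation coefficient together with a two-sided p-value.

```python
import numpy as np


def linear_trend(mean_rates):
    """Fit FR = slope * position + intercept over trial positions 1, 2, ...
    and return (slope, intercept, Pearson r)."""
    y = np.asarray(mean_rates, dtype=float)
    pos = np.arange(1, len(y) + 1, dtype=float)
    slope, intercept = np.polyfit(pos, y, 1)  # degree-1 fit: [slope, intercept]
    r = np.corrcoef(pos, y)[0, 1]
    return float(slope), float(intercept), float(r)


# Hypothetical normalized FRs that fall by 0.5 per trial:
print(linear_trend([10.0, 9.5, 9.0, 8.5]))  # slope ≈ -0.5, intercept ≈ 10.5, r ≈ -1.0
```

A negative slope with r close to −1, as in panel B, indicates a steady decline across the sequence; the near-zero or positive r values in panel C correspond to no consistent decrease.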
Fig. S1.
GPe discharge rates immediately following outcome delivery decrease linearly in series of consecutive correct trials. (A) Population mean of unit discharge rates for the time segment around outcome delivery (OR, outcome revealed) for the first sequence of 10 consecutive correct choices of each block. Color codes from yellow to blue indicate the indices of the trial positions along the consecutive run (e.g., yellow signifies the first trial in the sequence). The number of trials averaged ranged between 834 and 866. The gray area signifies the examined epoch for B. (B) Normalized mean GPe discharge rates during the first second after OR for all trials in sequences of at least 10 rewarded trials (Pearson r = −0.89; P < 0.001). Error bars represent SEMs across trials. Although some bars are truncated, they are all symmetric around the means, so no data are obscured. Data for this figure were extracted from all 306 GPe units that passed the inclusion criteria.
Fig. S2.
GPe units correlate with learning and automaticity more than expected by chance. Histograms of the proportion of explained variance (R²) for single GPe units. Only units with data for at least one repetition per trial position (i.e., with sufficient data to compute a correlation) were included in the analysis. For each unit, we correlated the discharge rates across trials with the learning curves to obtain the values displayed in the upper panel. (Lower) A histogram of the proportion of variance explained by linearly fitting the discharge rates of consecutive correct trials. In both cases, the discharge rate for the last second before stimulus choice (SC) was considered. Both histograms were normalized and are presented on a logarithmic scale. The red bars signify significant correlations (P < 0.05). The green line shows the correlation histogram obtained after random shuffling.
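The comparison with a shuffled distribution can be sketched as a permutation test on per-unit R². This assumes that shuffling randomly permutes a unit's FR values relative to trial position; the caption does not describe the exact shuffling scheme, and the function names are hypothetical. For a simple linear fit with an intercept, R² equals the squared Pearson correlation.

```python
import numpy as np


def r_squared(x, y) -> float:
    """Variance explained by a linear fit of y on x (squared Pearson r)."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r * r)


def shuffled_r2(x, y, n_shuffles=1000, seed=0) -> np.ndarray:
    """Null distribution of R² after randomly permuting y relative to x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return np.array([r_squared(x, rng.permutation(y)) for _ in range(n_shuffles)])


def permutation_p(x, y, n_shuffles=1000, seed=0) -> float:
    """One-sided permutation p-value for the observed R² (add-one correction)."""
    null = shuffled_r2(x, y, n_shuffles, seed)
    return float((1 + np.sum(null >= r_squared(x, y))) / (1 + n_shuffles))
```

Histogramming `r_squared` over units, next to the pooled output of `shuffled_r2`, would give a plot of the same kind as the red bars and green line described above.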
Fig. S3.
The correlations of GPe units with learning and automaticity do not depend on recording location. The proportion of behavioral variance explained (R²) by GPe discharge rates was estimated across trials for each unit. (Upper) The correlation with the learning slopes. (Lower) The proportion of variance explained by linearly fitting the discharge rates of consecutive correct trials. Only units with data for at least one repetition per trial position (i.e., with sufficient data to compute a correlation) were included in the analysis. The 2D stereotactic position of each recorded unit was matched with its correlation metric to produce a topographical correlation map for each monkey. The upper panels depict the R² values for the correlation with learning curves, and the lower two depict the values for the linear fit over consecutive correct trials. Our data regarding unit location should be interpreted with caution because, unlike imaging techniques, in vivo electrophysiology does not provide exact unit locations. A-P, anterior–posterior axis; M-L, medial–lateral axis. N is the number of units.
Fig. 5.
Unlike GPe units, tonically active neurons (TANs) do not predict switching throughout most of the trial, do not correlate with learning, and do not linearly decrease in repeatedly correct sequences. (A) We conducted an ANOVA to compare the effect sizes for the two cell types. We considered the proportions of variance (η²; log scale) explained by changes in TAN FRs (n = 10,506–10,561 trials) and absolute changes in GPe FRs (n = 25,722–26,993 trials) for encoding of previous outcome (white) and future switching (gray). Dashed/dotted lines represent the significance thresholds for P = 0.05 and P = 0.01, respectively. (B) Comparison of the variance explained (R²) by linear regression between observed learning slopes and either TAN or GPe discharge rates (FRs) for rewarded trials during the last second before the screen was touched (SC, stimulus chosen). (C) Comparison of the variance explained (R²) by fitting a linear function to either TAN or GPe FR during the last second before SC in sequences of 10 repeatedly correct trials. n.s., P > 0.05; ***P < 0.001. Data for this figure were extracted from all 306 GPe units and all 79 TANs that passed the inclusion criteria.
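The effect size in panel A, η², is the proportion of total variance attributable to a factor: η² = SS_between / SS_total. The sketch below computes it for one factor at a time (for example, rewarded versus unrewarded previous outcome). The caption does not specify the full ANOVA design, so handling each factor separately is an assumption, and the function name is hypothetical.

```python
import numpy as np


def eta_squared(groups) -> float:
    """η² = SS_between / SS_total for one factor (one-way layout).

    groups: list of 1-D arrays, one per factor level, each holding
    per-trial values (e.g., FRs or absolute FR Z-scores).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return float(ss_between / ss_total) if ss_total > 0 else 0.0


# Hypothetical per-trial values for two factor levels:
print(eta_squared([[1.0, 2.0], [3.0, 4.0]]))  # about 0.8
```

Because η² is bounded between 0 and 1 and is often very small for single-trial neural data, plotting it on a log scale, as in panel A, keeps small effects visible.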
Fig. S4.
TAN discharge rates do not consistently predict switching, do not correlate with learning slopes, and do not linearly decrease over consecutive correct trials. (A) Z-scores of TAN discharge rates during different time epochs throughout the trial. The decrease in discharge rates observed in the peristimulus time histogram (PSTH) following the OR of both Tn and Tn+1 is typical of TANs (36, 40). Designations follow those of Fig. 2 (sample sizes were: U⇒R: n = 1,697–1,711; U⇒U: n = 1,306–1,312; U⇏U: n = 1,638–1,652; R⇒U: n = 1,371–1,376; R⇏R: n = 4,494–4,510). (B) TAN FR for rewarded trials following reversal (Left) and for sequences of rewarded choices (Right). Designations follow those of Figs. 3B and 4B, respectively. Data for this figure were extracted from all 79 TANs that passed the inclusion criteria. n.s., P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001.
Fig. S5.
Results obtained for each monkey individually show the same patterns as the pooled means. All panels display analyses that were presented in previous figures. The left and right columns show the results for monkeys V and C, respectively. From top to bottom, the displayed panels are identical to the top three panels of Fig. 2D and the bottommost panels of Figs. 2B, 3B, and 4B. Data for this figure were extracted from all 306 GPe units (188 for monkey V and 118 for monkey C) that passed the inclusion criteria. n.s., P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001.
Fig. S6.
Task performance measures hardly varied over recording sessions. Different behavioral measures of task performance were averaged per recording session (i.e., over all blocks performed in a single day). Each diamond-shaped point signifies a single experimental session. These sessions followed a long period of extensive training and began only after behavior had reached a plateau. Therefore, no gradual change in the behavioral measures was observed over recording sessions. The x axis is normalized to allow comparison between the two monkeys despite their different numbers of recording sessions.

References

    1. Smith KS, Graybiel AM. Habit formation. Dialogues Clin Neurosci. 2016;18(1):33–43.
    2. Houk JC, Adams JL, Barto AG. A model of how the basal ganglia generate and use neural signals that predict reinforcement. In: Houk JC, Davis JL, Beiser DG, eds. Models of Information Processing in the Basal Ganglia. Computational Neuroscience. MIT Press, Cambridge, MA; 1995:249–270.
    3. Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997;275(5306):1593–1599.
    4. Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H. Midbrain dopamine neurons encode decisions for future action. Nat Neurosci. 2006;9(8):1057–1063.
    5. Parker NF, et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat Neurosci. 2016;19(6):845–854.
