Beta Oscillations in Monkey Striatum Encode Reward Prediction Error Signals

Ruggero Basanisi¹, Kevin Marche²,³, Etienne Combrisson², Paul Apicella², Andrea Brovelli¹

Affiliations

¹ Institut de Neurosciences de la Timone, Aix Marseille Université, Unité Mixte de Recherche 7289 Centre National de la Recherche Scientifique, Marseille 13005, France (andrea.brovelli@univ-amu.fr, ruggero.basanisi@gmail.com).
² Institut de Neurosciences de la Timone, Aix Marseille Université, Unité Mixte de Recherche 7289 Centre National de la Recherche Scientifique, Marseille 13005, France.
³ Wellcome Center for Integrative Neuroimaging, Department of Experimental Psychology, University of Oxford, Oxford OX3 9DU, United Kingdom.

PMID: 37015808
PMCID: PMC10162459
DOI: 10.1523/JNEUROSCI.0952-22.2023

Ruggero Basanisi et al. J Neurosci. 2023 May 3;43(18):3339-3352. doi: 10.1523/JNEUROSCI.0952-22.2023. Epub 2023 Apr 4.

Abstract

Reward prediction error (RPE) signals are crucial for reinforcement learning and decision-making as they quantify the mismatch between predicted and obtained rewards. RPE signals are encoded in the neural activity of multiple brain areas, such as midbrain dopaminergic neurons, prefrontal cortex, and striatum. However, it remains unclear how these signals are expressed across anatomically and functionally distinct subregions of the striatum. In the current study, we examined to what extent RPE signals are represented across different striatal regions. To do so, we recorded local field potentials (LFPs) in sensorimotor, associative, and limbic striatal territories of two male rhesus monkeys performing a free-choice probabilistic learning task. The trial-by-trial evolution of RPE during task performance was estimated using a reinforcement learning model fitted to the monkeys' choice behavior. Overall, we found that changes in beta band oscillations (15–35 Hz) after the outcome of the animal's choice are consistent with RPE encoding. Moreover, we provide evidence that the signals related to RPE are more strongly represented in the ventral (limbic) than in the dorsal (sensorimotor and associative) part of the striatum. To conclude, our results suggest a relationship between striatal beta oscillations and the evaluation of outcomes based on RPE signals and highlight a major contribution of the ventral striatum to the updating of learning processes.

Significance Statement

Reward prediction error (RPE) signals are crucial for reinforcement learning and decision-making as they quantify the mismatch between predicted and obtained rewards. Current models suggest that RPE signals are encoded in the neural activity of multiple brain areas, including the midbrain dopaminergic neurons, prefrontal cortex, and striatum. However, it remains elusive whether RPEs recruit anatomically and functionally distinct subregions of the striatum. Our study provides evidence that RPE-related modulations in local field potential (LFP) power are dominant in the striatum. In particular, they are stronger in the rostro-ventral than in the caudo-dorsal striatum. Our findings contribute to a better understanding of the role of striatal territories in reward-based learning and may be relevant for neuropsychiatric and neurologic diseases that affect striatal circuits.

Keywords: LFP; learning; mutual information; rewards; striatum.

Figures

**Figure 1.**
Probabilistic learning task and choice performance. A, Sequence of events inside a single trial. Each trial started with the monkey holding its hand on a metal bar. After a first visual stimulus (“cue,” green LEDs on) lasting 0.5 s, a second visual stimulus (“go signal,” red LEDs on) was presented after a 1-s delay to instruct the monkeys to perform a reaching movement to one of the three targets. After a variable delay depending on the reaction time (RT) and the movement time (MT) of the monkeys, on target contact, the go signal was turned off and the monkey immediately received an outcome (reward or not). Correlates of the RPE signals were examined in an 800 ms period (orange-shaded area) after outcome release. B, Histograms representing the distribution of the motor response times (composed of RT + MT) for monkey F and monkey T in both Easy and Hard conditions. Vertical lines represent the mean of the distributions. C, Experimental setup. Monkeys sat in a box with two openings, one for the head and one for their right arm, in front of three target buttons with LEDs that could be reached with their right hand. An equally reachable metal bar placed under the middle button was used as the starting position of a trial. D, Evolution of RPE as a function of correct trials. Correct trials are the trials in which the monkeys chose to press the most rewarding button. Data were pooled across blocks for each schedule (“Easy,” “Hard”) and each monkey. The solid lines and shaded areas correspond to the mean ± SEM of RPEs computed by the Q-learning model. E, Choice performance computed from monkeys' behavior. Data were pooled across blocks for each schedule (“Easy,” “Hard”) and each monkey. The solid lines and shaded areas correspond to the mean ± SEM of the probability of choosing the most rewarding target as a function of trial number within a block. F, Choice performance computed from the Q-learning model. Data were pooled across blocks for each schedule (“Easy,” “Hard”) and each monkey. The solid lines and shaded areas correspond to the mean ± SEM of the probability of choosing the most rewarding target extracted trial by trial from the state-action transition matrix computed by the model.
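
The Figure 1 caption refers to RPEs and choice probabilities computed by a Q-learning model. As a rough illustration of how such trial-by-trial quantities can be obtained, here is a minimal sketch assuming a standard delta-rule value update and a softmax choice rule. The function names, the learning rate `alpha`, and the inverse temperature `inv_temp` are illustrative assumptions; the exact model variant and fitted parameters used in the study are not stated in this record.

```python
import math


def softmax(q_values, inv_temp):
    """Convert action values into choice probabilities (assumed choice rule)."""
    top = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(inv_temp * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_learning_rpes(choices, rewards, n_targets=3, alpha=0.2, inv_temp=3.0):
    """Return per-trial RPEs, model probability of each observed choice, and final Q-values.

    choices: index of the target chosen on each trial (0 .. n_targets - 1)
    rewards: outcome on each trial (1 = reward, 0 = no reward)
    alpha, inv_temp: hypothetical parameter values, not taken from the paper
    """
    q = [0.0] * n_targets
    rpes, p_chosen = [], []
    for choice, reward in zip(choices, rewards):
        # Probability the model assigned to the observed choice before the update
        p_chosen.append(softmax(q, inv_temp)[choice])
        # RPE: obtained reward minus predicted value of the chosen target
        delta = reward - q[choice]
        q[choice] += alpha * delta
        rpes.append(delta)
    return rpes, p_chosen, q


# Toy sequence of four trials (made-up data, for illustration only)
rpes, p_chosen, q = q_learning_rpes([0, 0, 1, 0], [1, 0, 0, 1], alpha=0.5, inv_temp=1.0)
print(rpes)  # [1.0, -0.5, 0.0, 0.75]
```

In a fitting procedure, `p_chosen` would typically enter a log-likelihood that is maximized over `alpha` and `inv_temp`; that step is omitted here.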

**Figure 2.**
Outcome-related modulations in LFP power. A, Exemplar sessions depicting epoched LFPs (5 rewarded and 5 unrewarded trials each). LFPs were filtered between 1 and 140 Hz. Light red lines represent rewarded trials while light black lines represent unrewarded trials. The dashed vertical black line coincides with the target contact and outcome release, and is the temporal point on which data are aligned. The two plain red and black lines represent the corresponding averages. B, Time-frequency maps of monkey F (top) and monkey T (bottom), averaged across trials within both task conditions, grouped by the outcome (No reward, Reward) and the difference between the two (Reward − No Reward). The time window from 0 to 0.8 s corresponds to the outcome period (orange-shaded area in Fig. 1A), selected to avoid contamination from relevant movements (e.g., arm movements). C, The statistical analysis of LFP power modulations contrasting rewarded and unrewarded trials displayed significant effects in the beta band. The color code is in the -log10(p-values) scale.

**Figure 3.**
Relation between RPEs and beta power. A, Relation between the averaged normalized beta power over a time window of 0.2–0.8 s and average RPEs across trials in the limbic striatum of monkey F (top) and monkey T (bottom). Dots' color fading from yellow to blue represents the passage from early trials to late trials. The negative values of the averaged beta power are the results of the normalization over the baseline. B, Mutual information (MI) between beta band LFP power and RPE. The dashed vertical line represents the target contact time on which data are aligned. The dashed blue lines represent nonsignificant values (p ≥ 0.05) of MI, while the continuous ones represent significant values (p < 0.05). The chosen time window reflects the outcome period, with time 0 corresponding to the target contact and outcome delivery.
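
To make the quantity in Figure 3B concrete, the following is a minimal sketch of a mutual information estimate between two trial-wise variables, such as RPE and normalized beta power at one time point. It assumes a simple equal-count binning estimator and a generic permutation test; this record does not say which estimator or statistical procedure the authors used, so both are stand-ins, and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def equal_count_bins(values, n_bins):
    """Assign each sample a bin label by rank so bins hold about equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def mutual_information(x, y, n_bins=4):
    """Binned plug-in estimate of MI in bits (assumed estimator, biased for small samples)."""
    bx, by = equal_count_bins(x, n_bins), equal_count_bins(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi


def permutation_pvalue(x, y, n_perm=1000, n_bins=4, seed=0):
    """Fraction of trial-shuffled MI values at least as large as the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(x, y, n_bins)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the trial-wise pairing between x and y
        if mutual_information(x, shuffled, n_bins) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): beta power decreases monotonically with RPE
rpe = [-0.9, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 0.9]
beta_power = [-r for r in rpe]
print(mutual_information(rpe, beta_power, n_bins=4))  # 2.0
```

Repeating the estimate at each time point of the outcome window, and thresholding the permutation p-value at 0.05, would give a time course comparable in spirit to the significant and nonsignificant segments described in the caption.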

**Figure 4.**
Positions of all striatal recording sites in monkey F. Each dot corresponds to a single LFP recording site. Coronal sections are labeled in rostro-caudal stereotaxic planes according to distances from the anterior commissure (AC) used as a reference landmark. The inset shows a photomicrograph of a coronal section stained with cresyl violet at the level of the posterior putamen (i.e., sensorimotor striatum) with visible traces of electrode tracks (arrow) above the putamen. Cd, caudate nucleus; Put, putamen.

**Figure 5.**
Anatomo-functional distribution of RPE-related beta band LFP modulations. A, Two-dimensional spatial positions of the centroids of recording-site clusters, for each monkey. Clusters are represented along their antero-posterior (AP) position and the depth of the recording site. Digits are labels that are also used in panel B. Digit color corresponds to a striatal region (green for the motor striatum, blue for the associative and red for the limbic parts). B, MI computed in each of the clusters, separately for each monkey. Digit labels and digit colors are the same as in A. Dashed and continuous lines represent nonsignificant and significant values of mutual information (MI), respectively.

**Figure 6.**
Striatal gradient of the total RPE beta band MI. Each point represents a cluster of recording sites, and colors correspond to the three striatal territories (green for motor, blue for associative and red for limbic territories). The y-axis reflects the cumulative mutual information (MI) calculated over the outcome time interval. The x-axis reflects the distance of each cluster's center from the rostro-ventral to caudo-dorsal reference. This reference was computed by taking the AP coordinate of the most posterior recording site and the Depth coordinate of the higher recording site of each monkey.
