Cereb Cortex. 2018 Nov 1;28(11):3965-3975.
doi: 10.1093/cercor/bhx259.

Beyond Reward Prediction Errors: Human Striatum Updates Rule Values During Learning

Ian Ballard et al. Cereb Cortex. 2018.

Abstract

Humans naturally group the world into coherent categories defined by membership rules. Rules can be learned implicitly, by building stimulus-response associations through reinforcement learning (RL), or explicitly, through reasoning. We tested whether the striatum, whose activation reliably scales with reward prediction error, would track prediction errors in a task that required explicit rule generation. Using functional magnetic resonance imaging during a categorization task, we show that striatal responses to feedback scale with a "surprise" signal derived from a Bayesian rule-learning model and are inconsistent with RL prediction error. We also find that the striatum and caudal inferior frontal sulcus (cIFS) are involved in updating the likelihood of discriminative rules. We conclude that the striatum, in cooperation with the cIFS, is involved in updating the values assigned to categorization rules when people learn using explicit reasoning.
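To make the contrast between the two learning signals concrete, the sketch below implements a toy Bayesian rule learner next to a delta-rule prediction error. It is not the authors' model: the three binary features, the 12-rule hypothesis space (single features, their negations, and pairwise conjunctions and disjunctions), the lapse rate `eps`, and the definition of surprise as the Shannon surprise of the correct/incorrect feedback are all illustrative assumptions, and every function name is hypothetical.

```python
import math
from itertools import combinations

# Hypothetical stimuli: tuples of three binary features, e.g. (1, 0, 1).
N_FEATURES = 3


def make_rules():
    """A small, hypothetical hypothesis space of category-membership rules."""
    rules = {}
    for f in range(N_FEATURES):
        rules[f"F{f}"] = lambda x, f=f: x[f] == 1
        rules[f"not F{f}"] = lambda x, f=f: x[f] == 0
    for f, g in combinations(range(N_FEATURES), 2):
        rules[f"F{f} and F{g}"] = lambda x, f=f, g=g: x[f] == 1 and x[g] == 1
        rules[f"F{f} or F{g}"] = lambda x, f=f, g=g: x[f] == 1 or x[g] == 1
    return rules


def likelihood(rule, x, label, eps):
    """P(label | rule, x): the rule is followed except for a lapse rate eps."""
    return 1 - eps if rule(x) == label else eps


def p_member(posterior, rules, x, eps):
    """Posterior-predictive probability that stimulus x is a category member."""
    return sum(p * likelihood(rules[h], x, True, eps) for h, p in posterior.items())


def update(posterior, rules, x, label, eps):
    """Bayes' rule: reweight each rule by how well it predicted the true label."""
    unnorm = {h: p * likelihood(rules[h], x, label, eps) for h, p in posterior.items()}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}


def feedback_surprise(posterior, rules, x, response, label, eps):
    """Shannon surprise (bits) of the correct/incorrect feedback for a response."""
    p_yes = p_member(posterior, rules, x, eps)
    p_correct = p_yes if response else 1 - p_yes
    p_feedback = p_correct if response == label else 1 - p_correct
    return -math.log2(p_feedback)


def rl_prediction_error(reward, value):
    """Signed delta-rule reward prediction error, for contrast."""
    return reward - value


if __name__ == "__main__":
    rules = make_rules()
    prior = {h: 1 / len(rules) for h in rules}  # flat prior over rules
    x = (1, 1, 0)                                # hypothetical stimulus
    for label in (True, False):                  # learner answered "member"
        s = feedback_surprise(prior, rules, x, True, label, eps=0.05)
        print(f"feedback {'correct' if label else 'incorrect'}: {s:.2f} bits")
```

The two signals differ in sign: the RL prediction error is positive after correct feedback and negative after errors, whereas surprise is unsigned and largest for whichever outcome the learner's current beliefs made least probable, so an error on a confidently held rule is the most surprising event.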


Figures

Figure 1.
Rule learning task and behavior. (a) Participants completed six 20-trial blocks of a rule-learning task. Each trial had three phases (cue, response and feedback), separated by random 4–6 s delays. During the cue phase (2 s), the stimulus to be categorized was presented in the center of the screen. During the response phase (2 s), a question mark was presented in the center of the screen, prompting participants to press a button to respond. During the feedback phase (2 s), a message indicated whether the response was correct. (b) Average reaction times for each rule block, ordered by mean reaction time. Although reaction times varied across rules, only the difference between A and (A and B) was significant after correction for multiple comparisons. (c) Left panels show, for each rule, mean participant accuracy alongside the predictions of the Bayesian rule-learning model (with no parameter fitting). To the right is the average performance, collapsed across trials and referenced against the performance of an optimal version of the Bayesian model for each rule. The dotted line represents chance performance. Participants learned to respond well above chance and remarkably close to optimal for all rules. All error bars represent bootstrapped estimates of the standard error of the mean across subjects.
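The error bars above are bootstrapped standard errors across subjects. A minimal sketch of that estimate, assuming the standard nonparametric bootstrap (resample subjects with replacement, then take the standard deviation of the resampled means); the function name, number of resamples, seed, and accuracy values are hypothetical.

```python
import random
import statistics


def bootstrap_sem(values, n_boot=10_000, seed=0):
    """Bootstrap standard error of the mean across subjects.

    Resamples subjects with replacement n_boot times and returns the
    standard deviation of the resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(boot_means)


# Hypothetical per-subject mean accuracies for one rule block.
accuracy = [0.55, 0.62, 0.70, 0.74, 0.81, 0.88, 0.90, 0.93]
print(f"mean = {statistics.fmean(accuracy):.3f}, "
      f"bootstrap SEM = {bootstrap_sem(accuracy):.3f}")
```

With enough resamples the result approaches the population standard deviation of the subject means divided by the square root of the number of subjects, so it mainly serves as a distribution-free check on the analytic SEM.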
Figure 2.
Bayesian rule learning outperforms RL models. (a) Bayesian rule learning outperforms standard RL models that learn about either stimuli (Naïve) or features (Feature) in predicting subject behavior. It also outperforms an RL model with an exhaustive state space (features, stimuli, and pairwise combinations of features), as well as an augmented version of this model with a Pearce-Hall update. Finally, the Bayesian model outperforms a mixture model that combines Bayesian and RL predictions. (b) Rule-by-rule comparison of the predictive accuracy of the Bayesian model and the best-performing RL model (the exhaustive Pearce-Hall model). Despite significant heterogeneity, the Bayesian model outperforms the RL model for most rules, whereas the RL model does not significantly outperform the Bayesian model on any rule.
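For reference, a sketch of a feature-based RL learner with a Pearce-Hall associability term, one of the comparison model families named in this caption. It is a generic hybrid Rescorla-Wagner/Pearce-Hall rule with assumed parameters (`kappa`, `eta`) and hypothetical names; the exact parameterization of the models compared in the paper is not given here.

```python
def pearce_hall_trial(weights, assoc, features, reward, kappa=0.3, eta=0.5):
    """One trial of a hypothetical feature-based RL learner.

    The stimulus value is the sum of its feature weights (Rescorla-Wagner),
    and each feature's effective learning rate is scaled by a Pearce-Hall
    associability that tracks recent absolute prediction errors.
    Updates `weights` and `assoc` in place and returns the prediction error.
    """
    value = sum(weights[f] for f in features)
    delta = reward - value                      # signed reward prediction error
    for f in features:
        weights[f] += kappa * assoc[f] * delta  # associability-scaled update
        # Associability moves toward |delta|: surprising trials speed learning.
        assoc[f] = eta * abs(delta) + (1 - eta) * assoc[f]
    return delta


if __name__ == "__main__":
    weights = {"A": 0.0, "B": 0.0, "C": 0.0}
    assoc = {"A": 1.0, "B": 1.0, "C": 1.0}
    for trial in range(5):                      # stimulus with features A and B, always rewarded
        d = pearce_hall_trial(weights, assoc, ("A", "B"), reward=1.0)
        print(f"trial {trial}: delta = {d:.3f}, w_A = {weights['A']:.3f}")
```

Because such a learner only attaches value to features (or, in the exhaustive variant, to stimuli and feature pairs), it has no explicit representation of candidate rules, which is the distinction the model comparison probes.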
Figure 3.
Striatum represents Bayesian surprise, not RL prediction error. (a) Mean prediction error from the best-fitting RL model, sorted by whether the outcome was positive or negative. (b) Mean surprise from the Bayesian rule-learning model, sorted by whether the outcome was positive or negative. (c) Whole-brain corrected results for the contrast of positive > negative outcomes. There were no significant voxels in the striatum for this contrast. (d) Whole-brain corrected results for the contrast of negative > positive outcomes. (e) Results of a conjunction analysis displaying voxels that are significantly active for both negative > positive outcomes and the parametric effect of surprise. Both contrasts were corrected for multiple comparisons across the whole brain before being entered into the conjunction analysis. (f) Whole-brain corrected results for the contrast of parametric surprise > parametric prediction error, without the effect of outcome partialed out.
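The conjunction in panel (e) keeps only voxels that survive correction in both contrasts. A minimal sketch, assuming a minimum-statistic conjunction over voxelwise z-values in which both maps are compared against a single corrected threshold (a simplification; each map could have its own); the function name, voxel values and threshold are hypothetical.

```python
def conjunction(z_a, z_b, threshold):
    """Minimum-statistic conjunction over voxels.

    A voxel is kept only if its smaller z-value across the two contrasts
    exceeds the corrected threshold, which is equivalent to intersecting
    the two thresholded maps.
    """
    if len(z_a) != len(z_b):
        raise ValueError("maps must have the same number of voxels")
    return [min(a, b) > threshold for a, b in zip(z_a, z_b)]


if __name__ == "__main__":
    neg_gt_pos = [3.5, 2.0, 4.1, 3.3]        # hypothetical z-values, negative > positive
    surprise_param = [3.2, 5.0, 1.0, 3.9]    # hypothetical z-values, parametric surprise
    print(conjunction(neg_gt_pos, surprise_param, threshold=3.1))
```

The minimum statistic makes the conjunction conservative: a voxel that is strongly active in one contrast but subthreshold in the other is excluded.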
Figure 4.
Analysis of the feedback response in a caudate ROI defined by its connectivity to executive cortical areas. (a) Responses to negative outcomes were greater than responses to positive outcomes. (b) Response amplitude increased monotonically with Bayesian surprise, with the largest response at the highest surprise. (c) By contrast, striatal response showed no monotonic relationship with reward prediction error (RPE). (d) Striatal activity as a function of surprise for each rule learned in the task. Despite some heterogeneity, the striatal response generally increases with surprise across rules. Error bars represent bootstrapped estimates of the standard error of the mean.
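One simple way to inspect the kind of monotonic relationship described in panels (b) and (c) is to sort trials by the model regressor, split them into equal-count bins, and average the ROI response within each bin. This sketch assumes that binning scheme; the function names, bin count and data are hypothetical, and it is not necessarily how the figure was produced.

```python
import statistics


def binned_means(x, y, n_bins=4):
    """Mean of y within n_bins equal-count bins of x (trials sorted by x)."""
    if len(x) != len(y) or len(x) < n_bins:
        raise ValueError("need paired x, y with at least n_bins trials")
    order = sorted(range(len(x)), key=lambda i: x[i])
    n = len(order)
    bins = [order[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]
    return [statistics.fmean([y[i] for i in b]) for b in bins]


def is_nondecreasing(values):
    """True if each bin mean is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    surprise = [0.2, 1.4, 0.8, 2.1, 0.5, 1.7, 0.3, 2.6]     # hypothetical regressor
    response = [0.1, 0.6, 0.3, 0.9, 0.2, 0.8, 0.15, 1.1]    # hypothetical ROI betas
    means = binned_means(surprise, response)
    print(means, is_nondecreasing(means))
```

Equal-count bins keep the number of trials per point constant, at the cost of unequal bin widths along the regressor.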
Figure 5.
Rule updating. (a) Rule updating during the feedback period in the striatum and left cIFS. (b) Rule updating during the subsequent cue period in the left cIFS. (c) Projections of a and b onto the cortical surface. Red corresponds to rule updating during the feedback period; blue corresponds to rule updating during the subsequent cue period.
