Cereb Cortex. 2012 Mar;22(3):527-36.
doi: 10.1093/cercor/bhr117. Epub 2011 Jun 21.

Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from fMRI

David Badre et al. Cereb Cortex. 2012 Mar.

Abstract

The frontal lobes may be organized hierarchically such that more rostral frontal regions modulate cognitive control operations in caudal regions. In our companion paper (Frank MJ, Badre D. 2011. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits I: computational analysis. Cereb Cortex. 22:509–526), we provide novel neural circuit and algorithmic models of hierarchical cognitive control in cortico-striatal circuits. Here, we test key model predictions using functional magnetic resonance imaging (fMRI). Our neural circuit model proposes that contextual representations in rostral frontal cortex influence the striatal gating of contextual representations in caudal frontal cortex. Reinforcement learning operates at each level, such that the system adaptively learns to gate higher order contextual information into rostral regions. Our algorithmic Bayesian "mixture of experts" model captures the key computations of this neural model and provides trial-by-trial estimates of the learner's latent hypothesis states. In the present paper, we used these quantitative estimates to reanalyze fMRI data from a hierarchical reinforcement learning task reported in Badre D, Kayser AS, D'Esposito M. 2010. Frontal cortex and the discovery of abstract action rules. Neuron. 66:315–326. Results validate key predictions of the models and provide evidence for an individual cortico-striatal circuit for reinforcement learning of hierarchical structure at a specific level of policy abstraction. These findings are initially consistent with the proposal that hierarchical control in frontal cortex may emerge from interactions among nested cortico-striatal circuits at different levels of abstraction.

Figures

Figure 1.
Schematic of hierarchical cortico–striatal circuit. In the standard response selection circuit, motor areas of the striatum interact with motor cortex to facilitate response selection based on the learned probability of reward given the current stimulus state. The PMd-maint layer represents possible stimuli to be actively maintained so as to constrain motor selection processes. Its corresponding striatal region learns which stimulus dimensions should be gated into PMd based on the learned probability that their maintenance is predictive of reward. The PMd-out layer represents the deep lamina (e.g., layers 5/6) of PMd in which only a subset of currently maintained PMd stimuli influences response selection, by projecting to the motor striatum. Its corresponding striatal area learns which of the maintained PMd stimuli should be output gated depending on context. The most anterior prePMd layer maintains stimulus features that act as context, by sending their axons to striatal output gating areas of PMd. Its corresponding striatal gating layer learns whether the maintenance of particular stimuli as higher order context in prePMd is predictive of reward.
Figure 2.
Schematics of the hierarchical learning task from Badre et al. (2010). (a) Depiction of trial events during both learning epochs. On each trial, the participant is presented with a shape, at a particular orientation, surrounded by a colored box. They then choose 1 of 3 responses on the keypad depending on these stimulus features. This is followed by feedback indicating whether the response was correct or not. Feedback was separated from stimulus onset to permit separate event-related analysis of these 2 phases. (b) Policy structure for the flat condition. In the flat condition, 18 unique mappings had to be learned between each conjunction of shape, orientation, and color and a response, yielding a wide flat first-order structure. (c) Policy structure for the hierarchical condition. If they learned the contingent relationship between color and orientation versus shape (second-order policy), participants could select a subset of shape- or orientation-based rules depending on color.
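The difference between the two policy structures in panels (b) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stimulus labels, the 3 shapes × 3 orientations × 2 colors breakdown (chosen because it yields the 18 conjunctions named in the caption), and the specific response assignments are hypothetical and are not taken from the task materials.

```python
import random
from itertools import product

# Hypothetical labels; 3 x 3 x 2 = 18 conjunctions, matching the caption's count.
SHAPES = ["shape_a", "shape_b", "shape_c"]
ORIENTATIONS = ["up", "left", "right"]
COLORS = ["red", "blue"]
RESPONSES = [1, 2, 3]

stimuli = list(product(SHAPES, ORIENTATIONS, COLORS))

# Flat condition: every conjunction has its own arbitrary response,
# so the learner must store 18 separate first-order mappings.
rng = random.Random(0)  # fixed seed so the arbitrary table is reproducible
flat_table = {stim: rng.choice(RESPONSES) for stim in stimuli}


def hierarchical_policy(shape, orientation, color):
    """Second-order policy: color selects which dimension determines the response.

    Assumption for illustration: 'red' means respond by shape,
    any other color means respond by orientation.
    """
    if color == "red":
        return RESPONSES[SHAPES.index(shape)]
    return RESPONSES[ORIENTATIONS.index(orientation)]


# The hierarchical condition needs one color rule plus 3 + 3 first-order mappings.
n_flat_mappings = len(flat_table)
n_hier_mappings = len(SHAPES) + len(ORIENTATIONS)
```

Under these assumptions the flat learner stores 18 mappings, whereas the hierarchical learner needs only 6 first-order mappings plus the color rule that selects between them.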
Figure 3.
Individual differences analyses based on attentional weight to the hierarchical (vs. flat) expert (wH) show differential activation in the prePMd ROI (shown at left). (Top) The time course of the BOLD response in the prePMd ROI is plotted, showing a significantly greater response in subjects with high attentional weight to the hierarchical expert (wH). (Bottom) Scatter plot of peak activation in prePMd (x-axis) against wH (y-axis) with best fit trendline. Activation in prePMd and wH are reliably correlated.
Figure 4.
Model-based RPEs and frontostriatal activity. (a) BOLD response in brain areas that track RPE. Activations are observed in striatum and lateral frontal cortex. Note that for illustrative purposes, activations are plotted at an uncorrected threshold of P < 0.001. (b) Functionally defined ROIs for PMd, prePMd, and areas within caudate posterior to, at the same level as, and anterior to prePMd. (c) Within cortical ROIs, prePMd tracks RPE specifically when model-derived attentional weight to the hierarchical rule (RPEHmod), but not the flat rule (RPEFmod), is high. PMd does not distinguish between hierarchical and flat rules in its sensitivity to RPE. (d) Within caudate, areas at the same anterior-to-posterior level as prePMd track RPE modulated by attention to the hierarchical relative to the flat rule. Caudate areas more posterior and more anterior to prePMd are not sensitive to this distinction.
Figure 5.
Between subjects, the BOLD response to RPE modulated by attention to the hierarchical rule (RPEHmod = RPE × wOS|C) is predictive of the decline of prePMd activity in the flat condition in (a) left and marginally in (b) right caudate.
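The product in the caption, RPEHmod = RPE × wOS|C, can be illustrated with a minimal Python sketch. It assumes the common definition of a reward prediction error as the obtained outcome minus the expected reward; the paper's model may derive RPE differently. The function name `modulated_rpe` and all numeric values are hypothetical and chosen only for illustration.

```python
def modulated_rpe(outcomes, expected, weights):
    """Trial-by-trial RPE scaled by an attentional weight.

    outcomes: obtained reward per trial (1 = correct, 0 = incorrect)
    expected: expected reward per trial (assumed to come from a learning model)
    weights:  attentional weight to the hierarchical rule per trial
    """
    if not (len(outcomes) == len(expected) == len(weights)):
        raise ValueError("all inputs must have one entry per trial")
    # RPE = outcome - expectation; the modulated regressor multiplies it by the weight.
    return [(r - q) * w for r, q, w in zip(outcomes, expected, weights)]


# Made-up values for three trials, not data from the study.
outcomes = [1, 0, 1]
expected = [0.5, 0.5, 0.8]
weights = [0.2, 0.5, 0.9]
rpe_h_mod = modulated_rpe(outcomes, expected, weights)
```

With such a regressor, a given prediction error contributes more on trials where the weight on the hierarchical rule is high, which is the sense in which the caption describes RPE as "modulated by attention".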

References

    1. Badre D. Cognitive control, hierarchy, and the rostro-caudal organization of the frontal lobes. Trends Cogn Sci. 2008;12:193–200.
    2. Badre D, D'Esposito M. Functional magnetic resonance imaging evidence for a hierarchical organization of the prefrontal cortex. J Cogn Neurosci. 2007;19:2082–2099.
    3. Badre D, D'Esposito M. Is the rostro-caudal axis of the frontal lobe hierarchical? Nat Rev Neurosci. 2009;10:659–669.
    4. Badre D, Hoffman J, Cooney JW, D'Esposito M. Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat Neurosci. 2009;12:515–522.
    5. Badre D, Kayser AS, D'Esposito M. Frontal cortex and the discovery of abstract action rules. Neuron. 2010;66:315–326.
