J Neurosci. 2013 Mar 27;33(13):5797-805.
doi: 10.1523/JNEUROSCI.5445-12.2013.

Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia

Carlos Diuk et al.

Abstract

Studies suggest that dopaminergic neurons report a unitary, global reward prediction error signal. However, learning in complex real-life tasks, in particular tasks that show hierarchical structure, requires multiple prediction errors that may coincide in time. We used functional neuroimaging to measure prediction error signals in humans performing such a hierarchical task involving simultaneous, uncorrelated prediction errors. Analysis of signals in a priori anatomical regions of interest in the ventral striatum and the ventral tegmental area indeed evidenced two simultaneous, but separable, prediction error signals corresponding to the two levels of hierarchy in the task. This result suggests that suitably designed tasks may reveal a more intricate pattern of firing in dopaminergic neurons. Moreover, the need for downstream separation of these signals implies possible limitations on the number of different task levels that we can learn about simultaneously.

Figures

Figure 1.
Sample trial: the participant chooses to play in the left casino; the door opens and displays a target number of points (indicated by a red bar). After 2.5–3.5 s, the four slot machines appear. The participant plays the upper-left slot machine and, after another 2.5–3.5 s, the points obtained from that machine are shown inside the machine (as a green bar plus a Roman numeral). The corresponding part of the target bar turns yellow, indicating the points accumulated with the first slot machine play. The rest stays red, indicating the points still needed to win the casino. The participant then plays the bottom-right slot machine and obtains sufficient points to win the casino. The target bar turns green and a message appears indicating the casino win (10¢).
Figure 2.
HRL representation of the casino task. The top level shows the task of playing a casino, and the bottom level decomposes this task into the subtasks of playing slot machines. Prediction errors under the Outcome Model and Slot-Points Model are shown (in this example, “slot-3” indicates the name of the slot machine just played). Note that the prediction errors for playing the left casino and for playing the second slot machine occur simultaneously.
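The coincidence of the two prediction errors can be illustrated with a minimal sketch. It assumes a plain delta-rule update at each level; the function names, learning rates, and reward coding below are hypothetical and are not taken from the paper's model specification. When the second slot machine pays out, the casino outcome becomes known at the same moment, so both errors are computed in the same step but from different quantities.

```python
def delta_update(value, outcome, alpha):
    """Return (updated value, prediction error) under a simple delta rule."""
    prediction_error = outcome - value
    return value + alpha * prediction_error, prediction_error


def second_slot_play(v_casino, v_slot, casino_outcome, slot_points,
                     alpha_casino=0.1, alpha_slot=0.1):
    """Compute both prediction errors at the moment the second slot pays out.

    v_casino       -- current value estimate for the chosen casino (assumed)
    v_slot         -- current value estimate for the slot just played (assumed)
    casino_outcome -- 1.0 if the casino was won, 0.0 otherwise (assumed coding)
    slot_points    -- points obtained from the slot machine just played
    """
    # Top level: error on the casino outcome.
    casino = delta_update(v_casino, casino_outcome, alpha_casino)
    # Bottom level: error on the points from the slot machine.
    slot = delta_update(v_slot, slot_points, alpha_slot)
    return casino, slot
```

In this sketch the two errors can differ in sign: a slot machine may pay fewer points than expected while still pushing the total past the casino target.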
Figure 3.
Logistic regression on casino (left) and slot machine (right) choices. We estimated the relationship between casino choices and the outcome of the casino on the last four times it was chosen, as well as the total slot machine points obtained in the corresponding trials. We similarly estimated the relationship between choices of each slot machine and the outcomes of the last four plays of this slot machine, as well as the casino outcomes during those same trials. Plotted are the regression weights for the last four outcomes of each type. Stars indicate significance at a between-participants Bonferroni-corrected level (p < 0.0063).
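As a rough illustration of the regressors described in this caption, the sketch below collects, for each trial, the outcomes from the last four times a given option was chosen. The helper names and data layout are hypothetical, not the authors' code. The Bonferroni helper shows that dividing 0.05 by eight tests gives 0.00625, which is close to the reported 0.0063; that eight tests (four lags for each of two outcome types) were corrected for is an inference from the caption, not something it states.

```python
def lagged_outcomes(choices, outcomes, option, n_lags=4):
    """For each trial, list the outcomes of the last n_lags plays of `option`.

    The most recent play comes first. Trials with fewer than n_lags prior
    plays of `option` get None, since their regressors are undefined.
    """
    history = []  # outcomes of earlier trials on which `option` was chosen
    rows = []
    for choice, outcome in zip(choices, outcomes):
        if len(history) >= n_lags:
            rows.append(history[-1:-n_lags - 1:-1])
        else:
            rows.append(None)
        # Record this trial only after building its row, so a trial's own
        # outcome never predicts its own choice.
        if choice == option:
            history.append(outcome)
    return rows


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests
```

Rows produced this way could serve as predictors in a logistic regression on whether `option` was chosen on that trial.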
Figure 4.
Average posterior probability per choice trial for the Outcome Model and the Target Model per participant. The Outcome Model assigns a higher average probability per trial to the choices of 22 of 28 participants (points lying above the solid equal-likelihood line). The average probability of a choice trial was calculated as the likelihood of the whole sequence of choice data divided by the number of choice trials. Dashed lines indicate chance.
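One common way to put a whole-sequence likelihood on a per-trial scale is the geometric mean of the model's choice probabilities, obtained by exponentiating the log-likelihood divided by the number of trials. The sketch below uses that convention. Whether it matches the authors' exact computation is an assumption, and the function name is hypothetical.

```python
import math


def average_trial_probability(choice_probs):
    """Geometric mean of the probabilities a model assigned to observed choices.

    choice_probs -- one probability per choice trial, each in (0, 1]
    """
    log_likelihood = sum(math.log(p) for p in choice_probs)
    return math.exp(log_likelihood / len(choice_probs))
```

Under this convention a model that guesses uniformly between two options scores 0.5 on every sequence, which makes the chance level a natural reference line.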
Figure 5.
Average posterior probability per choice trial for the Slot-Points Model and Six-Armed-Bandit Model per participant. The Slot-Points Model assigns a higher average probability per trial to the choices of 25 of 28 participants (points lying above the solid equal-likelihood line). Chance = 0.16, indicated by dashed lines.
Figure 6.
A, Activations that survived a whole-brain FWE-corrected threshold of p < 0.05, cluster size >5, in the random effects contrast for the Casino regressor. Images are centered at voxel (18,14,−8) to better depict the extent of the activation. B, Activations that survived an uncorrected threshold of p < 0.001, cluster size >5, in the random effects contrast for the combined Slot regressor. Images are centered at voxel (−9, 11, −5).
