Review. Cogn Affect Behav Neurosci. 2015 Jun;15(2):435-59.
doi: 10.3758/s13415-015-0338-7.

Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis


Henry W Chase et al. Cogn Affect Behav Neurosci. 2015 Jun.

Abstract

Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards or punishments (the prediction error) is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that had employed algorithmic reinforcement learning models across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, whereas instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies.


Conflict of interest statement

The authors declare no financial conflicts of interest that may have biased the present work.

Figures

Figure 1
The temporal difference (TD) model describes the real-time course of reward prediction error (PE) signals; PEs transfer from the US to the CS as learning progresses. In contrast, trial-level models such as Rescorla-Wagner describe PE only at the US. Associative strength (conceptually close to value) signals build at the CS. The TD error signal therefore resembles the combination of PE and associative strength signals in trial-level models. * Before the asymptote is reached. At the asymptote, PE at the US disappears.
Figure 2
Pie charts show the percentage of studies in each condition that were included in producing the RPE ALE map.
Figure 3
Map of significant ALE clusters associated with the RPE contrast, with the activations in the striatum highlighted. Pie charts show the contribution of studies of a particular class to the bilateral striatum activation. Percentages are not corrected for base rate.
Figure 4
Map of significant ALE clusters associated with the RPE contrast, with the activations in the midbrain and frontal operculum highlighted. Pie charts show the contribution of studies of a particular class to each activation. Percentages are not corrected for base rate.
Figure 5
Conjunction map showing the overlap of ALE maps from individual subgroup analyses (Fixed, Individual, Pavlovian, Instrumental, Outcome PE, TD, Monetary, Liquid and Social), with the left putamen cluster (x=-22, y=6, z=9, cluster size = 30) from the conjunction analysis shown in green and marked with arrows.
Figure 6
Map of significant ALE clusters associated with the EV contrast. Pie charts show the contribution of studies of a particular class to the subgenual cingulate activation. Percentages are not corrected for base rate.
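
The contrast drawn in the Figure 1 caption between the TD model and trial-level models can be illustrated with a small simulation. The following is a minimal sketch and not the procedure used in any of the meta-analyzed studies: the function names, the trial layout (a CS at step 2, a US at step 7), the learning rate, and the one-value-per-time-step state representation are all assumptions chosen for illustration. The value of the pre-CS interval is held at zero on the assumption that CS onset cannot be predicted.

```python
def simulate_td(n_trials=200, n_steps=10, cs_t=2, us_t=7,
                alpha=0.2, gamma=1.0, reward=1.0):
    """TD(0) with one value per within-trial time step (illustrative only).

    Returns deltas, where deltas[k][t] is the TD error registered on
    trial k at the transition into time step t.
    """
    values = [0.0] * n_steps
    deltas = [[0.0] * n_steps for _ in range(n_trials)]
    for trial in range(n_trials):
        for t in range(1, n_steps):
            r = reward if t == us_t else 0.0
            # TD error: reward now, plus discounted value of the new step,
            # minus the value held for the previous step.
            delta = r + gamma * values[t] - values[t - 1]
            deltas[trial][t] = delta
            # Only steps from CS onset onward carry a learnable value;
            # the pre-CS interval stays at zero (assumption).
            if t - 1 >= cs_t:
                values[t - 1] += alpha * delta
    return deltas


def simulate_rescorla_wagner(n_trials=200, alpha=0.2, reward=1.0):
    """Trial-level model: a single PE per trial, computed at the US."""
    strength = 0.0
    pes, strengths = [], []
    for _ in range(n_trials):
        strengths.append(strength)   # associative strength available at the CS
        pe = reward - strength       # prediction error at the US
        pes.append(pe)
        strength += alpha * pe
    return pes, strengths


if __name__ == "__main__":
    td = simulate_td()
    pes, strengths = simulate_rescorla_wagner()
    for k in (0, 5, 199):
        print(f"trial {k:3d}  TD error at CS {td[k][2]:.3f}  "
              f"TD error at US {td[k][7]:.3f}  "
              f"RW PE {pes[k]:.3f}  RW strength {strengths[k]:.3f}")
```

Under these assumptions, the TD error on the first trial appears only at the US. With training it shrinks at the US and grows at the CS, while the Rescorla-Wagner PE at the US shrinks as associative strength builds. With no discounting (`gamma=1.0`), the late-training TD error at the CS approaches the Rescorla-Wagner associative strength, which is the resemblance the caption describes.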
