Sci Rep. 2017 Jul 6;7(1):4762.
doi: 10.1038/s41598-017-04507-w.

Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans

Elsa Fouragnan et al. Sci Rep. 2017.

Abstract

Reward learning depends on accurate reward associations with potential choices. These associations can be attained with reinforcement learning mechanisms using a reward prediction error (RPE) signal (the difference between actual and expected rewards) for updating future reward expectations. Despite an extensive body of literature on the influence of RPE on learning, little has been done to investigate the potentially separate contributions of RPE valence (positive or negative) and surprise (absolute degree of deviation from expectations). Here, we coupled single-trial electroencephalography with simultaneously acquired fMRI, during a probabilistic reversal-learning task, to offer evidence of temporally overlapping but largely distinct spatial representations of RPE valence and surprise. Electrophysiological variability in RPE valence correlated with activity in regions of the human reward network promoting approach or avoidance learning. Electrophysiological variability in RPE surprise correlated primarily with activity in regions of the human attentional network controlling the speed of learning. Crucially, despite the largely separate spatial extent of these representations, our EEG-informed fMRI approach uniquely revealed a linear superposition of the two RPE components in a smaller network encompassing visuo-mnemonic and reward areas. Activity in this network was further predictive of stimulus value updating, indicating a comparable contribution of both signals to reward learning.
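The split of the RPE into valence and surprise described above can be illustrated with a minimal sketch. It assumes a simple delta-rule update; the function names and the learning rate of 0.3 are hypothetical illustrations, not values or code from the study.

```python
def rpe_components(reward, expected):
    """Split a reward prediction error into its valence and surprise parts."""
    rpe = reward - expected              # signed RPE: actual minus expected reward
    if rpe > 0:
        valence = 1                      # better than expected
    elif rpe < 0:
        valence = -1                     # worse than expected
    else:
        valence = 0                      # exactly as expected
    surprise = abs(rpe)                  # unsigned deviation from expectation
    return rpe, valence, surprise


def update_value(expected, reward, learning_rate=0.3):
    """Delta-rule update of a reward expectation (learning_rate is illustrative)."""
    rpe, _, _ = rpe_components(reward, expected)
    return expected + learning_rate * rpe


# A rewarded trial when the expectation was already fairly high:
rpe, valence, surprise = rpe_components(reward=1.0, expected=0.75)
new_expectation = update_value(expected=0.75, reward=1.0)
```

In this sketch an unexpected omission (reward 0, expectation 0.75) and an unexpected reward (reward 1, expectation 0.25) carry the same surprise but opposite valence, which is the distinction the abstract draws.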

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Schematic representation of the experimental task and modeling of behavioral responses. (a) Each trial began with a random delay followed by the presentation of two abstract symbols (selected from a larger set of three symbols) for a period of 1.25 s. During this time, subjects pressed one of two buttons on a response device to indicate which of the two symbols (right or left) they believed was more likely to lead to a reward. The fixation cross flickered for 100 ms when a selection was made. Finally, the decision outcome was revealed after a second random delay. A tick or a cross was used to inform the participants of a positive or a negative outcome, respectively. (b) Main panel: Model-predicted choice probabilities (x-axis) derived from an RL algorithm using a softmax procedure (binned into ten bins with a bin size of 0.1, and averaged across all subjects and across symbols) closely matched participants' observed behavioral choices (y-axis), calculated for each bin as the fraction of trials in which they chose one of the three symbols. Small panel: A Bayesian model comparison using BIC scores revealed that the Model-free (Mf) RL model explained the data better than a Model-based (Mb) RL model and a model including a dynamic learning rate (Dy). Lower BIC scores indicate a better fit (see Methods). (c) A mixed-effects regression analysis demonstrated that participants were more likely to repeat the same choice after positive compared to negative feedback, with a significant interaction between RPE valence and surprise (see text for details). (d) A mixed-effects regression analysis revealed that positive and negative RPEs speed up and slow down reaction times on subsequent trials, respectively (Delta RTs = RTs(t + 1) − RTs(t)), with a significant interaction between RPE valence and surprise (see text for details).
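The softmax choice rule and BIC comparison mentioned in panel (b) can be sketched as follows. This is an illustration under stated assumptions: the two-option logistic form, the inverse temperature of 3.0, and both function names are hypothetical, since the model's exact parameterization is given in the paper's Methods rather than in this caption.

```python
import math


def softmax_choice_prob(value_a, value_b, inverse_temperature=3.0):
    """Probability of choosing symbol A over symbol B under a two-option softmax.

    inverse_temperature is an illustrative value, not a fitted parameter.
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (value_a - value_b)))


def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion; lower scores indicate a better fit."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


# Equal values give indifference; a higher value for A favours choosing A.
p_equal = softmax_choice_prob(0.5, 0.5)
p_a_better = softmax_choice_prob(0.8, 0.2)

# With the same log-likelihood, the model with more free parameters is penalised.
bic_simple = bic(log_likelihood=-100.0, n_params=2, n_trials=100)
bic_complex = bic(log_likelihood=-100.0, n_params=3, n_trials=100)
```

The BIC penalty is why a richer model, such as one with a dynamic learning rate, has to improve the likelihood enough to offset its extra parameters before it is preferred.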
Figure 2
Single-trial discriminant analysis and EEG-informed fMRI regressors. (a) We used single-trial analysis of the EEG to perform binary discriminations between conditions of interest, here denoted as condition A and B (in red and blue, respectively). We first estimated w, which is a linear weighting on the EEG sensor data (X) that maximally discriminates between the two conditions. This determines a task-related projection (y) of the data, in which the distance to the decision boundary reflects the decision certainty of the classifier in separating each of the relevant conditions. We treated the single-trial y amplitudes (single-trial variability [EEG STV]) as an index of how each condition of interest was encoded on individual trials. (b) Given these y values and their corresponding outcome-locked onset time points, we built fMRI regressors for subsequent GLM analyses. These regressors were all convolved with the canonical HRF. Details of specific events included in each EEG-informed fMRI regressor can be found in the main text (see fMRI analysis section).
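The two steps in this caption, projecting sensor data onto w and turning the resulting y values into an HRF-convolved regressor, can be sketched roughly as below. Everything here is an assumption made for illustration: the double-gamma HRF shape and its parameters, the mean-centering of amplitudes, the 2 s repetition time, sampling the HRF at scan resolution, and all function names. The study's actual pipeline is described in its fMRI analysis section.

```python
import math


def project(weights, trial, bias=0.0):
    """Discriminator output y = w . x + b for one trial's sensor vector."""
    return sum(w * x for w, x in zip(weights, trial)) + bias


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma approximation to a canonical HRF (parameters are assumed)."""
    if t <= 0:
        return 0.0
    pos = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    neg = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return pos - ratio * neg


def build_regressor(onsets, amplitudes, n_scans, tr=2.0, hrf_length=32.0):
    """Place mean-centered y amplitudes at outcome onsets and convolve with the HRF.

    onsets are in seconds; the result has one value per scan. Sampling at
    scan resolution is a simplification chosen to keep the sketch short.
    """
    mean_amp = sum(amplitudes) / len(amplitudes)
    sticks = [0.0] * n_scans
    for onset, amp in zip(onsets, amplitudes):
        idx = int(round(onset / tr))
        if 0 <= idx < n_scans:
            sticks[idx] += amp - mean_amp
    kernel = [double_gamma_hrf(k * tr) for k in range(int(hrf_length / tr))]
    regressor = [0.0] * n_scans
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_scans:
                regressor[i + k] += stick * h
    return regressor


# Two trials with opposite discriminator outputs, 20 s apart.
y_values = [project([1.0, 2.0], [0.5, 0.25]), project([1.0, 2.0], [-0.5, -0.25])]
regressor = build_regressor(onsets=[0.0, 20.0], amplitudes=y_values, n_scans=30)
```

Because the amplitudes are mean-centered in this sketch, the regressor tracks trial-to-trial variability in y rather than the average response to an outcome.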
Figure 3
Single-trial EEG analyses. (a) Discriminator performance (cross-validated Az) during RPE valence discrimination (positive vs. negative feedback) of outcome-locked EEG responses, averaged across subjects (N = 20). The dotted line represents the average Az value leading to a significance level of P = 0.01, estimated using a bootstrap test. Shaded error bars are standard errors across subjects. In this work, we focus only on the second of the two RPE valence components (Late Valence component). The scalp map represents the spatial topography of this component. (b) Mean discriminator output (y) for the Late Valence component, binned in ten quantiles based on model-based signed RPE estimates. This component exhibits mainly a categorical response profile without an additional modulation by the magnitude of the RPE. (c) Discriminator performance (cross-validated Az) during RPE surprise discrimination (very low vs. very high surprising outcomes), averaged across subjects (N = 20). The dotted line represents the average Az value leading to a significance level of P = 0.01, estimated using a bootstrap test. Shaded error bars are standard errors across subjects. The scalp map represents the spatial topography of the outcome surprise component. (d) Mean discriminator output (y) for the outcome surprise component, binned in five quantiles based on model-based unsigned RPE estimates, showing a parametric response along the outcome surprise dimension. Yellow bins indicate trials used to train the classifier, while grey bins contain “unseen” data with intermediate outcome surprise levels. Error bars are standard errors across subjects.
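Two quantities in this caption lend themselves to a short illustration: Az, the area under the ROC curve, and the mean discriminator output per quantile bin. The sketch below computes Az with the rank-based (Mann-Whitney) formulation and bins trials by a sorting variable. It omits the cross-validation and bootstrap threshold mentioned in the caption, and the function names are hypothetical.

```python
def roc_area(scores_a, scores_b):
    """Area under the ROC curve: P(score from A > score from B), ties count half."""
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


def mean_by_quantile(sort_values, outputs, n_bins):
    """Mean of outputs within equal-count bins ordered by sort_values.

    Assumes there are at least n_bins trials, so that no bin is empty.
    """
    order = sorted(range(len(sort_values)), key=lambda i: sort_values[i])
    n = len(order)
    means = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(sum(outputs[i] for i in members) / len(members))
    return means


# Perfectly separated discriminator outputs give an area of 1.0.
az = roc_area([2.0, 3.0], [0.0, 1.0])

# Four trials sorted by unsigned RPE and averaged in two bins.
bin_means = mean_by_quantile([3, 1, 2, 4], [30.0, 10.0, 20.0, 40.0], n_bins=2)
```

An area of 0.5 corresponds to chance-level discrimination, which is why performance is judged against a threshold above that level.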
Figure 4
Spatiotemporal characterization of RPE valence and surprise signals. (a) Regions correlating with the EEG STV in our valence component, exhibiting overall greater response for positive compared to negative RPEs. (b) Regions correlating positively with outcome surprise as captured by an RL model (green) and the STV in our corresponding EEG component (yellow), respectively. Note the complementary nature of activations in the EEG STV map. All activations represent mixed-effects and are rendered on the standard MNI brain at Z > 2.57, cluster-corrected (P < 0.05) using a resampling procedure (minimum cluster size = 76 voxels).
Figure 5
Full spatial representation of the RPE valence and surprise networks and their overlap. (a) A conjunction analysis on the results arising from the EEG-informed regressors for the RPE valence (red clusters) and surprise (green clusters) components revealed that four areas (the STR, vmPFC, LG and MTG) significantly encoded both quantities (brown clusters). The conjunction analysis was performed using the resulting whole-brain activation maps for RPE valence and surprise, thresholded at Z > 2.57 and cluster-corrected (P < 0.05) using a resampling procedure. (b) The four overlapping regions exhibited a clear linear superposition profile between the two RPE components, with a higher BOLD signal for positive vs. negative RPEs but also a systematic increase from low (L), to medium (M), to high (H) outcome surprise trials, within each outcome type.
