Sci Rep. 2017 Jul 6;7(1):4762.

doi: 10.1038/s41598-017-04507-w.

Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans

Elsa Fouragnan¹,², Filippo Queirazza³, Chris Retzler³,⁴, Karen J Mullinger⁵,⁶, Marios G Philiastides⁷

Affiliations

¹ Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, UK. elsa.fouragnan@psy.ox.ac.uk.
² Department of Experimental Psychology, University of Oxford, Oxford, UK. elsa.fouragnan@psy.ox.ac.uk.
³ Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, UK.
⁴ Department of Behavioural & Social Sciences, University of Huddersfield, Huddersfield, UK.
⁵ Sir Peter Mansfield Magnetic Resonance Center, School of Physics and Astronomy, University of Nottingham, Nottingham, UK.
⁶ Birmingham University Imaging Centre, School of Psychology, University of Birmingham, Birmingham, UK.
⁷ Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, UK. Marios.Philiastides@glasgow.ac.uk.

PMID: 28684734
PMCID: PMC5500565
DOI: 10.1038/s41598-017-04507-w

Elsa Fouragnan et al. Sci Rep. 2017.

Abstract

Reward learning depends on accurate reward associations with potential choices. These associations can be attained with reinforcement learning mechanisms using a reward prediction error (RPE) signal (the difference between actual and expected rewards) for updating future reward expectations. Despite an extensive body of literature on the influence of RPE on learning, little has been done to investigate the potentially separate contributions of RPE valence (positive or negative) and surprise (absolute degree of deviation from expectations). Here, we coupled single-trial electroencephalography with simultaneously acquired fMRI, during a probabilistic reversal-learning task, to offer evidence of temporally overlapping but largely distinct spatial representations of RPE valence and surprise. Electrophysiological variability in RPE valence correlated with activity in regions of the human reward network promoting approach or avoidance learning. Electrophysiological variability in RPE surprise correlated primarily with activity in regions of the human attentional network controlling the speed of learning. Crucially, despite the largely separate spatial extent of these representations, our EEG-informed fMRI approach uniquely revealed a linear superposition of the two RPE components in a smaller network encompassing visuo-mnemonic and reward areas. Activity in this network was further predictive of stimulus value updating, indicating a comparable contribution of both signals to reward learning.
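
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a simple delta-rule update, which the abstract does not spell out; the function names and the learning rate of 0.3 are hypothetical and chosen only for illustration.

```python
def rpe_components(reward, expected):
    """Split a reward prediction error into its valence and surprise parts."""
    rpe = reward - expected          # signed RPE: actual minus expected reward
    if rpe > 0:
        valence = 1                  # better than expected
    elif rpe < 0:
        valence = -1                 # worse than expected
    else:
        valence = 0                  # outcome exactly as expected
    surprise = abs(rpe)              # unsigned deviation from expectation
    return rpe, valence, surprise


def update_value(expected, rpe, learning_rate=0.3):
    """Delta-rule update (an assumption): shift the expectation by a fraction of the RPE."""
    return expected + learning_rate * rpe


# A rewarded trial when reward was already fairly likely: positive valence, low surprise.
rpe, valence, surprise = rpe_components(reward=1.0, expected=0.75)
new_expected = update_value(0.75, rpe)
```

Two outcomes with the same surprise can carry opposite valence, and the reverse also holds, which is why the two quantities can be examined separately.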

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1**
Schematic representation of the experimental task and modeling of behavioral responses. (a) Each trial began with a random delay followed by the presentation of two abstract symbols (selected from a larger set of three symbols) for a period of 1.25 s. During this time, subjects pressed one of two buttons on a response device to indicate which of the two symbols (right or left) they believed was more likely to lead to a reward. The fixation cross flickered for 100 ms when a selection was made. Finally, the decision outcome was revealed after a second random delay. A tick or a cross was used to inform the participants of a positive or a negative outcome, respectively. (b) Main panel: Model-predicted choice probabilities (x-axis) derived from an RL algorithm using a softmax procedure (binned into ten bins of size 0.1 and averaged across all subjects and across symbols) closely matched participants' observed behavioral choices (y-axis), calculated for each bin as the fraction of trials in which they chose one of the three symbols. Small panel: A Bayesian model comparison using BIC scores revealed that the Model-free (Mf) RL model explained the data better than a Model-based (Mb) RL model and a model including a dynamic learning rate (Dy) (lower BIC scores indicate a better fit; see Methods). (c) A mixed-effects regression analysis demonstrated that participants were more likely to repeat the same choice after positive compared to negative feedback, with a significant interaction between RPE valence and surprise (see text for details). (d) A mixed-effects regression analysis revealed that positive and negative RPEs speed up and slow down reaction times on subsequent trials (Delta RTs = RTs(t + 1) − RTs(t)), respectively, with a significant interaction between RPE valence and surprise (see text for details).
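
The behavioral quantities named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' fitting code: the two-option softmax form, the inverse temperature of 3.0, and all function names are hypothetical, and the BIC is written in its standard form.

```python
import math


def softmax_choice_prob(value_a, value_b, inverse_temperature=3.0):
    """Probability of choosing symbol A over symbol B under a two-option softmax."""
    za = inverse_temperature * value_a
    zb = inverse_temperature * value_b
    m = max(za, zb)                          # subtract the max for numerical stability
    ea, eb = math.exp(za - m), math.exp(zb - m)
    return ea / (ea + eb)


def bic(log_likelihood, n_params, n_trials):
    """Standard Bayesian information criterion; lower scores indicate a better fit."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


def delta_rts(rts):
    """Delta RT(t) = RT(t + 1) - RT(t), as defined in panel (d)."""
    return [rts[t + 1] - rts[t] for t in range(len(rts) - 1)]


p_choose_a = softmax_choice_prob(0.8, 0.2)   # higher-valued symbol is favoured
rt_changes = delta_rts([0.5, 0.75, 0.5])     # slowing, then speeding
```

With equal log-likelihoods, a model with more free parameters receives a higher (worse) BIC, which is the sense in which the comparison in the small panel penalizes complexity.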

**Figure 2**
Single-trial discriminant analysis and EEG-informed fMRI regressors. (a) We used single-trial analysis of the EEG to perform binary discriminations between conditions of interest, here denoted as condition A and B (in red and blue respectively). We first estimated $w$, which is a linear weighting on the EEG sensor data ($X$) that maximally discriminates between the two conditions. This determines a task-related projection ($y$) of the data, in which the distance to the decision boundary reflects the decision certainty of the classifier in separating each of the relevant conditions. We treated the single-trial $y$ amplitudes (single-trial variability [EEG STV]), as an index of how each condition of interest was encoded on individual trials. (b) Given these $y$ values and their corresponding outcome-locked onset time points, we built fMRI regressors for subsequent GLM analyses. These regressors were all convolved with the canonical HRF. Details of specific events included in each EEG-informed fMRI regressor can be found in the main text (see *fMRI analysis* section).
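
The projection step in panel (a) and the event list in panel (b) can be sketched as below. Several simplifications are assumed: the weights $w$ are taken as already estimated, each trial is reduced to a single sensor vector, and the $y$ values are mean-centred before use as regressor amplitudes. All names and numbers are hypothetical, and the convolution with the HRF is left out.

```python
def project_trials(weights, trials, bias=0.0):
    """Compute y_i = w . x_i + b for each trial's sensor vector x_i."""
    return [sum(w * x for w, x in zip(weights, trial)) + bias for trial in trials]


def demean(values):
    """Mean-centre amplitudes (an assumption; common for parametric regressors)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def build_events(onsets, amplitudes):
    """Pair outcome-locked onsets (seconds) with single-trial amplitudes."""
    return list(zip(onsets, amplitudes))


weights = [0.5, -0.25]                 # hypothetical two-sensor discriminator
trials = [[2.0, 4.0], [4.0, 0.0]]      # one sensor vector per trial
y = project_trials(weights, trials)    # single-trial discriminator amplitudes
events = build_events([10.0, 22.5], demean(y))
```

Each `(onset, amplitude)` pair would then define one scaled event in a regressor that is convolved with the canonical HRF before entering the GLM.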

**Figure 3**
Single-trial EEG analyses. (a) Discriminator performance (cross-validated $A_z$) during RPE valence discrimination (positive vs. negative feedback) of outcome-locked EEG responses, averaged across subjects (N = 20). The dotted line represents the average $A_z$ value leading to a significance level of P = 0.01, estimated using a bootstrap test. Shaded error bars are standard errors across subjects. In this work, we are focusing only on the second of the two RPE valence components (Late Valence component). The scalp map represents the spatial topography of this component. (b) Mean discriminator output ($y$) for the Late Valence component, binned in ten quantiles based on model-based signed RPE estimates. This component exhibits mainly a categorical response profile without an additional modulation by the magnitude of the RPE. (c) Discriminator performance (cross-validated $A_z$) during RPE surprise discrimination (very low vs. very high surprising outcomes), averaged across subjects (N = 20). The dotted line represents the average $A_z$ value leading to a significance level of P = 0.01, estimated using a bootstrap test. Shaded error bars are standard errors across subjects. The scalp map represents the spatial topography of the outcome surprise component. (d) Mean discriminator output ($y$) for the outcome surprise component, binned in five quantiles based on model-based unsigned RPE estimates, showing a parametric response along the outcome surprise dimension. Yellow bins indicate trials used to train the classifier, while grey bins contain “unseen” data with intermediate outcome surprise levels. Error bars are standard errors across subjects.
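
The performance measure $A_z$ is taken here to be the area under the ROC curve, which this caption does not define. A minimal sketch of that area, computed as the fraction of correctly ordered trial pairs, follows. The function name and the scores are hypothetical, and the cross-validation and bootstrap steps mentioned in the caption are omitted.

```python
def roc_area(scores_a, scores_b):
    """Probability that a condition-A trial scores higher than a condition-B trial.

    Ties count as half a win, which makes this equal to the area under the ROC curve.
    """
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


# Hypothetical discriminator outputs for positive and negative feedback trials.
positive_y = [0.9, 0.8, 0.4]
negative_y = [0.5, 0.3, 0.1]
area = roc_area(positive_y, negative_y)
```

A value of 0.5 corresponds to chance-level separation of the two conditions and 1.0 to perfect separation.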

**Figure 4**
Spatiotemporal characterization of RPE valence and surprise signals. (a) Regions correlating with the EEG STV in our valence component, exhibiting overall greater response for positive compared to negative RPEs. (b) Regions correlating positively with outcome surprise as captured by an RL model (green) and the STV in our corresponding EEG component (yellow), respectively. Note the complementary nature of activations in the EEG STV map. All activations represent mixed-effects and are rendered on the standard MNI brain at Z > 2.57, cluster-corrected (P < 0.05) using a resampling procedure (minimum cluster size = 76 voxels).

**Figure 5**
Full spatial representation of the RPE valence and surprise networks and their overlap. (a) A conjunction analysis on the results arising from the EEG-informed regressors for the RPE valence (red clusters) and surprise (green clusters) components revealed that four areas – the STR, vmPFC, LG and MTG – significantly encoded both quantities (brown clusters). The conjunction analysis was performed using the resulting whole-brain activation maps for RPE valence and surprise and applying a threshold of Z > 2.57, cluster-corrected (P < 0.05) using a resampling procedure. (b) The four overlapping regions exhibited a clear linear superposition profile between the two RPE components, with a higher BOLD signal for positive vs. negative RPEs but also a systematic increase from low (L), to medium (M), to high (H) outcome surprise trials, within each outcome type.

Cited by

Dorsal Anterior Cingulate Cortices Differentially Lateralize Prediction Errors and Outcome Valence in a Decision-Making Task.
Weiss AR, Gillies MJ, Philiastides MG, Apps MA, Whittington MA, FitzGerald JJ, Boccard SG, Aziz TZ, Green AL. Front Hum Neurosci. 2018 May 22;12:203. doi: 10.3389/fnhum.2018.00203. eCollection 2018. PMID: 29872384.
Influence of vmPFC on dmPFC Predicts Valence-Guided Belief Formation.
Kuzmanovic B, Rigoux L, Tittgemeyer M. J Neurosci. 2018 Sep 12;38(37):7996-8010. doi: 10.1523/JNEUROSCI.0266-18.2018. Epub 2018 Aug 13. PMID: 30104337.
The Anterior Insula Processes a Time-Resolved Subjective Risk Prediction Error.
Kim JC, Hellrung L, Nebe S, Tobler PN. J Neurosci. 2025 Jun 4;45(23):e2302242025. doi: 10.1523/JNEUROSCI.2302-24.2025. PMID: 40268482.
Global reward state affects learning and activity in raphe nucleus and anterior insula in monkeys.
Wittmann MK, Fouragnan E, Folloni D, Klein-Flügge MC, Chau BKH, Khamassi M, Rushworth MFS. Nat Commun. 2020 Jul 28;11(1):3771. doi: 10.1038/s41467-020-17343-w. PMID: 32724052.
How the Level of Reward Awareness Changes the Computational and Electrophysiological Signatures of Reinforcement Learning.
Correa CMC, Noorman S, Jiang J, Palminteri S, Cohen MX, Lebreton M, van Gaal S. J Neurosci. 2018 Nov 28;38(48):10338-10348. doi: 10.1523/JNEUROSCI.0457-18.2018. Epub 2018 Oct 16. PMID: 30327418.
