J Neurosci. 2017 Apr 5;37(14):3789-3798.
doi: 10.1523/JNEUROSCI.2943-16.2017. Epub 2017 Mar 7.

Whole-Brain Neural Dynamics of Probabilistic Reward Prediction

Dominik R Bach et al. J Neurosci.

Abstract

Predicting future reward is paramount to performing an optimal action. Although a number of brain areas are known to encode such predictions, a detailed account of how the associated representations evolve over time is lacking. Here, we address this question using human magnetoencephalography (MEG) and multivariate analyses of instantaneous activity in reconstructed sources. We overtrained participants on a simple instrumental reward learning task where geometric cues predicted a distribution of possible rewards, from which a sample was revealed 2000 ms later. We show that predicted mean reward (i.e., expected value), and predicted reward variability (i.e., economic risk), are encoded distinctly. Early on, representations of mean reward are seen in parietal and visual areas, and later in frontal regions with orbitofrontal cortex emerging last. Strikingly, an encoding of reward variability emerges simultaneously in parietal/sensory and frontal sources and later than mean reward encoding. An orbitofrontal variability encoding emerged around the same time as that seen for mean reward. Crucially, cross-prediction showed that mean reward and variability representations are distinct and also revealed that instantaneous representations become more stable over time. Across sources, the best fitting metric for variability signals was coefficient of variation (rather than SD or variance), but distinct best metrics were seen for individual brain regions. Our data demonstrate how a dynamic encoding of probabilistic reward prediction unfolds in the brain both in time and space.

Significance Statement

Predicting future reward is paramount to optimal behavior. To gain insight into the underlying neural computations, we investigate how reward representations in the brain arise over time. Using magnetoencephalography, we show that a representation of predicted mean reward emerges early in parietal/sensory regions and later in frontal cortex. In contrast, predicted reward variability representations appear in most regions at the same time, and slightly later than for mean reward. For both features, representations dynamically change >1000 ms before stabilizing. The best metric for encoding variability is coefficient of variation, with heterogeneity in this encoding seen between brain areas. The results provide novel insights into the emergence of predictive reward representations.
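
The three candidate variability metrics named above (variance, SD, and coefficient of variation) can be illustrated with a minimal sketch. The outcome values and probabilities below are hypothetical and are not the task's actual reward distributions; `reward_stats` is an illustrative helper, not a function from the study.

```python
import math

def reward_stats(outcomes, probs):
    """Mean, variance, SD and coefficient of variation of a discrete reward distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(p * x for x, p in zip(outcomes, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    sd = math.sqrt(variance)
    # Coefficient of variation: SD scaled by the mean (undefined for a zero mean).
    cv = sd / mean if mean != 0 else float("nan")
    return {"mean": mean, "variance": variance, "sd": sd, "cv": cv}

# Two hypothetical cues with the same mean reward but different variability.
low_risk = reward_stats([1.0, 2.0, 3.0], [0.1, 0.8, 0.1])
high_risk = reward_stats([1.0, 2.0, 3.0], [0.4, 0.2, 0.4])
```

Because the coefficient of variation divides SD by the mean, two cues with equal SD but different mean reward would differ on this metric while being identical on SD and variance.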

Keywords: decoding; dynamic encoding; encoding; magnetoencephalography; reward prediction.

Figures

Figure 1.
Experimental procedure. A, Visual cues with the three possible outcomes (i.e., there were no losses). Fill of circles represents mean reward. Color represents variability. Cue-outcome and response-outcome mapping was fully balanced across participants. B, Intratrial procedure. A reward predictor was shown for 2000 ms, during which participants indicated color and fill with one button press per feature. At offset, one of four possible messages appeared. C, Procedure. Participants were overtrained beforehand. A total of 90% of outcomes were hidden during MEG, to suppress a possible impact of ongoing learning processes.
Figure 2.
Analysis scheme. Sensor data and reconstructed source data were averaged within 10 ms time bins and concatenated across participants. The data matrix X, together with the trial-by-trial stimulus variable of interest Y (mean reward, variability, or perceptual feature), was then fed into multivariate analysis.
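
The binning and concatenation step described in this caption might look roughly like the following NumPy sketch. The sampling rate, array shapes, and the function names `bin_time` and `concatenate_participants` are assumptions for illustration; the caption specifies only the 10 ms bins and the concatenation across participants.

```python
import numpy as np

def bin_time(data, sfreq, bin_ms=10.0):
    """Average data of shape (trials, channels, samples) within consecutive time bins."""
    samples_per_bin = int(round(sfreq * bin_ms / 1000.0))
    if samples_per_bin < 1:
        raise ValueError("bin is shorter than one sample")
    n_trials, n_channels, n_samples = data.shape
    n_bins = n_samples // samples_per_bin
    # Drop trailing samples that do not fill a complete bin.
    trimmed = data[:, :, : n_bins * samples_per_bin]
    return trimmed.reshape(n_trials, n_channels, n_bins, samples_per_bin).mean(axis=-1)

def concatenate_participants(binned_list, labels_list):
    """Stack trials from all participants into one data matrix X and label vector Y."""
    X = np.concatenate(binned_list, axis=0)
    Y = np.concatenate(labels_list, axis=0)
    return X, Y

rng = np.random.default_rng(0)
sfreq = 600.0  # assumed sampling rate in Hz, not taken from the source
# Two simulated participants with 8 and 10 trials, 5 channels, 2000 ms of data each.
participants = [rng.standard_normal((n, 5, 1200)) for n in (8, 10)]
labels = [rng.integers(0, 3, size=n) for n in (8, 10)]
X, Y = concatenate_participants([bin_time(d, sfreq) for d in participants], labels)
```

Each slice `X[:, :, t]` is then a trials-by-channels matrix for one time bin, which can be paired with `Y` in a per-bin multivariate analysis.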
Figure 3.
Predicted reward representation in sensor and source signal patterns across participants at different time points. Encoding (black) and decoding (gray) of predicted mean reward and variability from MEG sensor signals and from reconstructed source activity. F values refer to F-transformed Pillai–Bartlett trace from the encoding MANOVA, or to F ratio of a decoding ANOVA, and are computed using pooled error variance, as reflected in degrees of freedom. Red (encoding) and pink (decoding) lines indicate significant time bins (p < 0.05) after a cluster-level permutation test to account for multiple comparisons across time.
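
For readers unfamiliar with the Pillai–Bartlett trace, the sketch below computes it for a textbook one-way MANOVA together with the standard F approximation. This is an assumption-laden illustration: it treats the stimulus variable as a grouping factor, uses simulated data, and does not reproduce the pooled error variance or the cluster-level permutation test mentioned in the caption. The function name `pillai_trace` is hypothetical.

```python
import numpy as np

def pillai_trace(X, groups):
    """Pillai-Bartlett trace and its F approximation for a one-way MANOVA.

    X: (observations, features) data at one time bin; groups: (observations,) labels.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n_obs, p = X.shape
    levels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    H = np.zeros((p, p))  # between-group (hypothesis) SSCP matrix
    E = np.zeros((p, p))  # within-group (error) SSCP matrix
    for g in levels:
        Xg = X[groups == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        H += Xg.shape[0] * (d @ d.T)
        R = Xg - Xg.mean(axis=0)
        E += R.T @ R
    V = np.trace(H @ np.linalg.inv(H + E))
    q = len(levels) - 1        # hypothesis degrees of freedom
    v_e = n_obs - len(levels)  # error degrees of freedom
    s = min(p, q)
    m = (abs(p - q) - 1) / 2.0
    n = (v_e - p - 1) / 2.0
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    F = (V / (s - V)) * (df2 / df1)
    return V, F, df1, df2

# Simulated example: 3 stimulus levels, 20 trials each, 4 features.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 20)
noise = rng.standard_normal((60, 4))
X_null = noise.copy()
X_effect = noise.copy()
X_effect[:, 0] += 5.0 * groups  # strong simulated effect on the first feature
V_null, F_null, _, _ = pillai_trace(X_null, groups)
V_effect, F_effect, df1, df2 = pillai_trace(X_effect, groups)
```

Repeating such a test at every time bin yields one F value per bin, which is why a correction across time (here, the cluster-level permutation test) is needed.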
Figure 4.
Representation of mean reward. Results are summarized over 500 ms intervals. Individual time bins are shown in Movie 1 and Figure 6. Description of the source labels in Table 1. Red represents direct encoding (i.e., individual source significant in encoding model in at least one overall significant time bin in this interval, and also significant in decoding model in at least one time bin). Orange represents likely direct encoding (significant in either encoding or decoding, and undecided in the other). Blue represents indirect encoding (no decoding, but significant in encoding model). Green represents nonencoding, but correlated with encoding regions (no encoding, but significant decoding).
Figure 5.
Representation of reward variability. Results are summarized over 500 ms intervals. Individual time bins are shown in Movie 2 and Figure 6. Description of the source labels in Table 1. Red represents direct encoding (i.e., individual source significant in encoding model in at least one overall significant time bin in this interval, and also significant in decoding model in at least one time bin). Orange represents likely direct encoding (significant in either encoding or decoding, and undecided in the other). Blue represents indirect encoding (no decoding, but significant in encoding model). Green represents nonencoding, but correlated with encoding regions (no encoding, but significant decoding).
Figure 6.
Representation of reward mean and variability. For each source, representation in each significant time bin is color-coded. Saturation reflects explained variance. Red represents direct encoding (i.e., individual source significant in encoding model and decoding model). Orange represents likely direct encoding (significant in either encoding or decoding, and undecided in the other). Blue represents indirect encoding (no decoding, but significant in encoding model). Green represents nonencoding, but correlated with encoding regions (no encoding, but significant decoding). Description of the source labels in Table 1.
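
The color-coding rule in this caption can be written as a small lookup, sketched below. The three-valued status (significant, undecided, absent) is an assumption inferred from the caption's wording; the caption does not say how "undecided" is distinguished from "no encoding" or "no decoding", and the names `Status` and `classify_source` are hypothetical.

```python
from enum import Enum

class Status(Enum):
    SIGNIFICANT = "significant"
    UNDECIDED = "undecided"
    ABSENT = "absent"

def classify_source(encoding, decoding):
    """Map a source's encoding and decoding status in one time bin to a caption color."""
    S = Status
    if encoding is S.SIGNIFICANT and decoding is S.SIGNIFICANT:
        return "red: direct encoding"
    if (encoding is S.SIGNIFICANT and decoding is S.UNDECIDED) or (
        encoding is S.UNDECIDED and decoding is S.SIGNIFICANT
    ):
        return "orange: likely direct encoding"
    if encoding is S.SIGNIFICANT and decoding is S.ABSENT:
        return "blue: indirect encoding"
    if encoding is S.ABSENT and decoding is S.SIGNIFICANT:
        return "green: nonencoding, correlated with encoding regions"
    # Combinations not named in the caption are left uncolored in this sketch.
    return "uncolored"
```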
Figure 7.
Temporal stability of reward representation. Cross-prediction matrix for the prediction of data points from encoding MANOVAs fitted at other data points. Nonsignificant cross-prediction (random permutation test) and negative explained variance are set to zero (dark blue). Temporal stability is higher in the second than in the first half of the anticipation window (random permutation of cross-prediction matrix, p < 0.001), although encoding strength is equal across the two halves for mean reward and stronger in the first half for variability.
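
A cross-prediction matrix of this kind can be sketched as follows: fit an encoding model at one time bin, use it to predict data at every other bin, and set negative explained variance to zero. This is a simplified illustration under stated assumptions: it uses an ordinary least-squares encoding model with a split into fitting and evaluation trials rather than the MANOVA of the caption, it omits the permutation test for significance, and the data are simulated. The function name `cross_prediction_matrix` is hypothetical.

```python
import numpy as np

def cross_prediction_matrix(X_train, y_train, X_test, y_test):
    """Explained variance when a model fitted at bin i predicts data at bin j.

    X_*: (trials, sources, time bins); y_*: (trials,) stimulus variable.
    """
    n_bins = X_train.shape[2]
    A_train = np.column_stack([np.ones(len(y_train)), y_train])
    A_test = np.column_stack([np.ones(len(y_test)), y_test])
    R2 = np.zeros((n_bins, n_bins))
    for t_fit in range(n_bins):
        # Least-squares encoding model: source activity as a linear function of y.
        beta, *_ = np.linalg.lstsq(A_train, X_train[:, :, t_fit], rcond=None)
        pred = A_test @ beta
        for t_eval in range(n_bins):
            obs = X_test[:, :, t_eval]
            ss_res = np.sum((obs - pred) ** 2)
            ss_tot = np.sum((obs - obs.mean(axis=0)) ** 2)
            R2[t_fit, t_eval] = 1.0 - ss_res / ss_tot
    # Negative explained variance is set to zero, as in the caption.
    return np.clip(R2, 0.0, None)

def simulate(rng, n_trials, patterns, noise_sd=1.0):
    """Simulated data: each time bin carries the stimulus in its own spatial pattern."""
    y = rng.choice([0.0, 1.0, 2.0], size=n_trials)
    X = y[:, None, None] * patterns[None, :, :]
    X = X + noise_sd * rng.standard_normal(X.shape)
    return X, y

# 6 sources, 4 bins: the pattern changes between bins 0 and 1, then is stable in bins 2 and 3.
patterns = np.zeros((6, 4))
patterns[0, 0] = 2.0
patterns[1, 1] = 2.0
patterns[2, 2] = 2.0
patterns[2, 3] = 2.0
rng = np.random.default_rng(2)
X_a, y_a = simulate(rng, 200, patterns)
X_b, y_b = simulate(rng, 200, patterns)
M = cross_prediction_matrix(X_a, y_a, X_b, y_b)
```

In this simulation the off-diagonal entries are near zero where the spatial pattern changes and clearly positive where it is stable, which is the signature of temporal stability that the caption describes.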
