Cogn Affect Behav Neurosci. 2020 Oct;20(5):1070-1089.
doi: 10.3758/s13415-020-00820-6.

The influence of internal models on feedback-related brain activity

Franz Wurm et al. Cogn Affect Behav Neurosci. 2020 Oct.

Abstract

Decision making relies on the interplay between two distinct learning mechanisms, namely habitual model-free learning and goal-directed model-based learning. Recent literature suggests that this interplay is substantially shaped by the environmental structure as represented by an internal model. We employed a modified two-stage, single-decision Markov decision task to investigate how two internal models differing in the predictability of stage transitions influence the neural correlates of feedback processing. Our results demonstrate that fronto-central theta and the feedback-related negativity (FRN), two correlates of reward prediction errors in the medial frontal cortex, are independent of the internal representations of the environmental structure. In contrast, centro-parietal delta and the P3, two correlates possibly reflecting feedback evaluation in working memory, were highly susceptible to the underlying internal model. Model-based analyses of single-trial activity showed a comparable pattern, indicating that theta and the FRN represent the computation of unsigned reward prediction errors irrespective of the internal model, whereas the P3 adapts to the internal representation of the environment. Our findings further substantiate the assumption that the feedback-locked components under investigation reflect distinct mechanisms of feedback processing and that different internal models selectively influence these mechanisms.
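The abstract contrasts model-free learning, which caches values updated by reward prediction errors, with model-based learning, which computes values through an internal model of stage transitions. The following is a minimal Python sketch of that distinction, assuming a simple Rescorla-Wagner update with a hypothetical learning rate `alpha` and illustrative transition probabilities. It is not the computational model fitted in the paper, and all function names are hypothetical.

```python
"""Hypothetical sketch: model-free vs. model-based values in a two-stage task.

Assumptions (not from the paper): a Rescorla-Wagner update with learning rate
`alpha`, rewards coded 0/1, and illustrative transition probabilities.
"""


def model_free_update(rewards, alpha=0.3, v0=0.0):
    """Track one option's cached value; return signed and unsigned prediction errors."""
    value = v0
    signed, unsigned = [], []
    for r in rewards:
        delta = r - value            # signed reward prediction error
        signed.append(delta)
        unsigned.append(abs(delta))  # unsigned (absolute) prediction error
        value += alpha * delta       # model-free update of the cached value
    return signed, unsigned


def model_based_values(transition, stage2_values):
    """First-stage values computed through an internal model of the transitions.

    transition[a][s] is the assumed probability that first-stage choice `a`
    leads to second-stage state `s`; stage2_values[s] is that state's value.
    """
    return [sum(p * v for p, v in zip(row, stage2_values)) for row in transition]


if __name__ == "__main__":
    print(model_free_update([1, 1, 0], alpha=0.5))
    predictable = [[0.7, 0.3], [0.3, 0.7]]  # illustrative, not the task's values
    random_structure = [[0.5, 0.5], [0.5, 0.5]]
    print(model_based_values(predictable, [1.0, 0.0]))       # → [0.7, 0.3]
    print(model_based_values(random_structure, [1.0, 0.0]))  # → [0.5, 0.5]
```

In this toy setting, the random transition structure gives both first-stage options the same model-based value, so the internal model cannot favor either choice, whereas the predictable structure separates them. The unsigned prediction errors are simply the absolute values of the signed ones.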

Keywords: Event-related potentials; Feedback processing; Model-based learning; Model-free learning; Reinforcement learning; Time-frequency analysis.

Figures

Fig. 1
a. Schematic representation of the environmental contingencies for the predictable and random conditions. The conditions differed in their transition structure but had an identical reward structure. b. Graphical illustration of a trial: after the fixation cross, participants had to decide between two pictures at Stage 1 and were subsequently forwarded to Stage 2. Feedback was then presented depending on the second-stage stimulus. c. Stay probabilities, averaged across subjects. Error bars depict ±SEM. Gray circles indicate stay probabilities for individual subjects. d. Subjects’ performance in the predictable condition, plotted as the mean proportion of correct decisions across subblocks. Correct decisions are defined as first-stage choices of the stimulus that commonly led to the high-reward second-stage picture. Subblocks were assigned post hoc by dividing the 50 trials of the predictable condition in each block into ten equal parts of 5 trials each. Dashed lines depict ±SEM.
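Panels c and d of Fig. 1 summarize behavior as stay probabilities and as accuracy per subblock. A minimal sketch of both measures follows, assuming per-trial arrays of first-stage choices, reward outcomes, and correctness codes (a hypothetical data layout). Splitting stay probabilities by the previous trial's outcome is a common convention in two-stage tasks and an assumption here, since the caption does not specify the grouping.

```python
import numpy as np


def subblock_accuracy(correct, n_subblocks=10):
    """Proportion of correct first-stage choices in each equal-sized subblock.

    `correct` holds one block's predictable-condition trials in order
    (1 = chose the stimulus that commonly led to the high-reward picture).
    With 50 trials and 10 subblocks, each subblock contains 5 trials.
    """
    correct = np.asarray(correct, dtype=float)
    if correct.size % n_subblocks:
        raise ValueError("trial count must split evenly into subblocks")
    return correct.reshape(n_subblocks, -1).mean(axis=1)


def stay_probabilities(choices, rewarded):
    """Probability of repeating the previous first-stage choice, split by
    whether the previous trial was rewarded (an assumed grouping).

    A group with no trials yields nan.
    """
    choices = np.asarray(choices)
    rewarded = np.asarray(rewarded, dtype=bool)
    stay = choices[1:] == choices[:-1]  # did trial t repeat trial t-1?
    prev_reward = rewarded[:-1]         # outcome of trial t-1
    return {
        "after_reward": float(stay[prev_reward].mean()),
        "after_no_reward": float(stay[~prev_reward].mean()),
    }


def sem(values):
    """Standard error of the mean across subjects (sample SD / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / np.sqrt(values.size))
```

Per-subject values from these functions would then be averaged across subjects, with `sem` supplying the ±SEM error bars.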
Fig. 2
Feedback-locked time-domain activity at electrode FCz. a, b: Grand-average waveforms for the predictable and random conditions. Shaded areas show the 95% confidence intervals. c, d: Peak-to-peak amplitudes. Gray circles indicate the amplitudes for individual subjects.
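Fig. 2 quantifies the FRN as a peak-to-peak amplitude. One common way to compute such a measure is sketched below: take the most negative point in an FRN window and subtract the largest positive value that precedes it. The window limits, the sign convention, and the function name are assumptions for illustration, not the values used in the study.

```python
import numpy as np


def peak_to_peak(times, erp, neg_window=(200.0, 350.0), pos_window=(150.0, 250.0)):
    """Hypothetical peak-to-peak FRN measure for one waveform (e.g., at FCz).

    Negative peak: minimum of `erp` within neg_window (ms).
    Positive peak: maximum of `erp` within pos_window, restricted to samples
    before the negative peak. Returns negative minus positive amplitude, so a
    more negative value means a larger FRN. Windows are illustrative only.
    """
    times = np.asarray(times, dtype=float)
    erp = np.asarray(erp, dtype=float)
    neg_mask = (times >= neg_window[0]) & (times <= neg_window[1])
    neg_idx = np.flatnonzero(neg_mask)[np.argmin(erp[neg_mask])]
    pos_mask = (times >= pos_window[0]) & (times <= pos_window[1]) & (times < times[neg_idx])
    if not pos_mask.any():
        raise ValueError("no samples in the positive-peak window before the negative peak")
    return float(erp[neg_idx] - erp[pos_mask].max())
```

Restricting the positive peak to samples before the negative peak keeps the measure anchored to the preceding positivity rather than to a later deflection such as the P3.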
Fig. 3
Feedback-locked theta-band activity at electrode FCz. a, b: Estimated power for the predictable and random conditions. The black rectangle marks the time window (200-400 ms) and frequency window (4-8 Hz, theta) of interest. c: Difference between losses and wins for each condition and level of expectedness, plotted on a logarithmic frequency axis. d: Mean power in the 200-400 ms time window and 4-8 Hz frequency window. Gray circles indicate the power values for individual subjects.
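The power values in Fig. 3 come from a time-frequency decomposition followed by averaging inside the 200-400 ms, 4-8 Hz rectangle. The caption does not name the decomposition method; the sketch below assumes complex Morlet wavelet convolution, one standard choice, with a hypothetical `n_cycles` setting and a simple unit-gain normalization, and then averages power inside the window of interest. The same window-averaging step would apply to the 300-500 ms, 1-4 Hz delta window in Fig. 5, provided the epoch is long enough for the low-frequency wavelets.

```python
import numpy as np


def morlet_power(signal, sfreq, freqs, n_cycles=5.0):
    """Power (n_freqs, n_times) from complex Morlet wavelet convolution.

    n_cycles and the normalization are illustrative choices; the epoch must
    be longer than the longest wavelet.
    """
    signal = np.asarray(signal, dtype=float)
    power = np.empty((len(freqs), signal.size))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)           # SD of the Gaussian envelope (s)
        t = np.arange(-3.0 * sigma_t, 3.0 * sigma_t, 1.0 / sfreq)
        envelope = np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
        wavelet = envelope * np.exp(2j * np.pi * f * t)  # complex Morlet wavelet
        wavelet /= envelope.sum()                        # unit gain at frequency f
        if wavelet.size > signal.size:
            raise ValueError(f"epoch too short for a {f} Hz wavelet")
        analytic = np.convolve(signal, wavelet, mode="same")
        power[i] = np.abs(analytic) ** 2
    return power


def tf_window_mean(power, times_ms, freqs, time_window=(200.0, 400.0), freq_window=(4.0, 8.0)):
    """Mean power inside a time window (ms) and frequency band (Hz)."""
    times_ms = np.asarray(times_ms, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    t_mask = (times_ms >= time_window[0]) & (times_ms <= time_window[1])
    f_mask = (freqs >= freq_window[0]) & (freqs <= freq_window[1])
    return float(power[np.ix_(f_mask, t_mask)].mean())
```

A loss-minus-win contrast like the one in panel c would then be the difference of two such power arrays, computed separately for each condition and level of expectedness.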
Fig. 4
Feedback-locked time-domain activity at electrode Pz. a, b: Grand-average waveforms for the predictable and random conditions. Shaded areas show the 95% confidence intervals. c, d: Difference waves of the expectancy effect (unexpected minus expected) for the predictable and random conditions. Shaded areas show the 95% confidence intervals. e, f: Mean amplitudes in the 300-500 ms time window. Gray circles indicate the mean amplitudes for individual subjects. g, h: Topographies of the difference between unexpected and expected feedback for each condition and valence, 300-500 ms after feedback onset.
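Fig. 4 reports difference waves (unexpected minus expected) and mean amplitudes in the 300-500 ms window at Pz. A minimal sketch of both computations follows, assuming epochs stored as trials-by-time arrays with a matching vector of latencies in milliseconds (a hypothetical layout; function names are also hypothetical). Passing a channels-by-time array to `mean_amplitude` yields one value per channel, which is the kind of input a scalp topography such as panels g and h requires.

```python
import numpy as np


def difference_wave(unexpected_epochs, expected_epochs):
    """Trial-averaged unexpected minus expected waveform, shape (n_times,)."""
    unexpected = np.asarray(unexpected_epochs, dtype=float)  # (n_trials, n_times)
    expected = np.asarray(expected_epochs, dtype=float)      # (n_trials, n_times)
    return unexpected.mean(axis=0) - expected.mean(axis=0)


def mean_amplitude(times_ms, data, window=(300.0, 500.0)):
    """Mean amplitude within a latency window along the last (time) axis.

    `data` may be one waveform (n_times,) or channels x times, in which case
    the result has one value per channel.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    data = np.asarray(data, dtype=float)
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    return data[..., mask].mean(axis=-1)
```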
Fig. 5
Feedback-locked delta-band activity at electrode Pz. a, b: Estimated power for the predictable and random conditions. The black rectangle marks the time window (300-500 ms) and frequency window (1-4 Hz, delta) of interest. c: Difference between losses and wins for each condition and level of expectedness, plotted on a logarithmic frequency axis. d: Mean power in the 300-500 ms time window and 1-4 Hz frequency window. Gray circles indicate the power values for individual subjects.
Fig. 6
Mean standardized regression weights for the relationship between absolute reward prediction error estimates and single-trial neural activity. Gray circles indicate regression weights for individual subjects. a: FRN activity was estimated via peak-to-peak measures at electrode FCz. b: P3 activity was estimated by averaging at electrode Pz in the 300-500 ms time window. c: Theta activity was estimated by averaging at electrode FCz in the 200-400 ms time window and 4-8 Hz frequency window. d: Delta activity was estimated by averaging at electrode Pz in the 300-500 ms time window and 1-4 Hz frequency window. e: Representative data from participant 16 for the single-trial regression between reward prediction error estimates and P3 amplitudes. Note that the reported results are based on absolute reward prediction errors.
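Fig. 6 reports standardized regression weights relating absolute reward prediction errors to single-trial activity. A minimal sketch follows, assuming one predictor per subject and ordinary least squares on z-scored variables; with a single predictor this weight equals the Pearson correlation. The input format and function names are hypothetical, and the paper's regression may include additional terms.

```python
import numpy as np


def standardized_beta(predictor, outcome):
    """Slope of a single-predictor least-squares fit after z-scoring both variables."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(outcome, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(zx @ zy / (zx @ zx))  # intercept is zero after centering


def group_betas(subjects):
    """Per-subject betas for |RPE| -> single-trial amplitude, plus their mean.

    `subjects` maps a (hypothetical) subject ID to a pair of equal-length
    arrays: signed reward prediction errors and single-trial amplitudes.
    """
    betas = {sid: standardized_beta(np.abs(rpe), amp) for sid, (rpe, amp) in subjects.items()}
    return betas, float(np.mean(list(betas.values())))
```

Taking the absolute value inside `group_betas` mirrors the caption's note that the reported results use unsigned prediction errors; the per-subject betas correspond to the gray circles and their mean to the bars.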
