The influence of internal models on feedback-related brain activity

Franz Wurm¹, Benjamin Ernst², Marco Steinhauser²

Affiliations

¹ Catholic University of Eichstätt-Ingolstadt, Ostenstraße 27, 85072, Eichstätt, Germany. franz.wurm@ku.de.
² Catholic University of Eichstätt-Ingolstadt, Ostenstraße 27, 85072, Eichstätt, Germany.

PMID: 32812148
PMCID: PMC7497542
DOI: 10.3758/s13415-020-00820-6

Cogn Affect Behav Neurosci. 2020 Oct;20(5):1070-1089.

Abstract

Decision making relies on the interplay between two distinct learning mechanisms, namely habitual model-free learning and goal-directed model-based learning. Recent literature suggests that this interplay is significantly shaped by the environmental structure as represented by an internal model. We employed a modified two-stage but one-decision Markov decision task to investigate how two internal models differing in the predictability of stage transitions influence the neural correlates of feedback processing. Our results demonstrate that fronto-central theta and the feedback-related negativity (FRN), two correlates of reward prediction errors in the medial frontal cortex, are independent of the internal representations of the environmental structure. In contrast, centro-parietal delta and the P3, two correlates possibly reflecting feedback evaluation in working memory, were highly susceptible to the underlying internal model. Model-based analyses of single-trial activity showed a comparable pattern, indicating that while the computation of unsigned reward prediction errors is represented by theta and the FRN irrespective of the internal models, the P3 adapts to the internal representation of an environment. Our findings further substantiate the assumption that the feedback-locked components under investigation reflect distinct mechanisms of feedback processing and that different internal models selectively influence these mechanisms.
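The distinction between signed and unsigned reward prediction errors mentioned in the abstract can be illustrated with a minimal model-free value update. This is only a sketch under stated assumptions: the delta-rule update, the learning rate `alpha`, the function name `rpe_series`, and the toy reward sequence are illustrative choices and are not taken from the paper's computational model.

```python
def rpe_series(rewards, alpha=0.3, initial_value=0.0):
    """Return (signed, unsigned) reward prediction errors for a reward sequence.

    Assumption: a simple delta-rule (Rescorla-Wagner style) learner with a
    single value estimate; `alpha` is a hypothetical learning rate.
    """
    value = initial_value
    signed, unsigned = [], []
    for reward in rewards:
        delta = reward - value        # signed prediction error
        signed.append(delta)
        unsigned.append(abs(delta))   # unsigned (absolute) prediction error
        value += alpha * delta        # move the estimate toward the outcome
    return signed, unsigned


# Toy example: 1 = win, 0 = loss (made-up sequence)
signed, unsigned = rpe_series([1, 1, 0, 1], alpha=0.5)
print(signed)    # [1.0, 0.5, -0.75, 0.625]
print(unsigned)  # [1.0, 0.5, 0.75, 0.625]
```

The unsigned series discards the direction of the error and keeps only its size, which is the quantity the abstract relates to theta and the FRN.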

Keywords: Event-related potentials; Feedback processing; Model-based learning; Model-free learning; Reinforcement learning; Time-frequency analysis.

Figures

**Fig. 1**
a. Schematic representation of the environmental contingencies for the predictable and random conditions. The conditions differed in their transition structure but had an identical reward structure. b. Graphical illustration of a trial: After presentation of a fixation cross, participants had to decide between two pictures at Stage 1 and were subsequently forwarded to Stage 2. Depending on the second-stage stimulus, feedback was presented. c. Stay probabilities, averaged across subjects. Error bars depict ±SEM. Gray circles indicate stay probabilities for the individual subjects. d. Subjects’ performance in predictable conditions, plotted as the mean proportion of correct decisions across subblocks. Correct decisions are defined as first-stage choices of the stimulus that commonly led to the high-reward second-stage picture. Subblocks were assigned post hoc by separating the 50 trials of the predictable condition in each block into ten equal parts of 5 trials each. Dashed lines depict ±SEM

**Fig. 2**
Feedback-locked time-domain activity at electrode FCz. a, b: Grand average waveform for the predictable and the random conditions. Shaded areas show the 95% confidence intervals. c, d: Peak-to-peak amplitudes. Gray circles indicate the amplitudes for the individual subjects

**Fig. 3**
Feedback-locked theta frequency neural activity at electrode FCz. a, b: Estimated power for the predictable and the random conditions. The black rectangle specifies the time window (200-400 ms) and frequency window (4-8 Hz, theta) of interest. c: Logarithmic frequency scaling of the difference between losses and wins for each condition and expectedness. d: Mean power values in the 200-400 ms time window and 4-8 Hz frequency window. Gray circles indicate the power values for the individual subjects

**Fig. 4**
Feedback-locked time-domain neural activity at electrode Pz. a, b: Grand average waveform for the predictable and the random conditions. Shaded areas show the 95% confidence intervals. c, d: Difference waves of the expectancy effect for the predictable and the random conditions, calculated as unexpected minus expected. Shaded areas show the 95% confidence intervals. e, f: Mean amplitudes in the 300-500 ms time window. Gray circles indicate the mean amplitudes for the individual subjects. g, h: Topographies of the difference wave between unexpected and expected for each condition and valence, 300-500 ms after feedback onset

**Fig. 5**
Feedback-locked delta frequency neural activity at electrode Pz. a, b: Estimated power for the predictable and the random conditions. The black rectangle specifies the time window (300-500 ms) and frequency window (1-4 Hz, delta) of interest. c: Logarithmic frequency scaling of the difference between losses and wins for each condition and expectedness. d: Mean power values in the 300-500 ms time window and 1-4 Hz frequency window. Gray circles indicate the power values for the individual subjects

**Fig. 6**
Mean standardized regression weights for the relationship between absolute reward prediction error estimates and single-trial neural activity. Gray circles indicate regression weights for the individual subjects. a: FRN activity was estimated via peak-to-peak measures at electrode FCz. b: P3 activity was estimated via averaging at electrode Pz in the 300-500 ms time window. c: Theta activity was estimated via averaging at electrode FCz in the 200-400 ms time window and 4-8 Hz frequency window. d: Delta activity was estimated via averaging at electrode Pz in the 300-500 ms time window and 1-4 Hz frequency window. e: Representative data from participant 16 for single-trial regression between reward prediction error estimates and P3 amplitudes. Note that for the results reported we used absolute reward prediction errors
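The single-trial analysis summarized in Fig. 6 relates absolute reward prediction error estimates to trial-wise neural measures through standardized regression weights. The sketch below shows one common way to obtain such a weight for one subject: z-score both variables and take the least-squares slope. The function names and the data are hypothetical, and the paper's actual regression model may include further predictors or preprocessing.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a sequence to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant sequence")
    return [(v - m) / s for v in values]


def standardized_weight(predictor, outcome):
    """Least-squares slope of z-scored outcome on z-scored predictor.

    With a single predictor this equals the Pearson correlation.
    """
    zx, zy = zscore(predictor), zscore(outcome)
    return sum(x * y for x, y in zip(zx, zy)) / sum(x * x for x in zx)


# Made-up single-trial data for one hypothetical subject
abs_rpe = [0.1, 0.4, 0.35, 0.8, 0.6]            # absolute prediction errors
p3_amplitude = [2.0 * x + 1.0 for x in abs_rpe]  # perfectly linear toy signal

weight = standardized_weight(abs_rpe, p3_amplitude)
print(round(weight, 6))  # 1.0 for a perfectly linear positive relationship
```

Computing one weight per subject and condition and then averaging across subjects would yield group-level values comparable in form to those plotted in the figure.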

Cited by

The potential application of event-related potentials to enhance research on reward processes in eating disorders.
Forester G, Schaefer LM, Dodd DR, Johnson JS. Int J Eat Disord. 2022 Nov;55(11):1484-1495. doi: 10.1002/eat.23821. Epub 2022 Oct 10. PMID: 36214253. Review.

What is left after an error? Towards a comprehensive account of goal-based binding and retrieval.
Foerster A, Moeller B, Frings C, Pfister R. Atten Percept Psychophys. 2023 Jan;85(1):120-139. doi: 10.3758/s13414-022-02609-w. Epub 2022 Nov 30. PMID: 36451075.

The impact of emotional feedback in learning easy and difficult tasks - an ERP study.
Braunwarth JI, Ferdinand NK. Cogn Affect Behav Neurosci. 2025 Aug;25(4):971-988. doi: 10.3758/s13415-025-01284-2. Epub 2025 Mar 27. PMID: 40148734.

The Hippocampus in Pigeons Contributes to the Model-Based Valuation and the Relationship between Temporal Context States.
Yang L, Jin F, Yang L, Li J, Li Z, Li M, Shang Z. Animals (Basel). 2024 Jan 29;14(3):431. doi: 10.3390/ani14030431. PMID: 38338074.

Global neural encoding of behavioral strategies in mice during perceptual decision-making task with two different sensory patterns.
Wang S, Gao H, Ueoka Y, Ishizu K, Funamizu A. iScience. 2024 Oct 16;27(11):111182. doi: 10.1016/j.isci.2024.111182. eCollection 2024 Nov 15. PMID: 39524342.
