2024 Apr 1;5(1):167-200.
doi: 10.1162/nol_a_00121. eCollection 2024.

Surprisal From Language Models Can Predict ERPs in Processing Predicate-Argument Structures Only if Enriched by an Agent Preference Principle



Eva Huber et al. Neurobiol Lang (Camb).

Abstract

Language models based on artificial neural networks increasingly capture key aspects of how humans process sentences. Most notably, model-based surprisals predict event-related potentials such as N400 amplitudes during parsing. Assuming that these models represent realistic estimates of human linguistic experience, their success in modeling language processing raises the possibility that the human processing system relies on no principles other than the general architecture of language models and sufficient linguistic input. Here, we test this hypothesis on N400 effects observed during the processing of verb-final sentences in German, Basque, and Hindi. By stacking Bayesian generalised additive models, we show that, in each language, N400 amplitudes and topographies in the region of the verb are best predicted when model-based surprisals are complemented by an Agent Preference principle that transiently interprets initial role-ambiguous noun phrases as agents, leading to reanalysis when this interpretation fails. Our findings demonstrate the need for this principle independently of usage frequencies and structural differences between languages. The principle's force is unequal, however. Compared to surprisal, its effect is weakest in German, stronger in Hindi, and stronger still in Basque. This gradient correlates with the extent to which each grammar allows unmarked NPs to be patients, a structural feature that boosts reanalysis effects. We conclude that language models gain neurobiological plausibility by incorporating an Agent Preference. Conversely, theories of human processing profit from incorporating surprisal estimates alongside principles like the Agent Preference, which arguably have distinct evolutionary roots.
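Model-based surprisal is the negative log probability of a word given its preceding context, -log2 P(w_i | w_1 ... w_{i-1}). As a minimal illustration of the quantity only (not the LSTM or Transformer estimates the study actually uses), a toy bigram model can assign per-word surprisals; the function name and add-one smoothing here are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict

def bigram_surprisal(corpus, sentence):
    """Per-word surprisal, -log2 P(w_i | w_{i-1}), from a toy bigram model.

    Illustrative stand-in for the neural-network surprisals used in the
    study; `corpus` is a list of space-tokenised training sentences.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sent in corpus:
        tokens = ["<s>"] + sent.split()
        unigrams.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1

    vocab = len(unigrams)
    surprisals = []
    tokens = ["<s>"] + sentence.split()
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen continuations finite but costly.
        p = (bigrams[prev][cur] + 1) / (sum(bigrams[prev].values()) + vocab)
        surprisals.append(-math.log2(p))
    return surprisals
```

A continuation unseen in training (say, a verb whose arguments force role reanalysis) receives higher surprisal than a frequent one; it is this per-word quantity, estimated by neural models, that enters the analyses as a continuous predictor.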

Keywords: ERP; artificial neural networks; computational modeling; event cognition; large language models (LLMs); sentence processing; surprisal.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1.
For each language and model, the table shows the posterior estimates of differences in surprisal values (ΔSurprisal) between ungrammatical and grammatical sentences with 50%, 80%, and 90% credible intervals, controlling for differences between sentences. ΔSurprisal is calculated by subtracting the mean surprisal of the grammatical sentence from that of the ungrammatical sentence. Values are then estimated with a Bayesian model that controls for the variance in the stimuli as a random effect and quantifies the estimates' probabilities (see Supporting Information S1.4.4). LSTM = long short-term memory models.
Figure 2.
Posterior distributions of the estimated surprisal difference (ΔSurprisal) between the experimental conditions that elicited the Predicate N400 (German: patient initial–agent initial; Hindi: ambiguous–unambiguous; Basque: patient–agent) across control conditions (Condition 2 in Table 7). Horizontal bars indicate 50%, 80%, and 90% highest-density credible intervals. To show a substantial difference between conditions, ΔSurprisal estimates are expected to exclude 0. The estimated ΔSurprisal at the sentence level can be found in the Supporting Information S2 (Analysis 1: Predicting Surprisal).
Figure 3.
Relative weights of models as determined by model stacking. Weights are allocated to models in such a way that they jointly maximise prediction accuracy. Each model is a Bayesian generalised additive model with the following predictors (in addition to random effects of sentence and participant and a main effect of trial number): surprisal alone, Agent Preference alone, surprisal and Agent Preference together, or neither of the two (null). Agent Preference is a binary variable, categorising sentences into those where the Agent Preference principle predicts role reanalysis at the position of the verb (because the initial ambiguous NP turns out to be a patient) vs. those where no reanalysis is predicted (because the NP is indeed an agent). Surprisal is a continuous variable derived from LSTM, BERT/RoBERTa, or GPT-2 models. For all languages, models with both Agent Preference and surprisal (estimated by BERT/RoBERTa models for Basque and Hindi, and an LSTM model for German) receive most of the weight. NP = noun phrase.
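Model stacking assigns each candidate model a weight on the simplex so that the weighted mixture of pointwise predictive densities maximises the out-of-sample log score. A minimal two-model sketch of that idea (a hypothetical helper using grid search; the study stacked four Bayesian generalised additive models with dedicated Bayesian tooling) might look like:

```python
import math

def stacking_weights(log_lik, step=0.01):
    """Grid-search stacking weight for two candidate models.

    `log_lik` holds (ll_model1, ll_model2) pairs of pointwise log
    predictive densities (e.g., leave-one-out log scores). Returns the
    weight w on model 1 (model 2 gets 1 - w) that maximises the summed
    log score of the mixture. Hypothetical two-model sketch; the study
    compared four models, not two.
    """
    best_w, best_score = 0.0, float("-inf")
    for i in range(int(round(1 / step)) + 1):
        w = i * step
        # Log score of the w-weighted mixture over all held-out points.
        score = sum(
            math.log(w * math.exp(l1) + (1 - w) * math.exp(l2))
            for l1, l2 in log_lik
        )
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```

When each model predicts a different subset of observations well, the optimal weight lies strictly between 0 and 1; a weight near 1 means one model dominates the mixture, as the surprisal-plus-Agent-Preference models do in the figure.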
Figure 4.
Pair-wise grand mean differences of event-related potentials in the N400 time window (300–500 ms relative to verb onset). (Left column) Topography plots of observed grand mean differences in amplitudes between sentences with vs. without reanalysis as predicted by the Agent Preference principle. (Right column) Topography plots of observed grand mean differences in amplitudes for sentences with high vs. low surprisal verbs (>0 and <0), as estimated by the highest-weighted model (cf. Figure 3).
Figure 5.
Pair-wise fitted differences of event-related potentials in the N400 time window (300–500 ms relative to verb onset), drawn from the highest-weighted model (cf. Figure 3). Upper panels in each language (A–B, E–F, I–J) quantify effect size and electrode regions through the posterior mean differences of smooth surfaces at each scalp coordinate for sentences with vs. sentences without the predicted reanalysis (A, E, and I) and for sentences with high vs. low surprisal verbs (+2 vs. −2 SD from the mean; B, F, and J). Mean differences with posterior probability <0.8 are plotted grey. Lower panels (C–D, G–H, K–L) quantify the evidence through the proportion of posterior differences that are below or above 0 at each coordinate. Proportions <0.8 are left white. In German, the Agent Preference principle has a considerably smaller (A) and less supported (C) effect than surprisal (B and D), a difference that is less strong in Hindi and even weaker in Basque.

