Robust Effects of Working Memory Demand during Naturalistic Language Comprehension in Language-Selective Cortex

Cory Shain¹, Idan A Blank², Evelina Fedorenko³, Edward Gibson³, William Schuler⁴

Affiliations

¹ Massachusetts Institute of Technology, Cambridge, Massachusetts 02478; cshain@mit.edu.
² University of California, Los Angeles, Los Angeles, California 90095.
³ Massachusetts Institute of Technology, Cambridge, Massachusetts 02478.
⁴ The Ohio State University, Columbus, Ohio 43210.

PMID: 36002263
PMCID: PMC9525168
DOI: 10.1523/JNEUROSCI.1894-21.2022

Cory Shain et al. J Neurosci. 2022 Sep 28;42(39):7412-7430. doi: 10.1523/JNEUROSCI.1894-21.2022.

Abstract

To understand language, we must infer structured meanings from real-time auditory or visual signals. Researchers have long focused on word-by-word structure building in working memory as a mechanism that might enable this feat. However, some have argued that language processing does not typically involve rich word-by-word structure building, and/or that apparent working memory effects are underlyingly driven by surprisal (how predictable a word is in context). Consistent with this alternative, some recent behavioral studies of naturalistic language processing that control for surprisal have not shown clear working memory effects. In this fMRI study, we investigate a range of theory-driven predictors of word-by-word working memory demand during naturalistic language comprehension in humans of both sexes under rigorous surprisal controls. In addition, we address a related debate about whether the working memory mechanisms involved in language comprehension are language specialized or domain general. To do so, in each participant, we functionally localize (1) the language-selective network and (2) the "multiple-demand" network, which supports working memory across domains. Results show robust surprisal-independent effects of memory demand in the language network and no effect of memory demand in the multiple-demand network. Our findings thus support the view that language comprehension involves computationally demanding word-by-word structure building operations in working memory, in addition to any prediction-related mechanisms. Further, these memory operations appear to be primarily conducted by the same neural resources that store linguistic knowledge, with no evidence of involvement of brain regions known to support working memory across domains.

SIGNIFICANCE STATEMENT: This study uses fMRI to investigate signatures of working memory (WM) demand during naturalistic story listening, using a broad range of theoretically motivated estimates of WM demand. Results support a strong effect of WM demand in the brain that is distinct from effects of word predictability. Further, these WM demands register primarily in language-selective regions, rather than in "multiple-demand" regions that have previously been associated with WM in nonlinguistic domains. Our findings support a core role for WM in incremental language processing, using WM resources that are specialized for language.

Keywords: domain specificity; fMRI; naturalistic; sentence processing; surprisal; working memory.

Figures

**Figure 1.**
Pairwise Pearson correlations between all word-level predictors considered in our exploratory analyses.

**Figure 2.**
Visualization of storage and integration and their associated costs in two of the three frameworks investigated here: the DLT (Gibson, 2000) versus left corner parsing theory (e.g., Rasmussen and Schuler, 2018). [The third framework, ACT-R (Lewis and Vasishth, 2005), assumes a left corner parsing algorithm as in the figure above but differs in predicted processing costs, positing (1) no storage costs and (2) integration costs continuously weighted both by the recency of activation for the retrieval target and the degree of retrieval interference.] Costs are shown in boxes at each step.

DLT walk-through: in the DLT, expected incomplete dependencies (open circles) are kept in WM and incur storage costs (SCs), whereas dependency construction (closed circles) requires retrieval from WM of the previously encountered item and incurs integration costs (ICs). DRs (effectively, nouns and verbs) that contribute to integration costs are underlined in the figure. At “The,” the processor hypothesizes and stores both an upcoming main verb for the sentence (V) and an upcoming noun complement (N). At “reporter,” the expected noun is encountered, contributing 1 DR and a dependency from “reporter” to “the,” which frees up memory. At “who,” the processor posits both a relative clause verb and a gap site, which is coreferent with “who,” and an additional noun complement is posited at “the.” The expected noun is observed at “senator,” contributing 1 DR and a dependency from “senator” to “the.” The awaited verb is observed at “attacked,” contributing 1 DR and two dependencies, one from “attacked” to “senator” and one from the implicit object gap to “who.” The latter spans 1 DR, increasing IC by 1. When “disliked” is encountered, an expected direct object is added to storage, and a subject dependency to “reporter” is constructed with an IC of 3 (the DR “disliked,” plus 2 intervening DRs). At the awaited object “editor,” the store is cleared and two dependencies are constructed (to “the” and “disliked”).

Left corner walk-through: the memory store contains one or more incomplete derivation fragments (shown as polygons), each with an active sign (top) and an awaited sign (right) needed to complete the derivation. Storage cost is the number of derivation fragments currently in memory. Integration costs derive from binary lexical match (L) and grammatical match (G) decisions. Costs shown here index ends of multiword center embeddings (+L +G), where disjoint fragments are unified (though other cost definitions are possible, see below). At “the,” the processor posits a noun phrase (NP) awaiting a noun. There is nothing on the store, so both match decisions are negative. At “reporter,” the noun is encountered (+L) but the sentence is not complete (–G), and the active and awaited signs are updated to NP and relative clause (RC), respectively. At “who,” the processor updates its awaited category to S/NP [sentence (S) with gapped/relativized NP]. When “the” is encountered, it is analyzed neither as S/NP nor as a left child of an S/NP; thus, both match decisions are negative and a new derivation fragment is created in memory with active sign NP and awaited sign N. Lexical and grammatical matches occur at “senator,” unifying the two fragments in memory, and the awaited sign is updated to VP/NP [verb phrase (VP) with gapped NP, the missing unit of the RC]. The awaited VP (with gapped NP) is found at “attacked,” leading to a lexical match, and the awaited sign is updated to the missing VP of the main clause. The next two words (“disliked” and “the”) can be incorporated into the existing fragment, updating the awaited sign each time, and “editor” satisfies the awaited N, terminating the parse.

Comparison: both approaches posit storage and integration (retrieval) mechanisms, but they differ in the details. For example, the DLT (but not left corner theory) posits a spike in integration cost at “attacked.” Differences in predictions between the two frameworks fall out from different claims about the role of WM in parsing.
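The DLT walk-through in the Figure 2 caption can be traced with a small sketch. This is an illustrative simplification rather than the authors' implementation: it assumes that a word's integration cost is 1 if the word is itself a DR, plus the number of DRs lying strictly between the word and each earlier item it connects to. The word indices, the DR flags, and the dependency table are hand-annotated from the caption, and treating the object gap as located at “attacked” is an assumption made for this example.

```python
# Hand-annotated example sentence from the Figure 2 caption.
WORDS = ["The", "reporter", "who", "the", "senator",
         "attacked", "disliked", "the", "editor"]

# DRs are "effectively, nouns and verbs" per the caption.
IS_DR = [False, True, False, False, True, True, True, False, True]

# Hypothetical annotation: index of the word where dependencies are
# built -> indices of the earlier items retrieved from memory.
# The object gap is assumed to sit at "attacked" (index 5) and to
# link back to "who" (index 2).
DEPS = {1: [0], 4: [3], 5: [4, 2], 6: [1], 8: [7, 6]}


def integration_cost(i):
    """Simplified DLT integration cost at word i (an assumption-laden sketch).

    Cost = 1 if word i is a DR, plus, for each dependency built at i,
    the number of DRs strictly between the retrieved item and word i.
    """
    cost = 1 if IS_DR[i] else 0
    for j in DEPS.get(i, []):
        cost += sum(IS_DR[j + 1:i])  # intervening DRs only
    return cost


costs = [integration_cost(i) for i in range(len(WORDS))]
for word, cost in zip(WORDS, costs):
    print(f"{word:10s} IC={cost}")
```

Under these assumptions the sketch reproduces the two quantities the caption states explicitly: the gap dependency at “attacked” adds 1 for the intervening DR “senator,” and “disliked” receives an IC of 3.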

**Figure 3.**
A, C, The critical working memory result (A), with reference estimates for surprisal variables and other controls shown in C. The LANG network shows a large positive estimate for integration cost (DLT-VCM, comparable to or larger than the surprisal effect) and a weak positive estimate for storage (DLT-S). The MD network estimates for both variables are weakly negative. fROIs individually replicate the critical DLT pattern and are plotted as points left-to-right in the following order (individual subplots by fROI are available on OSF: https://osf.io/ah429/): LANG: LIFGorb, LIFG, LMFG, LAntTemp, LPostTemp, LAngG; MD: LMFGorb, LMFG, LSFG, LIFGop, LPrecG, LmPFC, LInsula, LAntPar, LMidPar, LPostPar, RMFGorb, RMFG, RSFG, RIFGop, RPrecG, RmPFC, RInsula, RAntPar, RMidPar, and RPostPar (where L is left, R is right). (Note that estimates for the surprisal controls differ from those reported in the study by Shain et al. (2020). This is because models contain additional controls, especially adaptive surprisal, which overlaps with both of the other surprisal estimates and competes with them for variance. Surprisal effects are not tested because they are not relevant to our core claim.) Error bars show 95% Monte Carlo estimated variational Bayesian credible intervals. For reference, the group masks bounding the extent of the LANG and MD fROIs are shown projected onto the cortical surface. As explained in Materials and Methods, a small subset (10%) of voxels within each of these masks is selected in each participant based on the relevant localizer contrast.

B, Schematic of analysis pipeline. fMRI data from the study by Shain et al. (2020) are partitioned into training and generalization sets. The training set is used to train multiple CDR models, two for each of the memory variables explored in this study (a full model that contains the variable as a fixed effect and an ablated model that lacks it). Variables whose full model (1) contains estimates that go in the predicted direction and (2) significantly outperforms the ablated model on the training set are selected for the critical evaluation, which deploys the pretrained models to predict unseen responses in the generalization set and statistically evaluates the contribution of the selected variable to generalization performance.
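The final step of the pipeline in panel B, comparing a full and an ablated model on held-out data, can be sketched as follows. The caption does not say which statistical test is used, so the paired sign-flip permutation test below is an assumption chosen for illustration. The function name, the per-observation losses, and the number of resamples are all hypothetical, and no CDR model is trained here.

```python
import random


def paired_permutation_test(loss_full, loss_ablated, n_perm=2000, seed=0):
    """One-sided paired sign-flip permutation test (illustrative only).

    Tests whether the full model has lower mean held-out loss than the
    ablated model. Returns a p-value with add-one smoothing.
    """
    if len(loss_full) != len(loss_ablated):
        raise ValueError("loss vectors must be paired")
    # Positive difference = the full model predicts this observation better.
    diffs = [a - f for f, a in zip(loss_full, loss_ablated)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each pair's model labels are exchangeable,
        # which is equivalent to randomly flipping the sign of each diff.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if permuted / len(diffs) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up held-out losses: the full model is uniformly a little better.
ablated = [1.0 + 0.01 * i for i in range(20)]
full = [x - 0.1 for x in ablated]
p_value = paired_permutation_test(full, ablated)
print(f"p = {p_value:.4f}")
```

A small p-value in this setup would indicate that adding the selected variable improves prediction of unseen responses beyond what chance relabeling of the two models would produce.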
