Sci Data. 2023 Dec 4;10(1):862.
doi: 10.1038/s41597-023-02752-5.

Introducing MEG-MASC a high-quality magneto-encephalography dataset for evaluating natural speech processing

Laura Gwilliams et al. Sci Data. 2023.

Abstract

The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers who listened to two hours of naturalistic stories. Each participant performed two identical sessions, each involving listening to four fictional stories from the Manually Annotated Sub-Corpus (MASC) intermixed with random word lists and comprehension questions. We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS). This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally resolved brain responses to speech. We provide the Python code to replicate several validation analyses of the MEG evoked responses, such as the temporal decoding of phonetic features and word frequency. All code and MEG, audio and text data are publicly available, in keeping with best practices in transparent and reproducible research.
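
The temporal decoding mentioned in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' validation code: the array shapes, the injected effect, the ridge classifier and the helper name `ridge_accuracy` are all assumptions chosen for illustration. The idea is to fit and cross-validate a separate linear classifier at each time sample relative to event onset, which gives one decoding score per time point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for epoched MEG data, shaped (epochs, sensors, time samples).
n_epochs, n_sensors, n_times = 200, 20, 30
X = rng.standard_normal((n_epochs, n_sensors, n_times))
y = rng.integers(0, 2, size=n_epochs)  # hypothetical binary label, e.g. voiced vs. unvoiced

# Add a label-dependent sensor pattern in samples 10..19 only, so decoding
# should exceed chance inside that window and stay near 0.5 elsewhere.
pattern = rng.standard_normal(n_sensors)
X[:, :, 10:20] += (2 * y - 1)[:, None, None] * pattern[None, :, None]


def ridge_accuracy(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fit a ridge classifier on -1/+1 targets and return test accuracy."""
    mean = X_train.mean(axis=0)
    target = 2.0 * y_train - 1.0
    Xc = X_train - mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ (target - target.mean()))
    prediction = (X_test - mean) @ weights + target.mean()
    return np.mean((prediction > 0) == (y_test == 1))


# 5-fold cross-validation, repeated independently at every time sample.
folds = np.array_split(rng.permutation(n_epochs), 5)
scores = np.zeros(n_times)
for t in range(n_times):
    X_t = X[:, :, t]  # sensor pattern of every epoch at this time sample
    fold_scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n_epochs), test_idx)
        fold_scores.append(
            ridge_accuracy(X_t[train_idx], y[train_idx], X_t[test_idx], y[test_idx])
        )
    scores[t] = np.mean(fold_scores)

print(np.round(scores, 2))  # one cross-validated accuracy per time sample
```

With real recordings, `X` would hold epochs cut around phoneme or word onsets and `y` the annotated feature of interest. The per-sample loop is what makes the resulting score a function of time.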

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Dataset file structure.
Fig. 2
Median (across subjects) evoked response to all words. The gray area indicates the global field power (GFP).
Fig. 3
(a) Mean decoding of whether the phoneme is voiced or not, as a function of time following phoneme onset. The four colors refer to the four tasks (stories + word lists). Error bars are SEM across subjects. (b) Same as (a) for the decoding of words' Zipf frequency, as a function of time following word onset. (c) Decoding of voicing (averaged across all tasks) for each participant, as a function of time following phoneme onset. (d) Same as (c) for the decoding of word frequency (averaged across all tasks) for each participant, as a function of time following word onset.
Fig. 4
MEG data annotations: Pandas DataFrame of sound, phoneme and word time-stamps.
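
To make the idea of a time-stamped annotation table concrete, here is a small sketch of how word and phoneme time-stamps might be held in a Pandas DataFrame and linked to each other. The column names (`kind`, `label`, `onset`, `duration`) and all values are hypothetical, invented for illustration; the dataset's actual columns may differ.

```python
import pandas as pd

# Hypothetical annotation rows: times are in seconds from the start of the recording.
annotations = pd.DataFrame(
    [
        {"kind": "word", "label": "the", "onset": 0.50, "duration": 0.12},
        {"kind": "phoneme", "label": "dh", "onset": 0.50, "duration": 0.05},
        {"kind": "phoneme", "label": "ah", "onset": 0.55, "duration": 0.07},
        {"kind": "word", "label": "cat", "onset": 0.62, "duration": 0.30},
        {"kind": "phoneme", "label": "k", "onset": 0.62, "duration": 0.08},
        {"kind": "phoneme", "label": "ae", "onset": 0.70, "duration": 0.12},
        {"kind": "phoneme", "label": "t", "onset": 0.82, "duration": 0.10},
    ]
)

# The offset follows directly from onset and duration.
annotations["offset"] = annotations["onset"] + annotations["duration"]

words = annotations[annotations["kind"] == "word"].sort_values("onset")
phonemes = annotations[annotations["kind"] == "phoneme"].sort_values("onset")

# Attach to each phoneme the most recent word that started at or before it.
word_lookup = words[["onset", "label"]].rename(
    columns={"onset": "word_onset", "label": "word"}
)
phoneme_words = pd.merge_asof(
    phonemes, word_lookup, left_on="onset", right_on="word_onset", direction="backward"
)

print(phoneme_words[["label", "onset", "word", "word_onset"]])
```

A table of this shape can be filtered by `kind` to obtain the event onsets around which MEG epochs are cut, for example one set of epochs per phoneme and another per word.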
