Sci Data. 2022 Aug 29;9(1):530.
doi: 10.1038/s41597-022-01625-7.

Le Petit Prince multilingual naturalistic fMRI corpus

Jixing Li et al. Sci Data. 2022.

Abstract

Neuroimaging using more ecologically valid stimuli such as audiobooks has advanced our understanding of natural language comprehension in the brain. However, prior naturalistic stimuli have typically been restricted to a single language, which limited generalizability beyond small typological domains. Here we present the Le Petit Prince fMRI Corpus (LPPC-fMRI), a multilingual resource for research in the cognitive neuroscience of speech and language during naturalistic listening (OpenNeuro: ds003643). 49 English speakers, 35 Chinese speakers and 28 French speakers listened to the same audiobook The Little Prince in their native language while multi-echo functional magnetic resonance imaging was acquired. We also provide time-aligned speech annotation and word-by-word predictors obtained using natural language processing tools. The resulting timeseries data are shown to be of high quality with good temporal signal-to-noise ratio and high inter-subject correlation. Data-driven functional analyses provide further evidence of data quality. This annotated, multilingual fMRI dataset facilitates future re-analysis that addresses cross-linguistic commonalities and differences in the neural substrate of language processing on multiple perceptual and linguistic levels.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Schematic overview of the LPPC-fMRI data collection procedures, preprocessing, technical validation and annotation. During data collection (blue), anatomical MRI was first acquired, followed by functional MRI while participants listened to 9 sections of the audiobook. After preprocessing the data (green), behavioral and overall data quality were examined (yellow). Audio and text annotations were extracted using NLP tools.
Fig. 2
Annotation information for the stimuli. (a) Word boundaries in the audio files, included in files: lpp<EN/CN/FR>_section[1–9].TextGrid. (b) f0 and RMS intensity for every 10 ms of the audio, included in files: lpp<EN/CN/FR>_prosody.csv. (c) Tokenization, lemmatization, log-transformed word frequency and POS tagging, included in files: lpp<EN/CN/FR>_word_information.csv. (d) GloVe and BERT embeddings for every word in the audiobooks, included in files: lpp<EN/CN/FR>_word_embeddings_GloVe.csv and lpp<EN/CN/FR>_word_embeddings_BERT.csv. (e) Parsed syntactic trees based on constituency grammar with node counts using top-down, bottom-up, and left-corner parsing strategies, included in files: lpp<EN/CN/FR>_trees.csv. (f) Dependency relations for each word in each sentence, included in files: lpp<EN/CN/FR>_dependency.csv. (g) Named entity recognition and coreference relations for the English and Chinese texts, included in files: lpp<EN/CN>_coreference.csv.
Fig. 3
Organization of the data collection. (a) General overview of directory structure. (b) Content of subject-specific anatomical and raw data directories. (c) Content of subject-specific preprocessed data directories. (d) Content of the stimuli directory. (e) Content of the quiz directory. (f) Content of the language-specific annotation directory.
Fig. 4
Voxel-wise temporal signal-to-noise ratio (tSNR) analysis before and after preprocessing. Cohen’s d effect sizes showed an increase in tSNR after preprocessing.
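The tSNR comparison in this caption can be illustrated with a minimal sketch. It assumes the usual definition of tSNR (temporal mean divided by temporal standard deviation at each voxel) and a pooled-standard-deviation form of Cohen's d. The source does not state which variant of Cohen's d was used, so `cohens_d` below is an assumption, and the synthetic arrays stand in for real BOLD data.

```python
import numpy as np


def tsnr(data, axis=-1):
    """Temporal SNR per voxel: mean over time divided by std over time."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=axis)
    std = data.std(axis=axis)
    # Constant voxels (zero variance) are assigned 0 instead of inf/nan.
    return np.divide(mean, std, out=np.zeros_like(mean), where=std > 0)


def cohens_d(after, before):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    after = np.asarray(after, dtype=float)
    before = np.asarray(before, dtype=float)
    n1, n2 = len(after), len(before)
    pooled_var = ((n1 - 1) * after.var(ddof=1)
                  + (n2 - 1) * before.var(ddof=1)) / (n1 + n2 - 2)
    return (after.mean() - before.mean()) / np.sqrt(pooled_var)


if __name__ == "__main__":
    # Synthetic example: 50 voxels x 200 timepoints, noisier before "preprocessing".
    rng = np.random.default_rng(0)
    raw = 100 + rng.normal(0, 5, size=(50, 200))
    cleaned = 100 + rng.normal(0, 2, size=(50, 200))
    print("median tSNR before:", np.median(tsnr(raw)))
    print("median tSNR after: ", np.median(tsnr(cleaned)))
    print("Cohen's d (after vs. before):", cohens_d(tsnr(cleaned), tsnr(raw)))
```

A positive d here means tSNR is higher after the simulated cleanup, which is the direction the caption reports for the real data.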
Fig. 5
Results of inter-subject correlation (ISC) demonstrating data quality and timing synchrony between participants. As expected, the temporal regions showed the largest correlation in brain responses across subjects.
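As a rough illustration of inter-subject correlation, the sketch below uses the leave-one-out scheme: each subject's time series is correlated with the average of all other subjects. The source does not say whether leave-one-out or pairwise ISC was used, so this choice is an assumption, and `leave_one_out_isc` is a hypothetical helper name.

```python
import numpy as np


def leave_one_out_isc(data):
    """Leave-one-out ISC for one voxel or region.

    data: array of shape (n_subjects, n_timepoints).
    Returns one Pearson r per subject (subject vs. mean of the others).
    """
    data = np.asarray(data, dtype=float)
    n_subjects = data.shape[0]
    total = data.sum(axis=0)
    isc = np.empty(n_subjects)
    for s in range(n_subjects):
        # Average time series of every subject except subject s.
        others_mean = (total - data[s]) / (n_subjects - 1)
        isc[s] = np.corrcoef(data[s], others_mean)[0, 1]
    return isc


if __name__ == "__main__":
    # Synthetic example: a shared stimulus-driven signal plus subject-specific noise.
    rng = np.random.default_rng(1)
    shared = np.sin(np.linspace(0, 20, 300))
    subjects = shared + rng.normal(0, 1.0, size=(10, 300))
    print("mean ISC:", leave_one_out_isc(subjects).mean())
```

In a voxel-wise analysis the same function would be applied at every voxel. Regions driven by the shared stimulus, such as the temporal regions named in the caption, are where the resulting values are expected to be largest.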
Fig. 6
GLM analyses to localize the wordrate regressor. (a) The offset of each word in the audiobook was marked with 1 and convolved with the canonical hemodynamic response function. (b) The timecourse of each voxel’s BOLD signal was modeled using our design matrix at the first level. At the group level, a one-sample t-test was performed on the distribution of the beta values for the wordrate regressor across subjects at each voxel. Statistical significance was thresholded at p < 0.05 FWE with a cluster size greater than 50.
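Step (a) can be sketched as follows. Word offsets are placed as unit impulses on a fine time grid, convolved with a hemodynamic response function, and sampled once per scan. The double-gamma shape parameters, the 0.1 s grid, and the `tr` value in the usage example are assumptions chosen for illustration, not values taken from the source, and both function names are hypothetical.

```python
import math

import numpy as np


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """A common double-gamma approximation of the canonical HRF (assumed parameters)."""
    t = np.asarray(t, dtype=float)
    peak = t ** (peak_shape - 1) * np.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * np.exp(-t) / math.gamma(under_shape)
    return peak - ratio * under


def wordrate_regressor(word_offsets, n_scans, tr, dt=0.1, hrf_length=32.0):
    """Build a wordrate regressor sampled at each scan onset.

    word_offsets: word offset times in seconds.
    n_scans, tr: number of volumes and repetition time in seconds.
    """
    n_fine = int(round(n_scans * tr / dt))
    impulses = np.zeros(n_fine)
    idx = np.round(np.asarray(word_offsets, dtype=float) / dt).astype(int)
    idx = idx[(idx >= 0) & (idx < n_fine)]
    # Each word offset contributes a unit impulse; offsets in the same bin add up.
    np.add.at(impulses, idx, 1.0)
    hrf = double_gamma_hrf(np.arange(0.0, hrf_length, dt))
    convolved = np.convolve(impulses, hrf)[:n_fine]
    # Sample the fine-grained prediction at the start of every scan.
    scan_idx = np.round(np.arange(n_scans) * tr / dt).astype(int)
    return convolved[scan_idx]


if __name__ == "__main__":
    # Hypothetical word offsets (seconds) and an assumed TR of 2 s.
    offsets = [0.4, 0.9, 1.5, 2.2, 3.0, 3.4]
    print(wordrate_regressor(offsets, n_scans=15, tr=2.0))
```

The resulting vector would form one column of the first-level design matrix. The per-subject beta for that column is then what the group-level one-sample t-test in (b) operates on.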
Fig. 7
GLM results showing the significant clusters for (a) the pitch and (b) the word regions in the English, Chinese and French data using f0 and wordrate annotations. Red areas in the second column of the 3D brains show meta-analyses of pitch and word regions from Neurosynth. Statistical significance was thresholded at p < 0.05 FWE and k > 50.
