ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading

Nora Hollenstein et al. Sci Data 5, 180291 (2018).
doi: 10.1038/sdata.2018.291

Abstract

We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data from 12 healthy adult native English speakers, each reading natural English text for 4–6 hours. The recordings span two normal reading tasks and one task-specific reading task, resulting in a dataset that encompasses EEG and eye-tracking data for 21,629 words in 1,107 sentences and 154,173 fixations. We believe this dataset represents a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular information extraction tasks such as entity and relation extraction and sentiment analysis. Moreover, the dataset is useful for advancing research into the human reading and language understanding process at the level of brain activity and eye movement.
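The corpus itself is distributed through the Open Science Framework (see Data Citations below). As a minimal sketch of a first inspection step, assuming the recordings are shipped as MATLAB v7.3 containers (readable as HDF5), one could list a file's contents in Python as follows; the file name and internal layout here are hypothetical:

```python
# Minimal sketch for inspecting a ZuCo recording file.
# Assumption: files are MATLAB v7.3 containers, readable via HDF5.
# The file name below is hypothetical.
import h5py

with h5py.File("results_task1_subject01.mat", "r") as f:
    # Walk the file and print every group/dataset it contains.
    def show(name, obj):
        print(name, type(obj).__name__)
    f.visititems(show)
```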


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Figure 1. Histogram of the reading speeds of all sentences for all three tasks.
Figure 2
Figure 2. Sample screens for a sentence of each task.
(a) Task 1 (Sentiment). (b) Task 2 (Normal Reading). (c) Task 3 (Task-specific Reading).
Figure 3
Figure 3. Visualization of single trial EEG and eye-tracking data.
(a) Prototypical single-sentence fixation data for a representative subject. Red crosses indicate fixations. Boxes around the words indicate the area within which fixations are allocated to that word. (b) Raw gaze data underlying the fixation data plotted above. (c) Subset of the raw EEG data during the sentence. Electrodes matching the 10–20 system were chosen, and for plotting purposes the data were band-pass-filtered (0.5–30 Hz). (d) Same data as in (c) after preprocessing.
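For illustration, the kind of zero-phase band-pass filtering described for panel (c) can be sketched with SciPy; the sampling rate, filter order, and array shapes below are assumptions, not values taken from the paper:

```python
# Sketch of 0.5-30 Hz band-pass filtering of multichannel EEG,
# as used for plotting in Figure 3(c). Sampling rate, filter order,
# and the placeholder data are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, low=0.5, high=30.0, fs=500.0, order=4):
    """Zero-phase band-pass filter; data has shape (channels, samples)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)

eeg = np.random.randn(32, 5000)  # placeholder: 32 channels, 10 s at 500 Hz
filtered = bandpass(eeg)
```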
Figure 4
Figure 4. Omission rates and skipping proportions (means and standard errors) for all tasks and subjects.
(a) Omission rates for each task and each subject; the y-axis shows the proportion of words skipped in a sentence (0–1). (b) Skipping proportion (y-axis) for each task and each subject.
Figure 5
Figure 5. Effect of word length on the skipping proportion per task (mean and standard deviation); x-axis: word length, y-axis: mean skipping proportion.
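A sketch of how such a word-length effect can be computed from word-level data; the table layout and column names are hypothetical (one row per word per subject, with a boolean skipped flag):

```python
# Sketch: mean skipping proportion as a function of word length
# (cf. Figure 5). DataFrame columns are hypothetical.
import pandas as pd

words = pd.DataFrame({
    "word":    ["the", "president", "was", "inaugurated"],
    "skipped": [True, False, True, False],
})
words["length"] = words["word"].str.len()

# Fraction of skipped words at each word length.
skip_by_length = words.groupby("length")["skipped"].mean()
print(skip_by_length)
```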
Figure 6
Figure 6. Violin plot showing means, distributions, and ranges of the reading time measures per word for each task and each eye-tracking feature (x-axis) in milliseconds.
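As an illustration of such per-word reading time measures, the sketch below derives two features commonly reported for reading corpora: total reading time (the sum of all fixation durations on a word) and first fixation duration (the duration of the first fixation on a word). The table layout and values are hypothetical, and first-pass measures such as gaze duration would additionally require detecting the first run of fixations on each word:

```python
# Sketch: per-word reading time measures from a fixation table
# (cf. Figure 6). Column names and values are hypothetical.
import pandas as pd

fix = pd.DataFrame({
    "word_idx": [0, 0, 1, 0, 2],
    "duration": [180, 120, 210, 150, 190],  # fixation durations in ms
})

# Total reading time: sum of all fixation durations on a word.
trt = fix.groupby("word_idx")["duration"].sum()

# First fixation duration: duration of the first fixation on a word.
ffd = fix.groupby("word_idx")["duration"].first()

print(trt, ffd, sep="\n")
```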
Figure 7
Figure 7. Fixation-related potentials (FRPs) during the different task conditions, with selected scalp-level potential distributions.
Topographies show amplitudes in microvolts, coded as color.
Figure 8
Figure 8. Clustered EEG segments.
(a) FRPs at electrode Cz, clustered by fixation duration. (b) Each horizontal line represents the mean of the current and 50 adjacent EEG epochs, segmented on fixation onset. Segments are ordered by fixation duration (top: shortest fixation; bottom: longest fixation). Color represents the amplitude of the signal in microvolts.
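The fixation-onset segmentation underlying Figures 7 and 8 can be sketched as follows; the sampling rate, epoch window, electrode index, and all data below are placeholder assumptions:

```python
# Sketch: segmenting continuous EEG into fixation-onset epochs and
# averaging them into an FRP. All values are placeholder assumptions.
import numpy as np

fs = 500.0                               # assumed sampling rate (Hz)
eeg = np.random.randn(32, 30 * 500)      # placeholder: 32 channels, 30 s
fix_onsets_s = [1.2, 2.8, 4.1]           # placeholder fixation onsets (s)

pre, post = 0.1, 0.6                     # 100 ms before to 600 ms after onset
n_pre, n_post = int(pre * fs), int(post * fs)

epochs = []
for t in fix_onsets_s:
    onset = int(round(t * fs))           # onset as a sample index
    epochs.append(eeg[:, onset - n_pre : onset + n_post])
epochs = np.stack(epochs)                # (n_fixations, channels, samples)

# An FRP is the average across epochs, e.g. at a single electrode:
frp = epochs.mean(axis=0)[10]            # index 10: hypothetical Cz position
```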

Data Citations

    1. Hollenstein, N. et al. Open Science Framework. https://doi.org/10.17605/OSF.IO/Q3ZWS (2018).
