ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading

Nora Hollenstein et al. Sci Data 5, 180291 (2018).
doi: 10.1038/sdata.2018.291

Abstract

We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data from 12 healthy adult native English speakers, each reading natural English text for 4–6 hours. The recordings span two normal reading tasks and one task-specific reading task, resulting in a dataset that encompasses EEG and eye-tracking data for 21,629 words in 1,107 sentences and 154,173 fixations. We believe this dataset represents a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular information extraction tasks such as entity and relation extraction and sentiment analysis. Moreover, the dataset is useful for advancing research into the human reading and language understanding process at the level of brain activity and eye movement.
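The corpus itself is distributed through the Open Science Framework (see Data Citations below). As a minimal sketch of a first inspection step, assuming the recordings are shipped as MATLAB v7.3 containers (readable as HDF5), one could list a file's contents in Python as follows; the file name and internal layout here are hypothetical:

```python
# Minimal sketch for inspecting a ZuCo recording file.
# Assumption: files are MATLAB v7.3 containers, readable via HDF5.
# The file name below is hypothetical.
import h5py

with h5py.File("results_task1_subject01.mat", "r") as f:
    # Walk the file and print every group/dataset it contains.
    def show(name, obj):
        print(name, type(obj).__name__)
    f.visititems(show)
```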


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Figure 1. Histogram of the reading speeds of all sentences for all three tasks.
Figure 2
Figure 2. Sample screens for a sentence of each task.
(a) Task 1 (Sentiment). (b) Task 2 (Normal Reading). (c) Task 3 (Task-specific Reading).
Figure 3
Figure 3. Visualization of single trial EEG and eye-tracking data.
(a) Prototypical single-sentence fixation data for a representative subject. Red crosses indicate fixations. Boxes around the words indicate the area within which fixations are allocated to that word. (b) Raw gaze data underlying the fixation data plotted above. (c) Subset of the raw EEG data during the sentence. Electrodes matching the 10–20 system were chosen, and for plotting purposes the data were band-pass-filtered (0.5–30 Hz). (d) Same data as in (c) after preprocessing.
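For illustration, the kind of zero-phase band-pass filtering described for panel (c) can be sketched with SciPy; the sampling rate, filter order, and array shapes below are assumptions, not values taken from the paper:

```python
# Sketch of 0.5-30 Hz band-pass filtering of multichannel EEG,
# as used for plotting in Figure 3(c). Sampling rate, filter order,
# and the placeholder data are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, low=0.5, high=30.0, fs=500.0, order=4):
    """Zero-phase band-pass filter; data has shape (channels, samples)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)

eeg = np.random.randn(32, 5000)  # placeholder: 32 channels, 10 s at 500 Hz
filtered = bandpass(eeg)
```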
Figure 4
Figure 4. Omission rates and skipping proportions (means and standard errors) for all tasks and subjects.
(a) Omission rates for each task and each subject; the y-axis shows the proportion of words skipped in a sentence (0–1). (b) Skipping proportion (y-axis) for each task and each subject.
Figure 5
Figure 5. Effect of word length on the skipping proportion per task (mean and standard deviation); x-axis: word length, y-axis: mean skipping proportion.
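A sketch of how such a word-length effect can be computed from word-level data; the table layout and column names are hypothetical (one row per word per subject, with a boolean skipped flag):

```python
# Sketch: mean skipping proportion as a function of word length
# (cf. Figure 5). DataFrame columns are hypothetical.
import pandas as pd

words = pd.DataFrame({
    "word":    ["the", "president", "was", "inaugurated"],
    "skipped": [True, False, True, False],
})
words["length"] = words["word"].str.len()

# Fraction of skipped words at each word length.
skip_by_length = words.groupby("length")["skipped"].mean()
print(skip_by_length)
```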
Figure 6
Figure 6. Violin plot showing means, distributions, and ranges of the reading time measures per word for each task and each eye-tracking feature (x-axis) in milliseconds.
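As an illustration of such per-word reading time measures, the sketch below derives two features commonly reported for reading corpora: total reading time (the sum of all fixation durations on a word) and first fixation duration (the duration of the first fixation on a word). The table layout and values are hypothetical, and first-pass measures such as gaze duration would additionally require detecting the first run of fixations on each word:

```python
# Sketch: per-word reading time measures from a fixation table
# (cf. Figure 6). Column names and values are hypothetical.
import pandas as pd

fix = pd.DataFrame({
    "word_idx": [0, 0, 1, 0, 2],
    "duration": [180, 120, 210, 150, 190],  # fixation durations in ms
})

# Total reading time: sum of all fixation durations on a word.
trt = fix.groupby("word_idx")["duration"].sum()

# First fixation duration: duration of the first fixation on a word.
ffd = fix.groupby("word_idx")["duration"].first()

print(trt, ffd, sep="\n")
```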
Figure 7
Figure 7. Fixation-related potentials (FRPs) during the different task conditions, with selected scalp-level potential distributions.
Topographies show amplitudes in microvolts, coded as color.
Figure 8
Figure 8. Clustered EEG segments.
(a) FRPs at electrode Cz, clustered by fixation duration. (b) Each horizontal line represents the mean of the current and 50 adjacent EEG epochs, segmented on fixation onset. Segments are ordered by fixation duration (top: shortest fixation; bottom: longest fixation). Color represents the amplitude of the signal in microvolts.
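The fixation-onset segmentation underlying Figures 7 and 8 can be sketched as follows; the sampling rate, epoch window, electrode index, and all data below are placeholder assumptions:

```python
# Sketch: segmenting continuous EEG into fixation-onset epochs and
# averaging them into an FRP. All values are placeholder assumptions.
import numpy as np

fs = 500.0                               # assumed sampling rate (Hz)
eeg = np.random.randn(32, 30 * 500)      # placeholder: 32 channels, 30 s
fix_onsets_s = [1.2, 2.8, 4.1]           # placeholder fixation onsets (s)

pre, post = 0.1, 0.6                     # 100 ms before to 600 ms after onset
n_pre, n_post = int(pre * fs), int(post * fs)

epochs = []
for t in fix_onsets_s:
    onset = int(round(t * fs))           # onset as a sample index
    epochs.append(eeg[:, onset - n_pre : onset + n_post])
epochs = np.stack(epochs)                # (n_fixations, channels, samples)

# An FRP is the average across epochs, e.g. at a single electrode:
frp = epochs.mean(axis=0)[10]            # index 10: hypothetical Cz position
```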

Data Citations

    1. Hollenstein, N. et al. Open Science Framework. https://doi.org/10.17605/OSF.IO/Q3ZWS (2018).
