A large-scale database of Mandarin Chinese word associations from the Small World of Words Project
- PMID: 39739205
- DOI: 10.3758/s13428-024-02513-1
Erratum in
- Correction: A large-scale database of Mandarin Chinese word associations from the Small World of Words Project. Behav Res Methods. 2025 Feb 7;57(3):91. doi: 10.3758/s13428-025-02603-8. PMID: 39920452.
Abstract
Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word associations exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
Keywords: Chinese; Mental lexicon; Semantic network; Word association.
© 2024. The Psychonomic Society, Inc.
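As a rough illustration of how a three-response word association task can be turned into a semantic network, the sketch below builds a small directed, weighted cue-response graph and computes in-degree as a simple centrality measure. The records, the function names `build_network` and `in_degree`, and the choice of in-degree are all assumptions made for illustration; the abstract does not specify which centrality measure or preprocessing the authors used, and the toy data are not taken from SWOW-ZH.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each record is (cue, [R1, R2, R3]) from one participant.
# These examples are invented for illustration and are not SWOW-ZH data.
records = [
    ("猫", ["狗", "动物", "可爱"]),
    ("猫", ["狗", "老鼠", "动物"]),
    ("狗", ["猫", "动物", "忠诚"]),
    ("老鼠", ["猫", "动物", "小"]),
]


def build_network(records):
    """Return {cue: {response: associative strength}}.

    Strength is the share of all responses to a cue that were this response
    (an assumed, commonly used normalization).
    """
    counts = defaultdict(Counter)
    for cue, responses in records:
        counts[cue].update(responses)

    strengths = {}
    for cue, counter in counts.items():
        total = sum(counter.values())
        strengths[cue] = {resp: n / total for resp, n in counter.items()}
    return strengths


def in_degree(strengths):
    """Count how many distinct cues elicit each word (a simple centrality)."""
    degree = Counter()
    for edges in strengths.values():
        for resp in edges:
            degree[resp] += 1
    return degree


strengths = build_network(records)
centrality = in_degree(strengths)

# List words from most to least central in this toy network.
for word, deg in centrality.most_common():
    print(word, deg)
```

In a validity analysis of the kind the abstract describes, a centrality score like this would serve as a predictor of lexical decision or naming latencies; that regression step is omitted here because it depends on behavioral data not shown in this record.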
Conflict of interest statement
Declarations. Ethics approval: The study was approved by the KU Leuven Ethics Committee (G-201407017). Consent to participate: All participants provided online informed consent via checkbox. Consent for publication: All participants involved in this study provided online informed consent for the publication of the data and findings derived from this research. Participants were assured that their identities would remain confidential and stored anonymously. Open practices statement: The data and materials for all experiments are available at https://smallworldofwords.org/zh/project/research and none of the experiments was preregistered. Competing interests: The authors have declared no competing interests.
