A large-scale database of Mandarin Chinese word associations from the Small World of Words Project
- PMID: 39739205
- DOI: 10.3758/s13428-024-02513-1
Erratum in
- Correction: A large-scale database of Mandarin Chinese word associations from the Small World of Words Project. Behav Res Methods. 2025 Feb 7;57(3):91. doi: 10.3758/s13428-025-02603-8. PMID: 39920452.
Abstract
Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word associations exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
Keywords: Chinese; Mental lexicon; Semantic network; Word association.
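As a rough illustration of how a semantic network and a word-centrality measure can be derived from a three-response association task, the sketch below builds cue-to-response association strengths and a simple in-degree count. It is a minimal sketch only: the toy records, the function names, and the choice of in-degree as the centrality measure are assumptions made for illustration, not the authors' pipeline or the actual SWOW-ZH data format.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (cue, [R1, R2, R3]); None marks a missing response.
# These are invented examples, not rows from SWOW-ZH.
records = [
    ("猫", ["狗", "可爱", "老鼠"]),
    ("猫", ["狗", "宠物", None]),
    ("狗", ["猫", "宠物", "忠诚"]),
    ("宠物", ["猫", "狗", "可爱"]),
]


def association_strengths(records):
    """Return {cue: {response: proportion of that cue's responses}}."""
    counts = defaultdict(Counter)
    for cue, responses in records:
        for response in responses:
            if response is not None:
                counts[cue][response] += 1
    strengths = {}
    for cue, counter in counts.items():
        total = sum(counter.values())
        strengths[cue] = {resp: n / total for resp, n in counter.items()}
    return strengths


def in_degree(strengths):
    """Count how many distinct cues elicit each word (one simple centrality)."""
    degree = Counter()
    for responses in strengths.values():
        for response in responses:
            degree[response] += 1
    return dict(degree)


strengths = association_strengths(records)
centrality = in_degree(strengths)

# List words from most to least central in this toy network.
for word, degree in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(word, degree)
```

In-degree is used here only because it is easy to inspect by hand. Any centrality score computed this way could then serve as a predictor of lexical decision or naming latencies in a regression model.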
© 2024. The Psychonomic Society, Inc.
Conflict of interest statement
Declarations. Ethics approval: The study was approved by the KU Leuven Ethics Committee (G-201407017). Consent to participate: All participants provided online informed consent via checkbox. Consent for publication: All participants involved in this study provided online informed consent for the publication of the data and findings derived from this research. Participants were assured that their identities would remain confidential and stored anonymously. Open practices statement: The data and materials for all experiments are available at https://smallworldofwords.org/zh/project/research and none of the experiments was preregistered. Competing interests: The authors have declared no competing interests.