CCLOWW: A grade-level Chinese children's lexicon of written words
- PMID: 35776384
- DOI: 10.3758/s13428-022-01890-9
Abstract
In this article, we present the Chinese Children's Lexicon of Written Words (CCLOWW), the first grade-level database that provides frequency statistics of simplified Chinese characters and words for children. The database was computed from a corpus of 34,671,424 character tokens and 22,427,010 word tokens (including single- and multicharacter words), extracted from 2131 books. It contains 6746 character types and 153,079 word types. CCLOWW provides several frequency indices of simplified Chinese for three grade levels (grade 2 and below, grades 3-4, grades 5-6) to profile children's experience with written Chinese in and outside of school. We describe the distributions of frequency and contextual diversity of the characters and words, as well as the word length and syntactic categories of the words, in the corpus and the subcorpora. We also report results of correlation analyses with other written corpora and of several naming and lexical decision experiments. The findings suggest that CCLOWW frequency measures correlate well with those of other corpora. Importantly, they reliably predicted children's and adults' naming and lexical decision performance. They also explained variance in adults' visual word recognition beyond frequency measures computed from an adult corpus, indicating that early print exposure might influence readers' later lexical processing beyond an age-of-acquisition effect. CCLOWW will help researchers in language processing and development, as well as educators selecting language materials appropriate for children's developmental stages. The database is freely available online at https://www.learn2read.cn/database/.
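To make the two central indices concrete, the following is a minimal sketch of how word frequency and contextual diversity could be computed per grade band. It is not the authors' pipeline. It assumes that the text is already segmented into words, that contextual diversity is counted as the number of books containing an item (the abstract does not give the exact definition), and that frequency is normalized per million tokens. The toy corpus `BOOKS` and the function names `frequency_indices` and `indices_by_grade` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: (grade band, already-segmented words of one book).
BOOKS = [
    ("grade 2 and below", ["小", "猫", "喜欢", "鱼"]),
    ("grade 2 and below", ["小", "狗", "喜欢", "骨头"]),
    ("grades 3-4", ["小", "猫", "和", "小", "狗", "是", "朋友"]),
]


def frequency_indices(books):
    """Compute frequency and contextual diversity for a list of books.

    Each book is a list of word tokens. Contextual diversity (cd) is taken
    here to be the number of books in which a word occurs at least once,
    which is an assumption rather than the database's documented definition.
    """
    token_counts = Counter()  # total occurrences of each word
    book_counts = Counter()   # number of books containing each word
    for words in books:
        token_counts.update(words)
        book_counts.update(set(words))  # count each word once per book

    total_tokens = sum(token_counts.values())
    n_books = len(books)
    return {
        word: {
            "count": count,
            "per_million": count / total_tokens * 1_000_000,
            "cd": book_counts[word],
            "cd_pct": book_counts[word] / n_books * 100,
        }
        for word, count in token_counts.items()
    }


def indices_by_grade(books_with_grade):
    """Group books by grade band and compute the indices for each subcorpus."""
    grouped = {}
    for grade, words in books_with_grade:
        grouped.setdefault(grade, []).append(words)
    return {grade: frequency_indices(books) for grade, books in grouped.items()}


if __name__ == "__main__":
    whole_corpus = frequency_indices([words for _, words in BOOKS])
    by_grade = indices_by_grade(BOOKS)
    print(whole_corpus["小"])
    print(by_grade["grades 3-4"]["小"])
```

Counting each word once per book separates how often a word appears from how widely it is spread across books, so a word repeated many times in a single book gets a high frequency but a low contextual diversity.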
Keywords: Children; Chinese; Contextual diversity; Frequency; Lexical database.
© 2022. The Psychonomic Society, Inc.