SUBTLEX-CH: Chinese word and character frequencies based on film subtitles
- PMID: 20532192
- PMCID: PMC2880003
- DOI: 10.1371/journal.pone.0010729
Abstract
Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, only a few sources of word frequency measures are available to researchers, and their quality falls short of what researchers working on other languages are used to.
Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
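The two kinds of measure named in the abstract, word frequency and contextual diversity, can be illustrated with a minimal sketch. It assumes the subtitles have already been segmented into words and grouped by film; the function name `frequency_measures`, the input layout, and the toy data are hypothetical and are not taken from the SUBTLEX-CH database or its tooling. Contextual diversity is computed here as the percentage of films in which a word occurs, which is one common way to define it.

```python
from collections import Counter


def frequency_measures(films):
    """Compute simple frequency measures from segmented subtitles.

    films: dict mapping a film id to a list of already-segmented words
    (hypothetical input layout; word segmentation itself is not shown).
    """
    word_counts = Counter()  # total occurrences across the whole corpus
    film_counts = Counter()  # number of films in which each word occurs
    for words in films.values():
        word_counts.update(words)
        film_counts.update(set(words))  # count each word once per film

    total_words = sum(word_counts.values())
    n_films = len(films)
    return {
        word: {
            "count": count,
            "per_million": count * 1_000_000 / total_words,
            "cd_percent": 100 * film_counts[word] / n_films,
        }
        for word, count in word_counts.items()
    }


# Toy corpus of two "films" (illustrative data only).
films = {
    "film_a": ["我", "喜欢", "电影", "我"],
    "film_b": ["你", "喜欢", "音乐"],
}
measures = frequency_measures(films)
print(measures["喜欢"])
```

In this toy corpus, 我 occurs twice but only in one film, while 喜欢 also occurs twice but in both films, so the two words share a raw count and differ in contextual diversity.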