Zipf's law in China's local government work reports: A 21-year study using natural language processing and regression analysis
- PMID: 40392917
- PMCID: PMC12091883
- DOI: 10.1371/journal.pone.0324713
Abstract
The examination and application of Zipf's law is a significant topic in quantitative linguistics. This study presents an in-depth empirical investigation of this law in 651 Chinese provincial government work reports (2003-2023). Employing natural language processing techniques (including Jieba word segmentation with a custom dictionary) and a double-logarithmic regression model, we analyzed word frequency distributions. Our findings indicate that the Zipf coefficient in these reports is close to 1, confirming general adherence to Zipf's law. Over the 21-year period, the Zipf coefficient exhibits fluctuations, with a notable inflection point in 2011, after which it follows a consistent upward trend. This shift is likely influenced by the 18th National Congress of the Communist Party of China, which marked a transition toward more standardized and centralized policy communication. While regional differences among eastern, central, western, and northeastern provinces are minimal, centrally governed municipalities exhibit higher Zipf coefficients than other provincial-level regions. Although our findings largely confirm the applicability of Zipf's law to this specific corpus, this study is limited by the exclusion of prefecture- and county-level reports. Future research can address this limitation by incorporating a broader range of administrative levels and by conducting cross-country and cross-cultural comparisons of political documents. Further investigation of other quantitative linguistic laws (e.g., Heaps' law, Menzerath's law) within this corpus is also warranted.
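As a rough illustration of the double-logarithmic regression mentioned in the abstract, the sketch below estimates a Zipf coefficient by ordinary least squares on log(frequency) against log(rank). It is not the authors' code. The function name zipf_coefficient is hypothetical, the input is assumed to be a list of already segmented words (for Chinese text, for example the output of jieba.lcut), and fitting over all ranks without any cutoff is an assumption made for brevity.

```python
import math
from collections import Counter


def zipf_coefficient(tokens):
    """Fit log(f) = intercept - alpha * log(rank) by OLS; return (alpha, intercept).

    `tokens` is assumed to be a list of already segmented words.
    """
    # Word frequencies sorted from most to least frequent (rank 1, 2, ...).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a line")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    # Closed-form simple linear regression.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # The Zipf coefficient is the negated slope of the log-log line.
    return -slope, intercept


# Synthetic check: frequencies 120, 60, 40, 30, 24 are exactly proportional to 1/rank.
tokens = [f"w{i}" for i in range(1, 6) for _ in range(120 // i)]
alpha, intercept = zipf_coefficient(tokens)
print(alpha, intercept)
```

On real reports one would run this per document or per year and compare the resulting coefficients. A coefficient near 1 corresponds to the classic Zipfian form in which frequency is roughly inversely proportional to rank.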
Copyright: © 2025 Yanfang LI. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.