PLoS One. 2025 May 20;20(5):e0324713.
doi: 10.1371/journal.pone.0324713. eCollection 2025.

Zipf's law in China's local government work reports: A 21-year study using natural language processing and regression analysis

Yanfang Li. PLoS One.

Abstract

The examination and application of Zipf's law is a significant topic in quantitative linguistics. This study presents an in-depth empirical investigation of this law in 651 Chinese provincial government work reports (2003-2023). Employing natural language processing techniques (including Jieba word segmentation with a custom dictionary) and a double-logarithmic regression model, we analyzed word frequency distributions. Our findings indicate that the Zipf coefficient in these reports is close to 1, confirming general adherence to Zipf's law. Over the 21-year period, the Zipf coefficient exhibits fluctuations, with a notable inflection point in 2011, after which it follows a consistent upward trend. This shift is likely influenced by the 18th National Congress of the Communist Party of China, which marked a transition toward more standardized and centralized policy communication. While regional differences among eastern, central, western, and northeastern provinces are minimal, centrally governed municipalities exhibit higher Zipf coefficients than other provincial-level regions. Although our findings largely confirm the applicability of Zipf's law to this specific corpus, this study is limited by the exclusion of prefecture- and county-level reports. Future research can address this limitation by incorporating a broader range of administrative levels and by conducting cross-country and cross-cultural comparisons of political documents. Further investigation of other quantitative linguistic laws (e.g., Heaps' law, Menzerath's law) within this corpus is also warranted.
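The double-logarithmic regression mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common form log f(r) = c - α log r, where f(r) is the frequency of the word at rank r and α is the Zipf coefficient; the exact model specification, preprocessing, and fitting options used in the paper are not given in the abstract. The function name `zipf_coefficient` and the toy token list are hypothetical, and the input is assumed to be already segmented (for Chinese text, for example with `jieba.lcut` after loading a custom dictionary via `jieba.load_userdict`).

```python
import math
from collections import Counter


def zipf_coefficient(tokens):
    """Estimate the Zipf coefficient by ordinary least squares on
    log(frequency) versus log(rank).

    `tokens` is an already segmented list of words (assumption: the
    segmentation step happens upstream and is not shown here).
    Returns (alpha, intercept), where alpha is the negated slope.
    """
    # Word frequencies sorted from most to least frequent; position = rank.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    # Closed-form simple linear regression (no external dependencies).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Toy corpus whose frequencies are exactly proportional to 1/rank
# (120, 60, 40, 30, 24), so the fitted coefficient should be close to 1.
toy_tokens = []
for rank, word in enumerate(["a", "b", "c", "d", "e"], start=1):
    toy_tokens += [word] * (120 // rank)

alpha, intercept = zipf_coefficient(toy_tokens)
```

On a real report, a coefficient near 1 would correspond to the "general adherence to Zipf's law" described above. Fits on real text are sensitive to how ties and low-frequency words are treated, which this sketch does not address.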

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Technical roadmap.
Fig 2. Temporal trends of Zipf coefficients by all provincial-level regions (2003–2023).
Fig 3. Histogram and kernel density estimation of Zipf coefficients.
Fig 4. The evolution of the annual average value of Zipf coefficients (2003–2023).
Fig 5. Zipf Coefficient Distribution Across Quintiles (2003–2023).
Fig 6. Zipf Coefficient Variations Across China’s Four Regions (2003–2023).
Fig 7. Temporal variation of Zipf coefficients for municipalities and other provincial-level regions (2003–2023).
