Random texts do not exhibit the real Zipf's law-like rank distribution
- PMID: 20231884
- PMCID: PMC2834740
- DOI: 10.1371/journal.pone.0009411
Abstract
Background: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters including blanks behaving as word delimiters - exhibit a Zipf's law-like word rank distribution.
Methodology/principal findings: In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid for both the simplest random texts composed of equally likely characters as well as more elaborate and realistic versions where character probabilities are borrowed from a real text.
Conclusions/significance: The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.
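The random-text construction described in the Background can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (random_text, rank_frequency, char_probs_from_text), the three-symbol alphabet, the text length, and the seed are all assumptions chosen for illustration. The sketch draws characters independently, treats the blank as the word delimiter, and lists word frequencies by rank. It covers both variants mentioned in the abstract: equally likely characters, and character probabilities borrowed from a real text.

```python
import random
from collections import Counter


def random_text(n_chars, char_probs, seed=0):
    """Concatenate n_chars symbols drawn independently from char_probs.

    char_probs maps each symbol, including the blank " ", to its probability.
    """
    rng = random.Random(seed)
    symbols = list(char_probs)
    weights = [char_probs[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=n_chars))


def rank_frequency(text):
    """Return word frequencies in decreasing order (index 0 holds rank 1)."""
    # split() treats runs of blanks as one delimiter, so empty words are dropped.
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def char_probs_from_text(real_text):
    """Estimate character probabilities (blank included) from a real text."""
    counts = Counter(real_text)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


# Simplest variant: every character, blank included, is equally likely.
# The alphabet "ab " is a hypothetical toy choice.
alphabet = "ab "
uniform = {c: 1 / len(alphabet) for c in alphabet}
text = random_text(100_000, uniform)
freqs = rank_frequency(text)

# Print the head of the rank-frequency list.
for rank, freq in enumerate(freqs[:5], start=1):
    print(rank, freq)
```

For the more realistic variant, pass char_probs_from_text(some_real_text) to random_text in place of uniform. Plotting log(freq) against log(rank) for the random text and for the real text gives the visual comparison that the abstract argues is insufficient without formal statistical tests.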