PLoS One. 2017 Aug 9;12(8):e0181987. doi: 10.1371/journal.pone.0181987. eCollection 2017.

Unzipping Zipf's law


Sander Lestrade. PLoS One. 2017.

Abstract

In spite of decades of theorizing, the origins of Zipf's law remain elusive. I propose that a Zipfian distribution straightforwardly follows from the interaction of syntax (word classes differing in class size) and semantics (words having to be sufficiently specific to be distinctive and sufficiently general to be reusable). These factors are independently motivated and well-established ingredients of a natural-language system. Using a computational model, it is shown that neither of these ingredients suffices to produce a Zipfian distribution on its own and that the results deviate from the Zipfian ideal only in the same way as natural language itself does.
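For reference, Zipf's law relates a word's frequency to its frequency rank by a power law: the r-th most frequent word has frequency f(r) ≈ C / r^α, with α close to 1 for natural language, so that log f(r) falls roughly linearly with log r (the straight line in double-log space shown in Fig 1B).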


Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

Fig 1
Fig 1. Zipf’s law.
A: Predicted frequency by rank. B: Predicted frequency by rank in double-log space. C: Frequency development in Melville’s Moby Dick.
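A curve like panel C can be reproduced by counting word tokens in any plain-text corpus and sorting the counts. The snippet below is a minimal sketch; "moby_dick.txt" is a placeholder for a local copy of the text.

    # Minimal sketch: empirical rank-frequency counts for a plain-text corpus,
    # as in Fig 1C.  "moby_dick.txt" is a placeholder for a local copy of the text.
    import re
    from collections import Counter

    with open("moby_dick.txt", encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())

    freqs = sorted(Counter(words).values(), reverse=True)

    # Under Zipf's law, frequency falls roughly as C / rank, so successive
    # ranks below should show an approximately constant rank * frequency.
    for rank in (1, 2, 5, 10, 100, 1000):
        if rank <= len(freqs):
            print(rank, freqs[rank - 1], rank * freqs[rank - 1])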
Fig 2
Fig 2. Attempt to generate a Zipfian distribution with syntax only.
To generate these results, the class frequencies and class sizes reported for Dutch in Table 1 are used. Numbers correspond to word classes when ordered by expected frequency.
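The sketch below illustrates what such a syntax-only distribution looks like: every member of a class is used equally often, so a word's expected frequency is simply its class frequency divided by its class size, and the curve becomes a staircase of plateaus rather than a Zipfian line. The class figures in the sketch are illustrative placeholders, since Table 1 is not reproduced here.

    # Syntax-only sketch: a word's expected frequency is its class's token
    # frequency divided by the class's size.  The counts below are illustrative
    # placeholders, not the Dutch figures of Table 1.
    classes = {               # (class token frequency, class size in types)
        "determiner": (200_000, 20),
        "pronoun":    (150_000, 50),
        "verb":       (300_000, 5_000),
        "adjective":  (120_000, 8_000),
        "noun":       (400_000, 30_000),
    }

    expected = []
    for freq, size in classes.values():
        expected.extend([freq / size] * size)

    expected.sort(reverse=True)
    for rank in (1, 10, 100, 1_000, 10_000):
        if rank <= len(expected):
            print(rank, round(expected[rank - 1], 2))   # plateaus, not a smooth decline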
Fig 3
Fig 3. Frequency distributions of different specificity classes in the Brown corpus.
Top panel: distribution of the specificity classes over the overall frequency distribution of nouns. Degree of meaning specification is approximated by automatically determining the depth of embedding in the WordNet noun taxonomy. Words with lowest ranks are all moderately specified, with an embedding depth of 3–9 (red circles). Bottom panel: boxplots of frequency ranks per specificity class.
Fig 4
Fig 4. Frequency distribution of different specificity classes in a computer simulation.
The lexicon consists of 1,000 words with ten optional meaning dimensions, from which words are selected for 10,000 contexts with randomly generated targets and 5 randomly generated distractors. Words with lowest ranks are all moderately specified (2–4 dimensions; red circles). Bottom panel: boxplots of frequency ranks per specificity class.
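One way to set up such a simulation is sketched below. The representation of meanings (binary, independent dimensions) and the selection rule (pick a random word that fits the target and excludes every distractor) are assumptions made here for illustration, not necessarily the paper's exact procedure.

    # Semantics-only sketch, loosely following the set-up described for Fig 4:
    # 1,000 words with up to 10 optional meaning dimensions, 10,000 contexts,
    # 5 distractors per context.  Meaning representation and selection rule are
    # illustrative assumptions, not necessarily the paper's exact procedure.
    import random

    random.seed(1)
    N_DIMS, N_WORDS, N_CONTEXTS, N_DISTRACTORS = 10, 1_000, 10_000, 5

    def random_word():
        # A word fixes a value on a random subset of the meaning dimensions.
        dims = random.sample(range(N_DIMS), random.randint(1, N_DIMS))
        return {d: random.randint(0, 1) for d in dims}

    def random_object():
        # Targets and distractors have a value on every dimension.
        return [random.randint(0, 1) for _ in range(N_DIMS)]

    def applies(word, obj):
        return all(obj[d] == v for d, v in word.items())

    lexicon = [random_word() for _ in range(N_WORDS)]
    counts = [0] * N_WORDS

    for _ in range(N_CONTEXTS):
        target = random_object()
        distractors = [random_object() for _ in range(N_DISTRACTORS)]
        # A word is usable if it fits the target and excludes every distractor.
        usable = [i for i, w in enumerate(lexicon)
                  if applies(w, target) and not any(applies(w, d) for d in distractors)]
        if usable:
            counts[random.choice(usable)] += 1

    # The most frequent words end up moderately specified, as in the figure.
    top = sorted(range(N_WORDS), key=counts.__getitem__, reverse=True)[:10]
    for i in top:
        print(len(lexicon[i]), counts[i])   # (specified dimensions, frequency)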
Fig 5
Fig 5. Distribution of probability of usage of different specificity classes in a computational model.
The lexicon consists of 1,000 words with ten optional meaning dimensions. Probability of usage depends on degree of specification and number of distractors assumed (here 5). As in the previous figures, words with lowest ranks are all moderately specified (3–6 dimensions; red circles). Bottom panel: boxplots of frequency ranks per specificity class.
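Under the same illustrative assumptions as above (binary, independent dimensions), the usage probability has a simple analytic counterpart: a word that specifies s dimensions fits a random target with probability 2^-s and must additionally fail to apply to each of the n distractors, giving P(usable) = 2^-s * (1 - 2^-s)^n, which peaks at moderate specification.

    # Analytic counterpart of the usage probability, under the illustrative
    # assumption of binary, independent meaning dimensions (n = 5 distractors).
    def p_usable(s, n=5):
        p = 2.0 ** -s                 # probability of fitting a random object
        return p * (1.0 - p) ** n     # ... while excluding all n distractors

    for s in range(1, 11):
        print(s, round(p_usable(s), 4))   # maximum at moderate specification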
Fig 6
Fig 6. Generating Zipf’s law by combining syntax and semantics.
Ten word classes of equal frequency are used, with 5, 30, 50, 100, 500, 500, 1,000, 15,000, 25,000, and 100,000 members; items can be specified for at most 30 meaning dimensions (mean 8.3, sd 2.0), and the number of distractors is 5.
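The scaled-down sketch below combines the two ingredients in the same spirit: word classes of very different sizes are equally frequent, and within the class selected for a context a word must still be semantically adequate. Class sizes, dimension counts, and the number of contexts are reduced so the sketch runs quickly; the semantic assumptions are the same illustrative ones as in the previous sketches.

    # Scaled-down sketch of combining syntax (classes of unequal size, equal
    # frequency) with semantics (members must fit the target and exclude the
    # distractors).  Sizes and dimensions are reduced from those used for Fig 6.
    import random

    random.seed(2)
    N_DIMS, N_DISTRACTORS, N_CONTEXTS = 10, 5, 10_000
    CLASS_SIZES = [5, 30, 50, 100, 500, 1_000]   # placeholders, not the Fig 6 sizes

    def random_word():
        dims = random.sample(range(N_DIMS), random.randint(1, N_DIMS))
        return {d: random.randint(0, 1) for d in dims}

    def random_object():
        return [random.randint(0, 1) for _ in range(N_DIMS)]

    def applies(word, obj):
        return all(obj[d] == v for d, v in word.items())

    classes = [[random_word() for _ in range(size)] for size in CLASS_SIZES]
    counts = [[0] * size for size in CLASS_SIZES]

    for _ in range(N_CONTEXTS):
        c = random.randrange(len(classes))        # syntax: classes equally frequent
        target = random_object()
        distractors = [random_object() for _ in range(N_DISTRACTORS)]
        usable = [i for i, w in enumerate(classes[c])
                  if applies(w, target) and not any(applies(w, d) for d in distractors)]
        if usable:
            counts[c][random.choice(usable)] += 1  # semantics: an adequate member

    freqs = sorted((f for cls in counts for f in cls if f), reverse=True)
    for rank in (1, 2, 5, 10, 50, 100, 500):
        if rank <= len(freqs):
            print(rank, freqs[rank - 1])          # steep, roughly Zipf-like decline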
Fig 7
Fig 7. Frequency distribution in CGN (left) and Brown corpus (right).
Blue triangles show the results of the model simulation using the corresponding parameters from Table 1; red plusses show the results when mixing the CGN and Brown parameters.
