PLoS One. 2010 Dec 2;5(12):e14139.
doi: 10.1371/journal.pone.0014139.

Zipf's law leads to Heaps' law: analyzing their relation in finite-size systems

Linyuan Lü et al. PLoS One. 2010.

Abstract

Background: Zipf's law and Heaps' law are observed in disparate complex systems. Of particular interest, these two laws often appear together. Many theoretical models and analyses have been proposed to explain their co-occurrence in real systems, but a clear picture of their relation is still lacking.

Methodology/principal findings: We show that Heaps' law can be considered a derivative phenomenon if the system obeys Zipf's law. Furthermore, we refine the known approximate solution for the Heaps' exponent given the Zipf's exponent. We show that the approximate solution is in fact an asymptotic solution for infinite systems, while in a finite-size system the Heaps' exponent is sensitive to the system size. Extensive empirical analysis of tens of disparate systems demonstrates that our refined results better capture the relation between the Zipf's and Heaps' exponents.
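The claim that Heaps' law follows from Zipf's law can be checked with a small simulation. The sketch below is not the authors' stochastic model: it assumes tokens are drawn independently from a Zipf law truncated at a hypothetical `vocab_size`, counts distinct tokens as the sample grows, and fits the Heaps' exponent by least squares in log-log space. The function names and the truncation are illustrative assumptions. The asymptotic value used for comparison (1/α for α > 1, and 1 otherwise) is the standard large-system limit.

```python
import numpy as np

def asymptotic_heaps_exponent(alpha):
    """Large-system limit of the Heaps' exponent for Zipf's exponent alpha."""
    return 1.0 / alpha if alpha > 1.0 else 1.0

def simulate_distinct_counts(alpha, vocab_size, total_tokens, seed=0):
    """Draw tokens i.i.d. from a Zipf law over ranks 1..vocab_size (an
    assumption of this sketch) and return the number of distinct tokens
    seen after each draw."""
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, vocab_size + 1, dtype=float)
    probs = ranks ** (-alpha)
    probs /= probs.sum()
    tokens = rng.choice(vocab_size, size=total_tokens, p=probs)
    # Mark the position at which each distinct token first appears.
    _, first_idx = np.unique(tokens, return_index=True)
    is_new = np.zeros(total_tokens, dtype=int)
    is_new[first_idx] = 1
    return np.cumsum(is_new)

def fit_heaps_exponent(distinct_counts):
    """Least-squares slope of log(distinct count) against log(sample size)."""
    t = np.arange(1, len(distinct_counts) + 1)
    slope, _ = np.polyfit(np.log(t), np.log(distinct_counts), 1)
    return slope

if __name__ == "__main__":
    counts = simulate_distinct_counts(alpha=2.0, vocab_size=10_000,
                                      total_tokens=20_000, seed=1)
    print("fitted exponent:", fit_heaps_exponent(counts))
    print("asymptotic exponent:", asymptotic_heaps_exponent(2.0))
```

The fitted slope from a finite sample need not match the asymptotic value, which is the finite-size sensitivity the abstract describes.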

Conclusions/significance: The present analysis provides a clear picture of the relation between Zipf's law and Heaps' law without relying on any specific stochastic model: Heaps' law is a derivative phenomenon of Zipf's law. The presented numerical method gives a considerably better estimate of the Heaps' exponent given the Zipf's exponent and the system size. Our analysis also offers insights into real complex systems. For example, one can naturally obtain a better explanation of the accelerated growth of scale-free networks.
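The dependence of the Heaps' exponent on system size can also be explored without simulation. The abstract does not reproduce the paper's equations, so the sketch below swaps in a standard result instead of the authors' numerical method: under independent sampling with rank probabilities p_r, the expected number of distinct items after t draws is the sum over r of 1 - (1 - p_r)^t. The truncated Zipf law, `vocab_size`, and the log-spaced fitting grid are all assumptions made for illustration.

```python
import numpy as np

def zipf_probabilities(alpha, vocab_size):
    """Normalized Zipf probabilities over ranks 1..vocab_size (vocab_size > 1)."""
    ranks = np.arange(1, vocab_size + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

def expected_distinct(probs, t_values):
    """Expected number of distinct items after t i.i.d. draws, for each t.
    Uses 1 - (1 - p)^t = -expm1(t * log1p(-p)) for numerical stability."""
    t = np.asarray(t_values, dtype=float)[:, None]
    return (-np.expm1(t * np.log1p(-probs[None, :]))).sum(axis=1)

def finite_size_exponent(alpha, vocab_size, total_tokens, n_points=40):
    """Least-squares log-log slope of the expected distinct count over
    sample sizes from 1 to total_tokens (log-spaced grid, an assumption)."""
    probs = zipf_probabilities(alpha, vocab_size)
    t_values = np.logspace(0, np.log10(total_tokens), n_points)
    n_expected = expected_distinct(probs, t_values)
    slope, _ = np.polyfit(np.log(t_values), np.log(n_expected), 1)
    return slope

if __name__ == "__main__":
    # Compare fitted exponents at several system sizes for a fixed alpha.
    for total in (10**3, 10**4, 10**5):
        print(total, finite_size_exponent(1.5, 50_000, total))
```

Running the loop with different `total_tokens` values gives a quick way to see whether the fitted exponent drifts with system size under these assumptions.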

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Relationship between the Heaps' exponent and Zipf's exponent .
The solid curve represents the asymptotic solution shown in Eq. 7, the dashed curve is the numerical result based on Eq. 6, and the circles denote the result from the stochastic model. For the numerical result and the result of the stochastic model, the total number of word occurrences is fixed as formula image. The Heaps' exponents formula image for the numerical results of Eq. 6 and the simulation results of the stochastic model are obtained by using the least squares method.
Figure 2. Effect of system size on the Heaps' exponent .
The Zipf's exponent is fixed as formula image.
Figure 3. Heaps' exponent as a function of .
Figure 4. Zipf's law and Heaps' law in four example systems.
(a) Words in Dante Alighieri's great book “La Divina Commedia” in Italian where formula image is the frequency of the word ranked formula image and formula image is the number of distinct words; (b) Keywords of articles published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) where formula image is the frequency of the keyword ranked formula image and formula image is the number of distinct keywords; (c) Confirmed cases of the novel virus influenza A (H1N1) where formula image is the number of confirmed cases of the country ranked formula image and formula image is the number of infected countries in the presence of formula image confirmed cases over the world; (d) PNAS articles having been cited at least once from 1915 to 2009 where formula image is the number of citations of the article ranked formula image and formula image is the number of distinct articles in the presence of formula image citations to PNAS. In (c), the data set is small and thus the effective number is only two digits. The fittings in (c1) and (c2) only cover the area marked in blue. In (d1), a deviation from a power law is observed in the head and tail, and thus the fitting only covers the blue area. The Zipf's (power-law) exponents and Heaps' exponents are obtained by using maximum likelihood estimation and the least squares method, respectively. Statistics of these data sets can be found in Table 1 (the data set numbers of (a), (b), (c) and (d) are 9, 10, 34 and 35 in Table 1) with detailed description in Materials and Methods.
Figure 5. Direct comparison between the empirical data and Eq. 6 as well as its improved version.
The left and right plots are for the words in “La Divina Commedia” and the keywords in PNAS, respectively. The blue dashed lines and red solid lines present the results of Eq. 6 and Eq. 11, respectively. In accordance with Figure 4 and Table 1, the values of the parameter formula image are given as 1.117 and 0.893, respectively.
Figure 6. vs. according to the numerical results of Eq. 6.
The red, black and blue lines correspond to the cases of formula image, formula image and formula image. The system sizes (i.e., the total number of word occurrences), from left to right, are formula image, formula image and formula image. Fitting exponent formula image is obtained by the least squares method. The fitting lines and numerical results almost completely overlap.


