PLoS One. 2010 Mar 9;5(3):e9411.
doi: 10.1371/journal.pone.0009411.

Random texts do not exhibit the real Zipf's law-like rank distribution

Ramon Ferrer-i-Cancho et al. PLoS One. 2010.

Abstract

Background: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters, including blanks that behave as word delimiters - exhibit a Zipf's law-like word rank distribution.
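The random-text construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the alphabet, the text length, the seed, and the function names (`random_text`, `rank_frequencies`, `loglog_slope`) are assumptions chosen for the example, and the least-squares slope only summarises the double logarithmic plot. It is not one of the statistical tests used in the article.

```python
import math
import random
from collections import Counter


def random_text(n_chars, alphabet="abcde", p_blank=None, seed=0):
    """Concatenate random characters; the blank acts as the word delimiter."""
    rng = random.Random(seed)
    if p_blank is None:
        # Assumption: the blank is as likely as any single letter.
        p_blank = 1 / (len(alphabet) + 1)
    chars = [
        " " if rng.random() < p_blank else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    return "".join(chars)


def rank_frequencies(text):
    """Word frequencies in decreasing order; index i holds the word of rank i + 1."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


freqs = rank_frequencies(random_text(200_000))
slope = loglog_slope(freqs)
print(f"{len(freqs)} distinct words, log-log slope = {slope:.2f}")
```

A roughly straight, downward-sloping line on the double logarithmic scale is what makes such texts look Zipf-like at first glance; whether that resemblance survives a proper statistical comparison is the question the article addresses.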

Methodology/principal findings: In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid both for the simplest random texts, composed of equally likely characters, and for more elaborate and realistic versions in which character probabilities are borrowed from a real text.
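The more elaborate variant, in which character probabilities are borrowed from a real text, might look like the sketch below. The preprocessing (lowercasing, keeping only the letters a-z, single blanks between words), the sample sentence, and the helper names `char_probabilities` and `random_text_from` are assumptions made for illustration; the abstract does not specify how the authors estimated the probabilities.

```python
import random
import re
from collections import Counter


def char_probabilities(real_text):
    """Relative frequency of each letter and of the blank in a real text.

    Assumption: words are runs of the letters a-z, joined by single blanks.
    """
    words = re.findall(r"[a-z]+", real_text.lower())
    cleaned = " ".join(words)
    counts = Counter(cleaned)
    total = sum(counts.values())
    return {char: n / total for char, n in counts.items()}


def random_text_from(probs, n_chars, seed=0):
    """Draw characters independently with the given probabilities."""
    rng = random.Random(seed)
    symbols = sorted(probs)
    weights = [probs[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=n_chars))


# Hypothetical stand-in for a real text such as one of the four English books.
sample = "the quick brown fox jumps over the lazy dog and the cat"
probs = char_probabilities(sample)
fake = random_text_from(probs, 1000)
print(fake[:60])
```

The generated text can then be ranked in the same way as a real text and the two rank histograms compared.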

Conclusions/significance: The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The rank histograms of English texts versus that of random texts ([formula]).
A comparison of the real rank histogram (thin black line) and two control curves with the [formula] upper and lower bounds of the expected histogram of a random text of the same length in words (dashed lines) involving four English texts. [formula] is the frequency of the word of rank [formula]. For the random text we use the model [formula] with alphabet size [formula]. The expected histogram of the random text is estimated by averaging over the rank histograms of [formula] random texts. For ease of presentation, the expected histogram is cut off at expected frequencies below [formula]. AAW: Alice's Adventures in Wonderland. H: Hamlet. DC: David Crockett. OS: The Origin of Species.

Figure 2. The rank histograms of English texts versus that of random texts ([formula]).
The same as Fig. 1 for the model [formula] with alphabet size [formula] and probability of blank [formula] obtained from the real text.

Figure 3. The rank histograms of English texts versus that of random texts ([formula]).
The same as Fig. 1 for the model [formula] with alphabet size [formula] and character probabilities obtained from the real text.
