Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2015 Apr 28;112(17):5348-53.
doi: 10.1073/pnas.1420946112. Epub 2015 Apr 13.

Understanding scaling through history-dependent processes with collapsing sample space

Affiliations

Understanding scaling through history-dependent processes with collapsing sample space

Bernat Corominas-Murtra et al. Proc Natl Acad Sci U S A. .

Abstract

History-dependent processes are ubiquitous in natural and social systems. Many such stochastic processes, especially those that are associated with complex systems, become more constrained as they unfold, meaning that their sample space, or their set of possible outcomes, reduces as they age. We demonstrate that these sample-space-reducing (SSR) processes necessarily lead to Zipf's law in the rank distributions of their outcomes. We show that by adding noise to SSR processes the corresponding rank distributions remain exact power laws, p(x) ~ x(-λ), where the exponent directly corresponds to the mixing ratio of the SSR process and noise. This allows us to give a precise meaning to the scaling exponent in terms of the degree to which a given process reduces its sample space as it unfolds. Noisy SSR processes further allow us to explain a wide range of scaling exponents in frequency distributions ranging from α = 2 to ∞. We discuss several applications showing how SSR processes can be used to understand Zipf's law in word frequencies, and how they are related to diffusion processes in directed networks, or aging processes such as in fragmentation processes. SSR processes provide a new alternative to understand the origin of scaling in complex systems without the recourse to multiplicative, preferential, or self-organized critical processes.

Keywords: Zipf’s law; network diffusion; path dependence; random walks; scaling laws.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Fig. 1.
SSR process. Imagine a set of N=20 dice with different numbers of faces. We start by throwing the 20-faced dice (icosahedron). Suppose we get a face value of 13. We now have to take the 12-faced dice (dodecahedron), throw it, and get a face value of, say, 9, so that we must continue with the 8-faced dice. Say we throw a 7, forcing us to take the (ordinary) dice, with which we throw, say, a 5. With the 4-faced dice we get a 2, which forces us to take the 2-faced dice (coin). The process ends when we throw a 1 for the first time. The set of possible outcomes (sample space) reduces as the process unfolds. The sequence above was chosen to make use of the platonic dice for pictorial reasons only. If the process is repeated many times, the distribution of face values (rank-ordered) gives Zipf’s law.
Fig. 2.
Fig. 2.
Illustration of path dependence, SSR, and nestedness of sample space. (Left) Unconstrained (iid) random walk ϕR realized by a ball randomly bouncing between all possible sites. The probability to observe the ball at a given site i is uniform, p(i)=1/N. (Right) The ball can only bounce downward; the left–right symmetry is broken. When level 1 is reached the process stops and is repeated. Sample space reduces from step to step in a nested way (main feature of SSR processes). After many iterations the occupation distribution (visits to level i) follows Zipf’s law, pN=10(i)i1. Symmetry breaking of the sampling changes the uniform probability distribution to a power law.
Fig. 3.
Fig. 3.
(A) Rank distributions of SSR processes with iid noise contributions from simulations of Φ(λ), for three values of λ = 1, 0.7, and 0.5 (black, red, and blue, respectively). Fits to the distributions [obtained with a maximum-likelihood estimator (24)] yield λfit = 0.999, λfit = 0.699, and λfit = 0.499, respectively. Clearly, an almost exact match with the expected power-law exponents is realized. The inset shows the dependence of the measured exponent λsim from the simulations (slope), on various noise levels λ. The exponent λsim is practically identical to λ. N = 10,000, numerical simulations were stopped after M = 106 restarts of the process. (B) Convergence rate. The distance (2-norm) between the simulated occupation probability (normalized histogram) after T jumps in the Φ(λ) process, and the predicted power-law of Eq. 6, is shown for λ = 1 (black), and the pure random case, λ = 0 (red). Both distances show a power-law convergence ∼T−β. MLE fits yield β = 0.512 and 0.463, for λ = 0 and 1, respectively. This means that both cases are compatible with β ∼ 1/2, and that SSR processes converge equally fast toward their limiting distributions as pure random walks do.
Fig. 4.
Fig. 4.
Empirical rank distribution of word frequencies in The Origin of Species (black), showing two power-law regimes. For the most frequent words, the distribution is approximately power-law with an exponent γ0.9. The corresponding distribution for the Φ(λ) process with λ=0.9 (red), suggests a slight deviation from perfect nesting. This means that in sentence formation, about 90% of consecutive word pairs, sample space is strictly reducing. Simulation: N=5,000 (words), and M=10,000 restarts (sentences).
Fig. 5.
Fig. 5.
(A) SSR processes seen as random walks on networks. A random walker starts at the start node and diffuses through the directed network. Depending on the value of pexit, two possible types of walks are possible. For pexit=1, the finite (2N1=16 possible paths) and acyclic process ϕ is recovered that stops after a single path; for pexit=0, we have the infinite and cyclical process, ϕ. For pexit>0 we have the mixed process, Φmix=pexitϕ+(1pexit)ϕ. (B) The occupation probability for Φmix is unaffected by the value of pexit. The repeated ϕ (dashed black line) and the mixed process with pexit=0.3 (solid red line) have exactly the same occupation probability pN(i), which corresponds to the stationary visiting distribution of nodes in the ϕ network by random walkers. (Inset) Rank distribution of path-visit frequencies. Clearly they depend strongly on pexit. Whereas the acyclic ϕ produces a finite distribution, the cyclic one produces a power law, matching the theoretical prediction of ref. . For the simulation we generated 5105 sequence samples and found 32,523 distinct sequences for pexit=0.3.

References

    1. Zipf GK. Human Behavior and the Principle of Least Effort. Addison-Wesley; Reading, MA: 1949.
    1. Stanley HE, et al. Scaling features of noncoding DNA. Physica A. 1999;273(1-2):1–18. - PubMed
    1. Thurner S, Szell M, Sinatra R. Emergence of good conduct, scaling and zipf laws in human behavioral sequences in an online world. PLoS ONE. 2012;7(1):e29796. - PMC - PubMed
    1. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature. 2003;423(6937):267–270. - PubMed
    1. Price DJ. Networks of scientific papers. Science. 1965;149(3683):510–515. - PubMed

Publication types

LinkOut - more resources