Understanding scaling through history-dependent processes with collapsing sample space

Bernat Corominas-Murtra¹, Rudolf Hanel¹, Stefan Thurner²

Affiliations

¹ Section for Science of Complex Systems, Medical University of Vienna, A-1090 Vienna, Austria;
² Section for Science of Complex Systems, Medical University of Vienna, A-1090 Vienna, Austria; Santa Fe Institute, Santa Fe, NM 87501; and International Institute for Applied Systems Analysis, A-2361 Laxenburg, Austria stefan.thurner@meduniwien.ac.at.

PMID: 25870294
PMCID: PMC4418853
DOI: 10.1073/pnas.1420946112

Understanding scaling through history-dependent processes with collapsing sample space

Bernat Corominas-Murtra et al. Proc Natl Acad Sci U S A. 2015.

. 2015 Apr 28;112(17):5348-53.

doi: 10.1073/pnas.1420946112. Epub 2015 Apr 13.

Authors

Bernat Corominas-Murtra¹, Rudolf Hanel¹, Stefan Thurner²

Affiliations

¹ Section for Science of Complex Systems, Medical University of Vienna, A-1090 Vienna, Austria;
² Section for Science of Complex Systems, Medical University of Vienna, A-1090 Vienna, Austria; Santa Fe Institute, Santa Fe, NM 87501; and International Institute for Applied Systems Analysis, A-2361 Laxenburg, Austria stefan.thurner@meduniwien.ac.at.

PMID: 25870294
PMCID: PMC4418853
DOI: 10.1073/pnas.1420946112

Abstract

History-dependent processes are ubiquitous in natural and social systems. Many such stochastic processes, especially those that are associated with complex systems, become more constrained as they unfold, meaning that their sample space, or their set of possible outcomes, reduces as they age. We demonstrate that these sample-space-reducing (SSR) processes necessarily lead to Zipf's law in the rank distributions of their outcomes. We show that by adding noise to SSR processes the corresponding rank distributions remain exact power laws, p(x) ~ x(-λ), where the exponent directly corresponds to the mixing ratio of the SSR process and noise. This allows us to give a precise meaning to the scaling exponent in terms of the degree to which a given process reduces its sample space as it unfolds. Noisy SSR processes further allow us to explain a wide range of scaling exponents in frequency distributions ranging from α = 2 to ∞. We discuss several applications showing how SSR processes can be used to understand Zipf's law in word frequencies, and how they are related to diffusion processes in directed networks, or aging processes such as in fragmentation processes. SSR processes provide a new alternative to understand the origin of scaling in complex systems without the recourse to multiplicative, preferential, or self-organized critical processes.

Keywords: Zipf’s law; network diffusion; path dependence; random walks; scaling laws.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
SSR process. Imagine a set of $N = 20$ dice with different numbers of faces. We start by throwing the 20-faced dice (icosahedron). Suppose we get a face value of 13. We now have to take the 12-faced dice (dodecahedron), throw it, and get a face value of, say, 9, so that we must continue with the 8-faced dice. Say we throw a 7, forcing us to take the (ordinary) dice, with which we throw, say, a 5. With the 4-faced dice we get a 2, which forces us to take the 2-faced dice (coin). The process ends when we throw a 1 for the first time. The set of possible outcomes (sample space) reduces as the process unfolds. The sequence above was chosen to make use of the platonic dice for pictorial reasons only. If the process is repeated many times, the distribution of face values (rank-ordered) gives Zipf’s law.

**Fig. 2.**
Illustration of path dependence, SSR, and nestedness of sample space. (*Left*) Unconstrained (iid) random walk $ϕ_{R}$ realized by a ball randomly bouncing between all possible sites. The probability to observe the ball at a given site i is uniform, $p (i) = 1 / N$ . (*Right*) The ball can only bounce downward; the left–right symmetry is broken. When level 1 is reached the process stops and is repeated. Sample space reduces from step to step in a nested way (main feature of SSR processes). After many iterations the occupation distribution (visits to level i) follows Zipf’s law, $p_{N = 10} (i) \propto i^{- 1}$ . Symmetry breaking of the sampling changes the uniform probability distribution to a power law.

**Fig. 3.**
(A) Rank distributions of SSR processes with iid noise contributions from simulations of Φ^(λ)_∞, for three values of λ = 1, 0.7, and 0.5 (black, red, and blue, respectively). Fits to the distributions [obtained with a maximum-likelihood estimator (24)] yield λ^fit = 0.999, λ^fit = 0.699, and λ^fit = 0.499, respectively. Clearly, an almost exact match with the expected power-law exponents is realized. The inset shows the dependence of the measured exponent λ^sim from the simulations (slope), on various noise levels λ. The exponent λ^sim is practically identical to λ. N = 10,000, numerical simulations were stopped after M = 10⁶ restarts of the process. (B) Convergence rate. The distance (2-norm) between the simulated occupation probability (normalized histogram) after T jumps in the Φ^(λ)_∞ process, and the predicted power-law of Eq. 6, is shown for λ = 1 (black), and the pure random case, λ = 0 (red). Both distances show a power-law convergence ∼T^−β. MLE fits yield β = 0.512 and 0.463, for λ = 0 and 1, respectively. This means that both cases are compatible with β ∼ 1/2, and that SSR processes converge equally fast toward their limiting distributions as pure random walks do.

**Fig. 4.**
Empirical rank distribution of word frequencies in *The Origin of Species* (black), showing two power-law regimes. For the most frequent words, the distribution is approximately power-law with an exponent $γ \sim 0.9$ . The corresponding distribution for the $Φ^{(λ)}$ process with $λ = 0.9$ (red), suggests a slight deviation from perfect nesting. This means that in sentence formation, about 90% of consecutive word pairs, sample space is strictly reducing. Simulation: $N = 5,000$ (words), and $M = 10,000$ restarts (sentences).

**Fig. 5.**
(A) SSR processes seen as random walks on networks. A random walker starts at the start node and diffuses through the directed network. Depending on the value of $p_{exit}$ , two possible types of walks are possible. For $p_{exit} = 1$ , the finite ( $2^{N - 1} = 16$ possible paths) and acyclic process ϕ is recovered that stops after a single path; for $p_{exit} = 0$ , we have the infinite and cyclical process, $ϕ^{\infty}$ . For $p_{exit} > 0$ we have the mixed process, $Φ^{mix} = p_{exit} ϕ + (1 - p_{exit}) ϕ^{\infty}$ . (B) The occupation probability for $Φ^{mix}$ is unaffected by the value of $p_{exit}$ . The repeated ϕ (dashed black line) and the mixed process with $p_{exit} = 0.3$ (solid red line) have exactly the same occupation probability $p_{N} (i)$ , which corresponds to the stationary visiting distribution of nodes in the $ϕ^{\infty}$ network by random walkers. (*Inset*) Rank distribution of path-visit frequencies. Clearly they depend strongly on $p_{exit}$ . Whereas the acyclic ϕ produces a finite distribution, the cyclic one produces a power law, matching the theoretical prediction of ref. . For the simulation we generated $5 \cdot 10^{5}$ sequence samples and found $32,523$ distinct sequences for $p_{exit} = 0.3$ .

See this image and copyright information in PMC

References

1. Zipf GK. Human Behavior and the Principle of Least Effort. Addison-Wesley; Reading, MA: 1949.
1. Stanley HE, et al. Scaling features of noncoding DNA. Physica A. 1999;273(1-2):1–18. - PubMed
1. Thurner S, Szell M, Sinatra R. Emergence of good conduct, scaling and zipf laws in human behavioral sequences in an online world. PLoS ONE. 2012;7(1):e29796. - PMC - PubMed
1. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature. 2003;423(6937):267–270. - PubMed
1. Price DJ. Networks of scientific papers. Science. 1965;149(3683):510–515. - PubMed

Publication types

Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Understanding scaling through history-dependent processes with collapsing sample space

Affiliations

Understanding scaling through history-dependent processes with collapsing sample space

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

Publication types

LinkOut - more resources

Full Text Sources

Other Literature Sources