PLoS One. 2014 Jan 29;9(1):e85777.
doi: 10.1371/journal.pone.0085777. eCollection 2014.

Powerlaw: a Python package for analysis of heavy-tailed distributions

Jeff Alstott et al. PLoS One. 2014.

Erratum in

  • PLoS One. 2014;9(4):e95816

Abstract

Power laws are theoretically interesting probability distributions that are also frequently used to describe empirical data. In recent years, effective statistical methods for fitting power laws have been developed, but appropriate use of these techniques requires significant programming and statistical insight. In order to greatly decrease the barriers to using good statistical methods for fitting power law distributions, we developed the powerlaw Python package. This software package provides easy commands for basic fitting and statistical analysis of distributions. Notably, it also seeks to support a variety of user needs by being exhaustive in the options available to the user. The source code is publicly available and easily extensible.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Basic steps of analysis for heavy-tailed distributions: visualizing, fitting, and comparing.
Example data for power law fitting are a good fit (left column), medium fit (middle column) and poor fit (right column). Data and methods are described in the text. a) Visualizing data with probability density functions. A typical histogram on linear axes (insets) is not helpful for visualizing heavy-tailed distributions. On log-log axes, logarithmically spaced bins are necessary to represent the data accurately (blue line). Linearly spaced bins (red line) obscure the tail of the distribution (see text). b) Fitting to the tail of the distribution. The best-fit power law may cover only a portion of the distribution's tail. Dotted green line: power law fit starting at x_min = 1. Dashed green line: power law fit starting from the optimal x_min (see Basic Methods: Identifying the Scaling Range). c) Comparing the goodness of fit. Once the best fit to a power law is established, comparison to other possible distributions is necessary. Dashed green line: power law fit starting from the optimal x_min. Dashed red line: exponential fit starting from the same x_min.
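The binning point in panel a) can be illustrated with a minimal sketch using only the Python standard library. It is not the powerlaw package's own plotting code; the helper names (`sample_power_law`, `binned_pdf`, `linear_edges`, `log_edges`) and the synthetic data are assumptions made for illustration. The sketch builds a normalized histogram twice, once with linearly spaced bins and once with logarithmically spaced bins, and counts how many bins end up empty in each case.

```python
import bisect
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law by inverse transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def linear_edges(lo, hi, n_bins):
    """Bin edges with equal width on a linear axis."""
    return [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]


def log_edges(lo, hi, n_bins):
    """Bin edges with equal width on a logarithmic axis (widths grow with x)."""
    return [lo * (hi / lo) ** (i / n_bins) for i in range(n_bins + 1)]


def binned_pdf(data, edges):
    """Normalized histogram: count / (n * bin width) for each bin."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        # Index of the bin containing x; values at or beyond the last edge
        # (possible through rounding) are clamped into the last bin.
        i = bisect.bisect_right(edges, x) - 1
        counts[min(max(i, 0), n_bins - 1)] += 1
    n = len(data)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


# Synthetic heavy-tailed data (hypothetical parameters, fixed seed).
rng = random.Random(0)
data = sample_power_law(5000, alpha=2.5, xmin=1.0, rng=rng)
lo, hi = min(data), max(data)

edges_linear = linear_edges(lo, hi, 20)
edges_log = log_edges(lo, hi, 20)
pdf_linear = binned_pdf(data, edges_linear)
pdf_log = binned_pdf(data, edges_log)

# Empty bins cannot be drawn on log-log axes, which is how linear bins
# end up hiding the tail.
empty_linear = sum(1 for d in pdf_linear if d == 0)
empty_log = sum(1 for d in pdf_log if d == 0)
print("empty bins, linear spacing:", empty_linear)
print("empty bins, log spacing:", empty_log)
```

With linear spacing, nearly all observations fall into the first bin and most of the remaining bins are empty. Logarithmic spacing widens the bins where the data are sparse, so more of the tail stays visible.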
Figure 2. Probability density function (blue) and complementary cumulative distribution function (red) of word frequencies from “Moby Dick”.
Figure 3. Complementary cumulative distribution functions of the empirical word frequency data and fitted power law distribution, with and without an upper limit.
Figure 4. Complementary cumulative distribution functions of word frequency data and fitted power law and lognormal distributions.
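Comparing candidate distributions on the same tail is commonly done with a log-likelihood ratio test, following Clauset, Shalizi and Newman (2009). The sketch below is a simplified stand-in and not the package's implementation: it compares a power law against an exponential (the alternative shown in Figure 1c) rather than a lognormal, because the exponential has a closed-form maximum-likelihood fit. The function names, the synthetic data, and the choice of a lower bound of 1.0 are assumptions made for illustration.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law by inverse transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def compare_power_law_exponential(data, xmin):
    """Log-likelihood ratio R and two-sided p-value for power law vs. exponential.

    R > 0 favors the power law, R < 0 favors the exponential. The p-value
    says whether the sign of R is statistically meaningful.
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)

    # Closed-form maximum-likelihood estimates on the tail x >= xmin.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    lam = 1.0 / (sum(tail) / n - xmin)

    # Pointwise log-likelihood differences between the two fitted models.
    diffs = [
        (math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin))
        - (math.log(lam) - lam * (x - xmin))
        for x in tail
    ]
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)

    # Normalized ratio is approximately standard normal under the null
    # hypothesis that both models fit equally well.
    p_value = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sigma))
    return ratio, p_value


# Synthetic power-law data (hypothetical parameters, fixed seed).
rng = random.Random(1)
data = sample_power_law(2000, alpha=2.5, xmin=1.0, rng=rng)
R, p = compare_power_law_exponential(data, xmin=1.0)
print("log-likelihood ratio:", R)
print("p-value:", p)
```

A positive ratio with a small p-value would indicate that the power law describes the tail better than the exponential. A p-value near 1 means the data cannot distinguish the two candidates.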
Figure 5. Example of multiple local minima of Kolmogorov-Smirnov distance D across x_min.
As a power law is fitted to data starting from different x_min, the goodness of fit between the power law and the data is measured by the Kolmogorov-Smirnov distance D, with the best x_min minimizing this distance. Here the fitted data are the population sizes affected by blackouts. While there is a clear absolute minimum of D at 230, and thus 230 is the optimal x_min, additional restrictions could exclude this fit. Parameter requirements such as [formula] or [formula] would restrict the x_min values considered, leading to the identification of a different, smaller x_min at 50.
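The search described in this caption can be sketched as a scan over candidate lower bounds. For each candidate (written `xmin` below), the tail is fitted by maximum likelihood and the Kolmogorov-Smirnov distance between the fitted and empirical distributions is recorded; the candidate with the smallest distance is selected. This is a minimal sketch assuming continuous data and the standard closed-form estimator, not the package's implementation. The helper names, the `min_tail` cutoff, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law by inverse transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(tail, xmin):
    """Continuous maximum-likelihood estimate of the exponent for x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(sorted_tail, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    n = len(sorted_tail)
    worst = 0.0
    for i, x in enumerate(sorted_tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return worst


def scan_xmin(data, min_tail=50):
    """Return (xmin, alpha, D) for every candidate lower bound in the data."""
    ordered = sorted(data)
    results = []
    for start in range(len(ordered) - min_tail + 1):
        # Skip repeated values so each tail contains all x >= xmin.
        if start > 0 and ordered[start] == ordered[start - 1]:
            continue
        tail = ordered[start:]
        xmin = tail[0]
        if tail[-1] == xmin:
            continue  # all values identical: the exponent is undefined
        alpha = fit_alpha(tail, xmin)
        results.append((xmin, alpha, ks_distance(tail, xmin, alpha)))
    return results


# Synthetic power-law data (hypothetical parameters, fixed seed).
rng = random.Random(2)
data = sample_power_law(1000, alpha=2.5, xmin=1.0, rng=rng)
results = scan_xmin(data)

# The selected lower bound is the one with the smallest distance D.
best_xmin, best_alpha, best_D = min(results, key=lambda r: r[2])
print("best xmin:", best_xmin, "alpha:", best_alpha, "D:", best_D)
```

Restrictions like those mentioned in the caption amount to filtering `results` before taking the minimum, for example keeping only candidates inside an allowed range, which can move the selected lower bound to a different local minimum.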
