Proc Natl Acad Sci U S A. 2018 Feb 20;115(8):1760-1765. doi: 10.1073/pnas.1715306115. Epub 2018 Feb 6.

Maximizing the information learned from finite data selects a simple model


Henry H Mattingly et al. Proc Natl Acad Sci U S A. 2018.

Abstract

We use the language of uninformative Bayesian prior choice to study the selection of appropriately simple effective models. We advocate for the prior which maximizes the mutual information between parameters and predictions, learning as much as possible from limited data. When many parameters are poorly constrained by the available data, we find that this prior puts weight only on boundaries of the parameter space. Thus, it selects a lower-dimensional effective theory in a principled way, ignoring irrelevant parameter directions. In the limit where there are sufficient data to tightly constrain any number of parameters, this reduces to the Jeffreys prior. However, we argue that this limit is pathological when applied to the hyperribbon parameter manifolds generic in science, because it leads to dramatic dependence on effects invisible to experiment.
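For concreteness (this example is not from the paper): for the Bernoulli model, the Jeffreys prior is the arcsine distribution Beta(1/2, 1/2), which can be sampled by inverse CDF. A minimal sketch showing how its weight piles up near the boundaries of parameter space, the same boundaries where the finite-data optimal prior places its atoms:

```python
import numpy as np

# Jeffreys prior for the Bernoulli model is Beta(1/2, 1/2):
#   p_J(theta) = 1 / (pi * sqrt(theta * (1 - theta))).
# Its CDF is F(theta) = (2/pi) * arcsin(sqrt(theta)), so inverse-CDF sampling gives
#   theta = sin^2(pi * u / 2) for u ~ Uniform(0, 1).
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
theta = np.sin(np.pi * u / 2) ** 2

mean = theta.mean()                                  # symmetric law: mean is 1/2
edge_mass = np.mean((theta < 0.1) | (theta > 0.9))   # substantial weight near 0 and 1
```

The sample size and cutoffs (0.1, 0.9) are illustrative choices; roughly 40% of the Jeffreys mass sits within 0.1 of the boundaries.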

Keywords: Bayesian prior choice; effective theory; information theory; model selection; renormalization group.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Optimal priors for the Bernoulli model (Eq. 1). Red lines indicate the positions of delta functions in p(θ), which are at the maxima of fKL(θ), Eq. 3. As m → ∞, these coalesce into the Jeffreys prior pJ(θ).
Fig. 2.
Convergence of the BA algorithm, for the one-parameter Gaussian model (Eq. 2) with L = 10 (comparable to m = 10 in Fig. 1). Right shows θ discretized into 10 times as many points; pτ(θ) nevertheless converges to the same five delta functions.
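The Blahut–Arimoto (BA) iteration referenced in the caption can be sketched for the Bernoulli model of Fig. 1. This is a minimal illustration, not the authors' code; the grid resolution and iteration count are arbitrary choices:

```python
import numpy as np
from math import comb

# Bernoulli model: x successes in m = 10 trials, channel P(x|theta) = Binomial(m, theta).
m = 10
thetas = np.linspace(0.0, 1.0, 201)          # discretized parameter grid
xs = np.arange(m + 1)
lik = np.array([[comb(m, x) * t**x * (1 - t)**(m - x) for x in xs] for t in thetas])

def kl_rows(lik, q):
    """Row-wise D_KL(P(.|theta) || q), treating 0 * log 0 as 0."""
    safe = np.where(lik > 0, lik, 1.0)
    return (lik * np.where(lik > 0, np.log(safe / q), 0.0)).sum(axis=1)

p = np.full(len(thetas), 1.0 / len(thetas))  # start from a uniform prior
for _ in range(3000):
    q = p @ lik                              # marginal over data, q(x)
    p = p * np.exp(kl_rows(lik, q))          # BA multiplicative update
    p /= p.sum()

mi = float((p * kl_rows(lik, p @ lik)).sum())  # mutual information (nats) at convergence
atoms = thetas[p > 1e-3]                       # surviving support points (delta functions)
```

At convergence the weight concentrates on a few grid points, including the boundaries θ = 0 and θ = 1, mirroring the discrete priors shown in Figs. 1 and 2.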
Fig. 3.
Behavior of p(θ) with increasing Fisher length. A and B show the atoms of p(θ) for the two one-dimensional models as L is increased (i.e., we perform more repetitions m or have smaller noise σ). C shows the scaling of the MI (in bits) with the number of atoms K. The dashed line is the bound MI ≤ log K, and the solid line is the scaling law MI ∼ (3/4) log K.
Fig. 4.
Parameters and priors for the exponential model (Eq. 5). A shows the area of the y plane covered by all decay constants k1, k2 ≥ 0. B shows the positions of the delta functions of the optimal prior p(y) for several values of σ, with colors indicating the dimensionality r at each point. C shows the proportion of weight on these dimensionalities.
Fig. 5.
Distributions of expected data p(x) from different priors. A is the one-parameter Gaussian model, with L = 10. B projects the two-parameter exponential model onto the y1 + y2 direction, for σ = 1/7, where the perpendicular direction should be irrelevant. The length of the relevant direction is about the same as the one-parameter case: L+ = 7√2. Note that the distribution of expected data p(x+) from the Jeffreys prior here is quite different, with almost no weight at the ends of the range (0 and 2), because this prior still weights the area and not the length.

