Proc Natl Acad Sci U S A. 2018 Feb 20;115(8):1760-1765. doi: 10.1073/pnas.1715306115. Epub 2018 Feb 6.

Maximizing the information learned from finite data selects a simple model


Henry H Mattingly et al. Proc Natl Acad Sci U S A. 2018.

Abstract

We use the language of uninformative Bayesian prior choice to study the selection of appropriately simple effective models. We advocate for the prior which maximizes the mutual information between parameters and predictions, learning as much as possible from limited data. When many parameters are poorly constrained by the available data, we find that this prior puts weight only on boundaries of the parameter space. Thus, it selects a lower-dimensional effective theory in a principled way, ignoring irrelevant parameter directions. In the limit where there are sufficient data to tightly constrain any number of parameters, this reduces to the Jeffreys prior. However, we argue that this limit is pathological when applied to the hyperribbon parameter manifolds generic in science, because it leads to dramatic dependence on effects invisible to experiment.
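For concreteness (this example is not from the paper): for the Bernoulli model, the Jeffreys prior is the arcsine distribution Beta(1/2, 1/2), which can be sampled by inverse CDF. A minimal sketch showing how its weight piles up near the boundaries of parameter space, the same boundaries where the finite-data optimal prior places its atoms:

```python
import numpy as np

# Jeffreys prior for the Bernoulli model is Beta(1/2, 1/2):
#   p_J(theta) = 1 / (pi * sqrt(theta * (1 - theta))).
# Its CDF is F(theta) = (2/pi) * arcsin(sqrt(theta)), so inverse-CDF sampling gives
#   theta = sin^2(pi * u / 2) for u ~ Uniform(0, 1).
rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
theta = np.sin(np.pi * u / 2) ** 2

mean = theta.mean()                                  # symmetric law: mean is 1/2
edge_mass = np.mean((theta < 0.1) | (theta > 0.9))   # substantial weight near 0 and 1
```

The sample size and cutoffs (0.1, 0.9) are illustrative choices; roughly 40% of the Jeffreys mass sits within 0.1 of the boundaries.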

Keywords: Bayesian prior choice; effective theory; information theory; model selection; renormalization group.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Optimal priors for the Bernoulli model (Eq. 1). Red lines indicate the positions of delta functions in p(θ), which are at the maxima of fKL(θ), Eq. 3. As m → ∞, these coalesce into the Jeffreys prior pJ(θ).
Fig. 2.
Convergence of the BA algorithm, for the one-parameter Gaussian model (Eq. 2) with L = 10 (comparable to m = 10 in Fig. 1). Right shows θ discretized into 10 times as many points; pτ(θ) nevertheless converges to the same five delta functions.
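The Blahut–Arimoto (BA) iteration referenced in the caption can be sketched for the Bernoulli model of Fig. 1. This is a minimal illustration, not the authors' code; the grid resolution and iteration count are arbitrary choices:

```python
import numpy as np
from math import comb

# Bernoulli model: x successes in m = 10 trials, channel P(x|theta) = Binomial(m, theta).
m = 10
thetas = np.linspace(0.0, 1.0, 201)          # discretized parameter grid
xs = np.arange(m + 1)
lik = np.array([[comb(m, x) * t**x * (1 - t)**(m - x) for x in xs] for t in thetas])

def kl_rows(lik, q):
    """Row-wise D_KL(P(.|theta) || q), treating 0 * log 0 as 0."""
    safe = np.where(lik > 0, lik, 1.0)
    return (lik * np.where(lik > 0, np.log(safe / q), 0.0)).sum(axis=1)

p = np.full(len(thetas), 1.0 / len(thetas))  # start from a uniform prior
for _ in range(3000):
    q = p @ lik                              # marginal over data, q(x)
    p = p * np.exp(kl_rows(lik, q))          # BA multiplicative update
    p /= p.sum()

mi = float((p * kl_rows(lik, p @ lik)).sum())  # mutual information (nats) at convergence
atoms = thetas[p > 1e-3]                       # surviving support points (delta functions)
```

At convergence the weight concentrates on a few grid points, including the boundaries θ = 0 and θ = 1, mirroring the discrete priors shown in Figs. 1 and 2.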
Fig. 3.
Behavior of p(θ) with increasing Fisher length. A and B show the atoms of p(θ) for the two one-dimensional models as L is increased (i.e., we perform more repetitions m or have smaller noise σ). C shows the scaling of the MI (in bits) with the number of atoms K. The dashed line is the bound MI ≤ log K, and the solid line is the scaling law MI ∼ (3/4) log K.
Fig. 4.
Parameters and priors for the exponential model (Eq. 5). A shows the area of the y plane covered by all decay constants k1, k2 ≥ 0. B shows the positions of the delta functions of the optimal prior p(y) for several values of σ, with colors indicating the dimensionality r at each point. C shows the proportion of weight on these dimensionalities.
Fig. 5.
Distributions of expected data p(x) from different priors. A is the one-parameter Gaussian model, with L = 10. B projects the two-parameter exponential model onto the y1 + y2 direction, for σ = 1/7, where the perpendicular direction should be irrelevant. The length of the relevant direction is about the same as the one-parameter case: L+ = 7√2. Note that the distribution of expected data p(x+) from the Jeffreys prior here is quite different, with almost no weight at the ends of the range (0 and 2), because this prior still weights the area and not the length.

