Review
Syst Biol. 2019 Sep 1;68(5):681-697.
doi: 10.1093/sysbio/syz003.

Marginal Likelihoods in Phylogenetics: A Review of Methods and Applications

Jamie R Oaks et al. Syst Biol.

Abstract

By providing a framework of accounting for the shared ancestry inherent to all life, phylogenetics is becoming the statistical foundation of biology. The importance of model choice continues to grow as phylogenetic models continue to increase in complexity to better capture micro- and macroevolutionary processes. In a Bayesian framework, the marginal likelihood is how data update our prior beliefs about models, which gives us an intuitive measure of comparing model fit that is grounded in probability theory. Given the rapid increase in the number and complexity of phylogenetic models, methods for approximating marginal likelihoods are increasingly important. Here, we try to provide an intuitive description of marginal likelihoods and why they are important in Bayesian model testing. We also categorize and review methods for estimating marginal likelihoods of phylogenetic models, highlighting several recent methods that provide well-behaved estimates. Furthermore, we review some empirical studies that demonstrate how marginal likelihoods can be used to learn about models of evolution from biological data. We discuss promising alternatives that can complement marginal likelihoods for Bayesian model choice, including posterior-predictive methods. Using simulations, we find one alternative method based on approximate-Bayesian computation to be biased. We conclude by discussing the challenges of Bayesian model choice and future directions that promise to improve the approximation of marginal likelihoods and Bayesian phylogenetics as a whole.
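As a brief aid to the abstract's description, the two standard identities behind this statement can be written out. The notation here ($D$ for the data, $M_i$ for a candidate model, $\theta$ for its parameters) is an assumption made for illustration and is not taken from the article.

```latex
p(D \mid M_i) = \int p(D \mid \theta, M_i)\, p(\theta \mid M_i)\, d\theta,
\qquad
p(M_i \mid D) = \frac{p(D \mid M_i)\, p(M_i)}{\sum_j p(D \mid M_j)\, p(M_j)}
```

The first expression is the marginal likelihood: the likelihood averaged over the prior on the parameters. The second shows how it converts prior model probabilities into posterior model probabilities.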

Keywords: Marginal likelihood; model choice; phylogenetics.

Figures

Figure 1.
An illustration of the posterior probability densities and marginal likelihoods of the four different prior assumptions we made in our coin-flipping experiment. The data are 50 “heads” out of 100 coin flips, and the parameter, [formula], is the probability of the coin landing heads side up. The binomial likelihood density function is proportional to a [formula] and is the same across the four different beta priors on [formula] ([formula] [formula]). The posterior of each model is a [formula] distribution. The marginal likelihoods ([formula]; the average of the likelihood density curve weighted by the prior) of the four models are compared.
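The calculation this caption describes has a closed form for a beta prior and binomial data, so it can be sketched in a few lines. The following is a minimal illustration only: the data (50 heads in 100 flips) come from the caption, but the four prior settings are hypothetical stand-ins, because the article's actual prior parameters did not survive in this record. The function name `log_marginal_likelihood` is likewise an assumption.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_likelihood(heads, flips, alpha, beta):
    """Log marginal likelihood of binomial data under a beta(alpha, beta) prior.

    Closed form: C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta).
    """
    tails = flips - heads
    log_choose = lgamma(flips + 1) - lgamma(heads + 1) - lgamma(tails + 1)
    return log_choose + log_beta(alpha + heads, beta + tails) - log_beta(alpha, beta)


# Hypothetical priors chosen for illustration; NOT the article's four models.
hypothetical_priors = {
    "uniform beta(1, 1)": (1.0, 1.0),
    "centered beta(5, 5)": (5.0, 5.0),
    "U-shaped beta(0.5, 0.5)": (0.5, 0.5),
    "skewed beta(1, 5)": (1.0, 5.0),
}

for name, (a, b) in hypothetical_priors.items():
    ml = exp(log_marginal_likelihood(50, 100, a, b))
    # The posterior under each prior is beta(a + 50, b + 50).
    print(f"{name}: marginal likelihood = {ml:.6f}, posterior = beta({a + 50}, {b + 50})")
```

Under the uniform prior the marginal likelihood reduces to 1/(n + 1) = 1/101, which gives a quick sanity check. Priors that place more mass near 0.5 yield a larger marginal likelihood for these data, and priors that place mass far from 0.5 yield a smaller one.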
Figure 2.
A comparison of the approximate-likelihood Bayesian computation general linear model (ABC-GLM) estimator of the marginal likelihood (Leuenberger and Wegmann 2010) to quadrature integration approximations (Xie et al. 2011) for 100 simulated data sets. We compared the ratio of the marginal likelihood (Bayes factor) comparing the correct branch-length model [branch length ~ uniform(0.0001, 0.1)] to a model with a broader prior on the branch length [branch length ~ uniform(0.0001, 0.2)]. The solid line represents perfect performance of the ABC-GLM estimator (i.e., matching the “true” value of the Bayes factor). The dashed line represents the expected Bayes factor when failing to penalize for the extra parameter space (branch length 0.1 to 0.2) with essentially zero likelihood. Quadrature integration with 1000 and 10,000 steps using the rectangular and trapezoidal rule produced identical values of log marginal likelihoods to at least five decimal places for all 100 simulated data sets.
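The quadrature comparison in this caption can be sketched for a single pair of sequences. This is an illustration under stated assumptions, not the authors' code: the Jukes-Cantor (JC69) substitution model, the sequence length (10,000 sites), and the number of differing sites (100) are all hypothetical choices, and only the trapezoidal rule is shown. The two uniform priors on the branch length are the ones named in the caption.

```python
from math import exp, log


def jc69_log_likelihood(t, n_sites, n_diffs):
    """Log likelihood of two aligned sequences separated by branch length t.

    Assumes JC69 (an assumption of this sketch, not stated in the source).
    """
    p_diff = 0.75 * (1.0 - exp(-4.0 * t / 3.0))  # probability a site differs
    return (
        n_diffs * log(p_diff / 3.0)
        + (n_sites - n_diffs) * log(1.0 - p_diff)
        + n_sites * log(0.25)
    )


def log_marginal_trapezoid(lower, upper, n_steps, n_sites, n_diffs):
    """Log marginal likelihood under a uniform(lower, upper) branch-length prior.

    Integrates the likelihood with the trapezoidal rule, rescaling by the
    largest log likelihood on the grid to avoid numerical underflow.
    """
    h = (upper - lower) / n_steps
    log_l = [jc69_log_likelihood(lower + i * h, n_sites, n_diffs)
             for i in range(n_steps + 1)]
    m = max(log_l)
    scaled = [exp(v - m) for v in log_l]
    integral = h * (sum(scaled) - 0.5 * (scaled[0] + scaled[-1]))
    # The uniform prior density is 1 / (upper - lower).
    return m + log(integral) - log(upper - lower)


# Hypothetical data: 100 differing sites out of 10,000.
narrow = log_marginal_trapezoid(0.0001, 0.1, 1000, 10000, 100)
wide = log_marginal_trapezoid(0.0001, 0.2, 1000, 10000, 100)
log_bayes_factor = narrow - wide
print(f"log Bayes factor (narrow vs. broad prior): {log_bayes_factor:.5f}")
```

With these hypothetical data the likelihood is essentially zero for branch lengths above 0.1, so the two integrals of the likelihood are nearly equal and the Bayes factor is close to the ratio of the prior widths, 0.1999/0.0999, or roughly 2 in favor of the narrower prior. An estimator that fails to penalize the unused parameter space would instead report a Bayes factor near 1.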
Figure A.1.
A comparison of the true branch length separating each pair of simulated sequences to the branch length estimated by ABC-GLM and full-likelihood MCMC under the correct branch-length model [branch length ~ uniform(0.0001, 0.1)] and the vague model [branch length ~ uniform(0.0001, 0.2)].

References

    1. Akaike H. 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19:716–723.
    2. Arima S., Tardella L. 2012. Improved harmonic mean estimator for phylogenetic model evidence. J. Comput. Biol. 19:418–438.
    3. Arima S., Tardella L. 2014. Inflated density ratio (IDR) method for estimating marginal likelihoods in Bayesian phylogenetics. In: Chen M.-H., Kuo L., Lewis P.O., editors. Bayesian phylogenetics: methods, algorithms, and applications, Chapter 3. Boca Raton (FL): CRC Press; p. 25–57.
    4. Baele G., Lemey P. 2013. Bayesian evolutionary model testing in the phylogenomics era: matching model complexity with computational efficiency. Bioinformatics. 29:1970–1979.
    5. Baele G., Lemey P. 2014. Bayesian model selection in phylogenetics and genealogy-based population genetics. In: Chen M.-H., Kuo L., Lewis P.O., editors. Bayesian phylogenetics: methods, algorithms, and applications, Chapter 4. Boca Raton (FL): CRC Press; p. 59–93.