Mol Biol Evol. 2021 Apr 13;38(4):1627-1640.
doi: 10.1093/molbev/msaa295.

Properties of Markov Chain Monte Carlo Performance across Many Empirical Alignments

Sean M Harrington et al. Mol Biol Evol. 2021.

Abstract

Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither is a panacea.

Keywords: I + G; I + Γ; MC3; MCMC; MrBayes; mixing; parallel tempering; substitution model averaging.
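
As an illustration of the average standard deviation of split frequencies (ASDSF) mentioned in the abstract, the following is a minimal sketch and not the authors' or MrBayes' implementation. It assumes each sampled tree has already been reduced to a set of splits (here, frozensets of taxon names), that splits are kept only if they reach a minimum frequency in at least one run, and that the sample standard deviation is used. The function names, the `min_freq` cutoff of 0.1, and the input representation are all illustrative assumptions.

```python
from collections import Counter
from statistics import stdev


def split_frequencies(trees):
    """Return {split: fraction of sampled trees containing it} for one run.

    `trees` is a list of sets; each set holds the splits (frozensets of
    taxon names) present in one sampled tree. This layout is an assumption.
    """
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    return {split: count / n for split, count in counts.items()}


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    `runs` is a list of at least two runs, each a list of trees as above.
    Splits below `min_freq` in every run are ignored (assumed cutoff).
    """
    if len(runs) < 2:
        raise ValueError("ASDSF needs at least two independent runs")
    freqs = [split_frequencies(trees) for trees in runs]
    all_splits = set().union(*freqs)
    kept = [
        s for s in all_splits
        if max(f.get(s, 0.0) for f in freqs) >= min_freq
    ]
    if not kept:
        return 0.0
    # Sample standard deviation of each split's frequency across runs,
    # then the plain average over the retained splits.
    deviations = [stdev([f.get(s, 0.0) for f in freqs]) for s in kept]
    return sum(deviations) / len(deviations)
```

Runs that sample the same topologies at the same frequencies give a value of zero, and the value grows as the runs disagree about which splits are supported, which is why the diagnostic is sensitive to topological non-convergence.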

Figures

Fig. 1. Density plot showing the frequency of ESS values for each parameter. Median values are denoted by solid vertical lines. The left tail extends to near zero but has been truncated for space.
Fig. 2. Density plots showing the frequency of acceptance rates for each move. Median values are denoted by solid vertical lines.
Fig. 3. Venn diagram showing the number of chains that fail either one or both convergence diagnostics indicated under each circle. Part (A) shows chains that fail topological ESS, LnL ESS, or both. Part (B) shows chains that fail ASDSF, topological ESS, or both. Part (C) shows chains that fail ASDSF, PRSF, or both. Part (D) shows chains that fail ASDSF, any non-topological ESS, or both.
Fig. 4. PCA of core convergence diagnostics showing loadings of variables. Loadings have been rescaled to be visible on the same scale as the PCs.
Fig. 5. PCA of acceptance rates showing loadings of variables. Loadings have been rescaled to be visible on the same scale as the PCs.
Fig. 6. Correlation matrix showing the mean of Pearson correlation coefficients for values of pairs of parameters across individual chains.

