Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Mar 1;69(2):280-293.
doi: 10.1093/sysbio/syz047.

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Affiliations

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Chris Whidden et al. Syst Biol. .

Abstract

Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.

Keywords: Bayesian phylogenetics; consensus trees; phylogenetic islands; phylogenetic tree topology; systematic search.

PubMed Disclaimer

Figures

Figure 1.
Figure 1.
A high level overview of PT. Starting with a set of high likelihood topologies (filled black circles), PT tests their NNI neighbors. Given a negative threshold formula image, the search explores all topologies above the threshold formula image (hollow solid circles) while topologies below the threshold are rejected (hollow dashed circles).
Figure 2.
Figure 2.
a) The cumulative posterior probability of HLSs explored by PT at different log-likelihood thresholds. b) The covered PP (from MrBayes) with increasing time on a subset of data sets.
Figure 3.
Figure 3.
a) The mean running time improvement from multithreading on DS1 with 200 starting points and a threshold of 10 log-likelihood units. b) The running time improvement from optimizing branch lengths on DS1 with a limited radius of 0 (solid), 1 (dashed), 2 (dotted), and full branch optimization (dotdashed).
Figure 4.
Figure 4.
SPR graphs built from MrBayes credible sets of DS1, DS4, and DS6 and colored according to PT exploration. Topologies found by PT with successively more negative log-likelihood thresholds are shown on a dark to light blue scale. Scales vary per data set with respect to the most negative computed threshold. Yellow means not found. See Materials and Methods for more explanation of these graphs.
Figure 5.
Figure 5.
SPR graphs of the credible sets of the remaining data sets with 200 starting points. Topologies found by PT with successively more negative log-likelihood threshold are shown on a dark to light blue scale. Yellow means not found. See Materials and Methods for more explanation of these graphs.
Figure 6.
Figure 6.
The nonsingleton covered PP (from MrBayes) with increasing time. Default (solid), MAP (dotted), and model testing (dashed) runs were evaluated with 200 starting points.
Figure 7.
Figure 7.
Individual topology LL versus MrBayes PP for credible set topologies using 200 starting points. Runs are taken at the most negative log-likelihood threshold completed for standard, MAP, and model testing runs.
Figure 8.
Figure 8.
Comparison of RMSD of split frequencies from aggregated golden runs over time for MrBayes (dashed), PT (solid), and PT assuming MrBayes posterior probabilities (dotted). PT results used 200 starting points. a) DS1, b) DS4, and c) DS6.
Figure 9.
Figure 9.
Comparison of split frequencies from aggregated golden runs over time for MrBayes and PT. a) DS1, b) DS4, and c) DS6.
Figure 10.
Figure 10.
A tanglegram showing differences between MrBayes (left) and PT (right) Consensus topologies for DS1.

Similar articles

Cited by

References

    1. Blei D.M., Kucukelbir A., McAuliffe J.D.. 2017. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112:859–877.
    1. Dinh V., Bilge A., Zhang C., Matsen F.A. IV.. 2017. Probabilistic path Hamiltonian Monte Carlo. International Conference on Machine Learning. 1009–1018.
    1. Dobra A., Massam H.. 2010. The mode oriented stochastic search (MOSS) algorithm for log-linear models with conjugate priors. Stat. Methodol. 7:240–253.
    1. Flouri T., Darriba D., Kozlov A., Holder M.T., Matsen F.A. IV, Barbera P.. 2018. libpll. Available from: https://github.com/xflouris/libpll.
    1. Fourment M., Magee A.F., Whidden C., Bilge A., Matsen F.A. IV, Minin V.N.. 2018. 19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology. Syst. Biol. - PMC - PubMed

Publication types