. 2020 Mar 1;69(2):280-293.

doi: 10.1093/sysbio/syz047.

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Chris Whidden¹, Brian C Claywell¹, Thayer Fisher², Andrew F Magee³, Mathieu Fourment⁴, Frederick A Matsen¹

Affiliations

¹ Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
² Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
³ Department of Biology, University of Washington, Seattle, WA 98195, USA.
⁴ ithree institute, University of Technology Sydney, Sydney, Australia.

PMID: 31504997
PMCID: PMC7682726
DOI: 10.1093/sysbio/syz047

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Chris Whidden et al. Syst Biol. 2020.

. 2020 Mar 1;69(2):280-293.

doi: 10.1093/sysbio/syz047.

Authors

Chris Whidden¹, Brian C Claywell¹, Thayer Fisher², Andrew F Magee³, Mathieu Fourment⁴, Frederick A Matsen¹

Affiliations

¹ Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
² Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
³ Department of Biology, University of Washington, Seattle, WA 98195, USA.
⁴ ithree institute, University of Technology Sydney, Sydney, Australia.

PMID: 31504997
PMCID: PMC7682726
DOI: 10.1093/sysbio/syz047

Abstract

Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.

Keywords: Bayesian phylogenetics; consensus trees; phylogenetic islands; phylogenetic tree topology; systematic search.

PubMed Disclaimer

Figures

**Figure 1.**
A high level overview of PT. Starting with a set of high likelihood topologies (filled black circles), PT tests their NNI neighbors. Given a negative threshold , the search explores all topologies above the threshold (hollow solid circles) while topologies below the threshold are rejected (hollow dashed circles).

formula image — **Figure 1.**
A high level overview of PT. Starting with a set of high likelihood topologies (filled black circles), PT tests their NNI neighbors. Given a negative threshold , the search explores all topologies above the threshold (hollow solid circles) while topologies below the threshold are rejected (hollow dashed circles).

**Figure 2.**
a) The cumulative posterior probability of HLSs explored by PT at different log-likelihood thresholds. b) The covered PP (from MrBayes) with increasing time on a subset of data sets.

**Figure 3.**
a) The mean running time improvement from multithreading on DS1 with 200 starting points and a threshold of 10 log-likelihood units. b) The running time improvement from optimizing branch lengths on DS1 with a limited radius of 0 (solid), 1 (dashed), 2 (dotted), and full branch optimization (dotdashed).

**Figure 4.**
SPR graphs built from MrBayes credible sets of DS1, DS4, and DS6 and colored according to PT exploration. Topologies found by PT with successively more negative log-likelihood thresholds are shown on a dark to light blue scale. Scales vary per data set with respect to the most negative computed threshold. Yellow means not found. See Materials and Methods for more explanation of these graphs.

**Figure 5.**
SPR graphs of the credible sets of the remaining data sets with 200 starting points. Topologies found by PT with successively more negative log-likelihood threshold are shown on a dark to light blue scale. Yellow means not found. See Materials and Methods for more explanation of these graphs.

**Figure 6.**
The nonsingleton covered PP (from MrBayes) with increasing time. Default (solid), MAP (dotted), and model testing (dashed) runs were evaluated with 200 starting points.

**Figure 7.**
Individual topology LL versus MrBayes PP for credible set topologies using 200 starting points. Runs are taken at the most negative log-likelihood threshold completed for standard, MAP, and model testing runs.

**Figure 8.**
Comparison of RMSD of split frequencies from aggregated golden runs over time for MrBayes (dashed), PT (solid), and PT assuming MrBayes posterior probabilities (dotted). PT results used 200 starting points. a) DS1, b) DS4, and c) DS6.

**Figure 9.**
Comparison of split frequencies from aggregated golden runs over time for MrBayes and PT. a) DS1, b) DS4, and c) DS6.

**Figure 10.**
A tanglegram showing differences between MrBayes (left) and PT (right) Consensus topologies for DS1.

See this image and copyright information in PMC

Cited by

19 Dubious Ways to Compute the Marginal Likelihood of a Phylogenetic Tree Topology.
Fourment M, Magee AF, Whidden C, Bilge A, Matsen FA, Minin VN. Fourment M, et al. Syst Biol. 2020 Mar 1;69(2):209-220. doi: 10.1093/sysbio/syz046. Syst Biol. 2020. PMID: 31504998 Free PMC article.
Fidelity of hyperbolic space for Bayesian phylogenetic inference.
Macaulay M, Darling A, Fourment M. Macaulay M, et al. PLoS Comput Biol. 2023 Apr 26;19(4):e1011084. doi: 10.1371/journal.pcbi.1011084. eCollection 2023 Apr. PLoS Comput Biol. 2023. PMID: 37099595 Free PMC article.
How Trustworthy Is Your Tree? Bayesian Phylogenetic Effective Sample Size Through the Lens of Monte Carlo Error.
Magee A, Karcher M, Matsen FA 4th, Minin VM. Magee A, et al. Bayesian Anal. 2024 Jun;19(2):565-593. doi: 10.1214/22-ba1339. Epub 2024 Apr 9. Bayesian Anal. 2024. PMID: 38665694 Free PMC article.
Testing Phylogenetic Stability with Variable Taxon Sampling.
Powell CLE, Battistuzzi FU. Powell CLE, et al. Methods Mol Biol. 2022;2569:167-188. doi: 10.1007/978-1-0716-2691-7_8. Methods Mol Biol. 2022. PMID: 36083448
Finding high posterior density phylogenies by systematically extending a directed acyclic graph.
Jennings-Shaffer C, Rich DH, Macaulay M, Karcher MD, Ganapathy T, Kiami S, Kooperberg A, Zhang C, Suchard MA, Matsen FA 4th. Jennings-Shaffer C, et al. Algorithms Mol Biol. 2025 Feb 28;20(1):2. doi: 10.1186/s13015-025-00273-x. Algorithms Mol Biol. 2025. PMID: 40022201 Free PMC article.

See all "Cited by" articles

References

1. Blei D.M., Kucukelbir A., McAuliffe J.D.. 2017. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112:859–877.
1. Dinh V., Bilge A., Zhang C., Matsen F.A. IV.. 2017. Probabilistic path Hamiltonian Monte Carlo. International Conference on Machine Learning. 1009–1018.
1. Dobra A., Massam H.. 2010. The mode oriented stochastic search (MOSS) algorithm for log-linear models with conjugate priors. Stat. Methodol. 7:240–253.
1. Flouri T., Darriba D., Kozlov A., Holder M.T., Matsen F.A. IV, Barbera P.. 2018. libpll. Available from: https://github.com/xflouris/libpll.
1. Fourment M., Magee A.F., Whidden C., Bilge A., Matsen F.A. IV, Minin V.N.. 2018. 19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology. Syst. Biol. - PMC - PubMed

Publication types

Actions
Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Affiliations

Systematic Exploration of the High Likelihood Set of Phylogenetic Tree Topologies

Authors

Affiliations

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Miscellaneous