PLoS Comput Biol. 2023 Apr 26;19(4):e1011084.
doi: 10.1371/journal.pcbi.1011084. eCollection 2023 Apr.

Fidelity of hyperbolic space for Bayesian phylogenetic inference

Matthew Macaulay et al. PLoS Comput Biol. 2023.

Abstract

Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. However, Bayesian phylogenetics faces the challenging computational problem of moving throughout the high-dimensional space of trees. Fortunately, hyperbolic space offers a low-dimensional representation of tree-like data. In this paper, we embed genomic sequences as points in hyperbolic space and perform hyperbolic Markov chain Monte Carlo for Bayesian inference in this space. The posterior probability of an embedding is computed by decoding a neighbour-joining tree from the embedding locations of the sequences. We empirically demonstrate the fidelity of this method on eight data sets. The sampled posterior distribution recovers the splits and branch lengths to a high degree over a range of curvatures and dimensions. We systematically investigate the effects of the embedding space's curvature and dimension on the Markov chain's performance in these data sets, demonstrating the suitability of hyperbolic space for phylogenetic inference.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Hyperbolic embedding of eight taxa.
Eight points (taxa) lie on a two-dimensional hyperboloid sheet H² in R³. Arrows indicate how each two-dimensional point (x₁, x₂) is deterministically projected up onto the sheet by ϕ. The green dashed line illustrates that the distance between two taxa is the length of the geodesic on the hyperboloid surface.
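The projection and geodesic distance described in this caption can be illustrated with a minimal sketch. It assumes the standard hyperboloid (Lorentz) model, in which a sheet of curvature κ < 0 satisfies ⟨u, u⟩ = 1/κ under the Lorentz inner product; the function names `lift_to_hyperboloid`, `lorentz_inner` and `hyperbolic_distance` are hypothetical and are not taken from Dodonaphy.

```python
import math


def lift_to_hyperboloid(x, curvature=-1.0):
    """Project a point x in R^d up onto the hyperboloid sheet in R^(d+1).

    Assumes the standard hyperboloid model: the first coordinate is chosen so
    that the lifted point u satisfies <u, u>_L = 1 / curvature.
    """
    if curvature >= 0:
        raise ValueError("curvature must be negative for hyperbolic space")
    radius_sq = -1.0 / curvature
    x0 = math.sqrt(sum(c * c for c in x) + radius_sq)
    return (x0, *x)


def lorentz_inner(u, v):
    """Lorentz inner product: minus the first term, plus the rest."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))


def hyperbolic_distance(x, y, curvature=-1.0):
    """Geodesic distance between the lifted images of x and y."""
    u = lift_to_hyperboloid(x, curvature)
    v = lift_to_hyperboloid(y, curvature)
    radius_sq = -1.0 / curvature
    # Clamp to 1 to guard against rounding just below the acosh domain.
    arg = max(1.0, -lorentz_inner(u, v) / radius_sq)
    return math.sqrt(radius_sq) * math.acosh(arg)


# Two taxa given by their two-dimensional coordinates (x1, x2).
d = hyperbolic_distance((0.0, 0.0), (math.sinh(1.0), 0.0))
```

In this sketch, pairwise distances computed this way would form the distance matrix from which a tree is decoded; the decoding step itself is not shown.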
Fig 2. Comparison between MrBayes and Dodonaphy’s MCMC starting from the evolutionary distances.
Comparison of (a) posterior probability trace, (b) split frequencies with 95% confidence intervals (CI; red dots mark MrBayes values not covered by the CI), (c) mean branch lengths (leaf edges in blue circles, internal edges in red diamonds), (d) total tree length estimates from 10 repeats. Markers in (c) are shaded by the frequency of appearance in the golden run. In (a, d) black lines show the two MrBayes runs and thin blue lines show Dodonaphy’s results.
Fig 3. Effect of embedding curvature on the posterior distribution.
Comparison to the true posterior of: a) average standard deviation of split frequencies (ASDSF), b) relative difference in median tree length, c) relative difference in the variance of tree length. The truncated variance ratio for DS5 is approximately zero. The right side corresponds to flatter curvature, approaching the Euclidean case (κ = 0 gives log10(−κ) = −∞), and the left side is more curved.
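As a rough illustration of the metrics named in this caption, the sketch below computes an ASDSF and a relative difference. It makes several assumptions that are not stated in the source: split frequencies are supplied as dictionaries keyed by a split label, the population standard deviation is used, and splits are kept only if they reach a frequency of 0.1 in at least one run. The function names and the `min_freq` parameter are hypothetical.

```python
from statistics import median, pstdev


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across runs.

    `runs` is a list of dicts mapping a split label to its sampled frequency.
    A split missing from a run counts as frequency 0.0. The cutoff and the
    use of the population standard deviation are assumptions of this sketch.
    """
    splits = set().union(*(run.keys() for run in runs))
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            deviations.append(pstdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


def relative_difference(estimate, reference):
    """Signed difference of an estimate from a reference, scaled by the reference."""
    return (estimate - reference) / reference


# Toy split frequencies from a reference run and a comparison run.
reference_run = {"AB|CD": 1.0, "AC|BD": 0.2}
comparison_run = {"AB|CD": 1.0}
score = asdsf([reference_run, comparison_run])

# Relative difference in median tree length between two sets of samples.
rel = relative_difference(median([1.1, 1.2, 1.3]), median([1.0, 1.0, 1.0]))
```

The same `relative_difference` helper could be applied to the variance of tree length by swapping `median` for `statistics.variance`.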
Fig 4. Effect of embedding dimension on the posterior distribution.
Comparison to the true posterior of: a, d) ASDSF, b, e) relative difference in median tree length, c, f) relative difference in the variance of tree length. Top row: curvature κ = −1; bottom row: κ = −100. The truncated variance ratio in c) is 16.41 for DS3 with d = 2 and 39.05 for DS7.
Fig 5. Heat map of the decoded tree’s properties.
Obtained by a grid search in H², moving one node (green diamond) through embedding space. a, d) joint probability p(T, Y), b, e) symmetric difference from best tree topology and c, f) total length. Top row: curvature κ = −1; bottom row: κ = −1000. Black dots are the fixed locations of the remaining nodes.
Fig 6. SPR distances of samples about an embedded tree.
SPR distance of trees sampled from a multivariate Gaussian (covariance Σ × I) about a tree containing 100 leaves embedded in H³ with curvature κ = −1. Distributions are estimated from 2000 samples.
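The sampling step in this caption can be sketched as follows. This is only an illustration under stated assumptions: the covariance is taken to be isotropic with a scalar standard deviation `sigma`, the perturbation is applied to each leaf's coordinates before projection onto the hyperboloid, and the function name `perturb_embedding` is hypothetical. Decoding each perturbed embedding into a tree and computing the SPR distance are not shown.

```python
import random


def perturb_embedding(locations, sigma, rng=None):
    """Draw one sample from an isotropic Gaussian centred on the embedding.

    `locations` is a list of coordinate tuples, one per leaf. Each coordinate
    receives independent Gaussian noise with standard deviation `sigma`
    (an assumption of this sketch about the covariance structure).
    """
    rng = rng or random.Random()
    return [
        tuple(coord + rng.gauss(0.0, sigma) for coord in point)
        for point in locations
    ]


# Toy embedding of four leaves in three coordinates (values are made up).
embedding = [(0.1, 0.2, 0.3), (0.4, -0.1, 0.0), (-0.3, 0.2, 0.5), (0.0, 0.0, -0.2)]
rng = random.Random(0)
samples = [perturb_embedding(embedding, sigma=0.05, rng=rng) for _ in range(2000)]
```

Passing an explicit `random.Random` instance keeps the draws reproducible across repeats.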
