Meta-Analysis
Syst Biol. 2024 May 27;73(1):235-246.
doi: 10.1093/sysbio/syad075.

The Limits of the Constant-rate Birth-Death Prior for Phylogenetic Tree Topology Inference

Mark P Khurana et al. Syst Biol. 2024.

Abstract

Birth-death models are stochastic processes describing speciation and extinction through time and across taxa and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically compared with empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used Bayesian methods and crBD priors with those that used other non-crBD priors and non-Bayesian methods (i.e., maximum likelihood methods), we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. 
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
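To make the idea of "imbalance that is improbable under the crBD model" concrete, here is a minimal sketch in Python. It rests on two assumptions that are not taken from the paper: first, that with complete sampling the crBD model induces the same distribution on tree shapes as the pure-birth (Yule) model, so the root split of an n-tip tree can be drawn uniformly from 1 to n-1; second, that the normalized Colless index divides by its caterpillar-tree maximum, (n-1)(n-2)/2. The function names and the observed value are hypothetical, and this is not the authors' pipeline.

```python
import random


def sample_colless_yule(n_tips, rng):
    """Sample the Colless index of one random Yule-shaped tree with n_tips tips.

    Assumption: under the Yule model, the number of tips on the left of any
    split of an n-tip clade is uniform on 1..n-1.
    """
    total = 0
    stack = [n_tips]
    while stack:
        n = stack.pop()
        if n < 2:
            continue  # a single tip has no split
        left = rng.randint(1, n - 1)
        right = n - left
        total += abs(left - right)  # contribution of this internal node
        stack.extend((left, right))
    return total


def normalized_colless(colless, n_tips):
    """Scale the Colless index by its maximum (caterpillar tree); needs n_tips >= 3."""
    return colless / ((n_tips - 1) * (n_tips - 2) / 2)


def tail_probability(observed, n_tips, n_sims=1000, seed=0):
    """Fraction of simulated trees at least as imbalanced as the observed value."""
    rng = random.Random(seed)
    hits = sum(
        normalized_colless(sample_colless_yule(n_tips, rng), n_tips) >= observed
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical empirical tree: 50 tips, normalized Colless index of 0.6.
    print(tail_probability(observed=0.6, n_tips=50))
```

A small tail probability here would indicate that the observed imbalance is rarely produced by the model, which is the kind of comparison the abstract describes.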

Keywords: Birth–death model; phylogenetic timescale inference; tree imbalance; tree shape.


Figures

Figure 1.
Distribution of crBD model parameter values inferred from empirical TimeTree trees (n = 1189 trees) with ρ = 1. (a) Birth parameters, log(λ). (b) Death parameters, log(μ). 778 (65.4%) of μ values were zero. Green dotted line represents median value, excluding μ values = 0.
Figure 2.
(a) Differences between the realized Wilcoxon (Mann–Whitney) U statistic value and the expected U value, including corresponding P-values, for each index (n = 1189 trees). Dotted line represents a Bonferroni-corrected P-value. (b) Z-score distributions for median I, the normalized Colless index, leaf depth variance, B2, and stairs2. Green dotted line represents mean value, gray dotted line denotes zero.
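As a rough illustration of the quantity in panel (a), the sketch below computes a realized Mann-Whitney U statistic, its expected value under the null hypothesis (n1·n2/2), and a normal-approximation Z-score. It assumes that one sample holds index values from empirical trees and the other holds values from crBD-simulated trees; the caption does not spell out the exact setup, so treat the function names and the pairing as hypothetical. The Z-score omits a tie correction for brevity.

```python
import math


def mann_whitney_u(x, y):
    """Realized U for sample x: count pairs with x_i > y_j, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def expected_u(n_x, n_y):
    """Expected U under the null hypothesis of identical distributions."""
    return n_x * n_y / 2


def u_z_score(x, y):
    """Normal-approximation Z-score of U (no tie correction)."""
    n_x, n_y = len(x), len(y)
    sigma = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    return (mann_whitney_u(x, y) - expected_u(n_x, n_y)) / sigma
```

A positive difference between realized and expected U means the first sample tends to hold larger index values than the second.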
Figure 3.
(a) Linear regression coefficients and corresponding 95% confidence intervals for the relationship between inference method (including tree prior) and various topology indices (Imbalance indices: median I, normalized Colless index, leaf depth variance; Balance indices: B2 index, stairs2), controlling for the number of tree tips (not included in the figure). Non-Bayesian studies were used as the reference group. There was no evidence that tree topologies were significantly influenced by the inference method for the 299 included empirical studies. (b) Scatter and density plots for the values of the two most discriminatory indices, B2 and leaf depth variance, colored by inference method.
Figure 4.
Differences from simulations to imbalanced empirical trees, with simulations of molecular alignments made under 3 substitution rates and with tree inferences from IQ-TREE 2 and RevBayes. Whiskers denote error bars. Imbalance indices: (a) median I, (b) normalized Colless index, (c) leaf depth variance; Balance indices: (d) B2 index, (e) stairs2; Distance index: (f) Topological accuracy (1 − normalized Robinson-Foulds distance).
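The topological accuracy in panel (f) can be sketched as follows. This is a simplified version under stated assumptions: trees are rooted, binary, and written as nested Python tuples of tip labels, and the Robinson-Foulds distance is computed on clades and normalized by 2(n − 2). Unrooted trees, such as those typically returned by maximum likelihood software, would instead use bipartitions and a maximum of 2(n − 3). The function names are hypothetical and this is not the authors' code.

```python
def clades(tree):
    """Return (set of nontrivial clades, set of all tips) for a nested-tuple tree."""
    found = set()

    def tips(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a single tip label
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)  # every internal node defines one clade
        return members

    root = tips(tree)
    found.discard(root)  # the root clade is shared by all trees on these tips
    return found, root


def topological_accuracy(tree_a, tree_b):
    """1 minus the normalized rooted Robinson-Foulds distance between two trees."""
    clades_a, tips_a = clades(tree_a)
    clades_b, tips_b = clades(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must share the same tip labels")
    max_rf = 2 * (len(tips_a) - 2)  # maximum for rooted binary trees
    if max_rf <= 0:
        return 1.0  # with fewer than three tips only one topology exists
    rf = len(clades_a ^ clades_b)  # clades present in exactly one tree
    return 1 - rf / max_rf
```

For example, `topological_accuracy((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` compares two trees with no nontrivial clade in common, so the accuracy is at its minimum.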
Figure 5.
Density plots for each index (Imbalance indices: normalized Colless index, median I, leaf depth variance; Balance indices: B2 index, stairs2) with rows showing the simulation substitution rates (rate = 0.5, 0.05, 0.005). The prior group values are derived from simulated trees, where the best crBD parameters were inferred from the 100 imbalanced empirical trees. One thousand crBD model trees were then simulated for each set of crBD parameters (where ρ = 1), whereafter index values were calculated. The prior distribution therefore represents a best-case prior distribution for the given set of 100 imbalanced empirical trees and is the same distribution for each rate row.
