Sequential Bayesian Phylogenetic Inference
- PMID: 38771253
- DOI: 10.1093/sysbio/syae020
Abstract
The ideal approach to Bayesian phylogenetic inference is to estimate all parameters of interest jointly in a single hierarchical model. However, this is often not feasible in practice because of the high computational cost. Instead, phylogenetic pipelines generally consist of sequential analyses, in which a single point estimate from one analysis is used as input for the next (e.g., a single multiple sequence alignment is used to estimate a gene tree). In this framework, uncertainty is not propagated from step to step, which can lead to inaccurate or spuriously confident results. Here, we formally develop and test a sequential approach to Bayesian phylogenetic inference, which uses importance sampling to generate observations for the next step of an analysis pipeline from the posterior distribution produced in the previous step. This approach not only accounts for uncertainty between analysis steps but also allows greater flexibility in software choice (and hence model availability), and it can be computationally more efficient than the traditional joint inference approach when multiple models are being tested. We show that the sequential inference approach is identical in practice to the joint inference approach only if the data contain sufficient information (a narrow posterior distribution) and/or sufficiently many importance samples are used. Conversely, we show that the common practice of using a single point estimate can be biased, for example when a single phylogeny estimate is used to transform an unrooted phylogeny into a time-calibrated phylogeny. We demonstrate the theory of sequential Bayesian inference using both a toy example and an empirical case study of divergence-time estimation in insects using a relaxed clock model from transcriptome data.
In the empirical example, we estimate 3 posterior distributions of branch lengths from the same data (a DNA character matrix with a GTR+Γ+I substitution model, an amino acid data matrix with empirical substitution models, and an amino acid data matrix with the PhyloBayes CAT-GTR model). Finally, we apply 3 different node-calibration strategies and show that divergence time estimates are affected by the data source and the substitution model used to estimate branch lengths, as well as by the node-calibration strategy. Thus, our new sequential Bayesian phylogenetic inference approach provides the opportunity to efficiently test different approaches to divergence time estimation, including branch-length estimation from other software.
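The importance-sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' toy example or their RevBayes implementation; it is a hypothetical conjugate normal model chosen so that the joint posterior is known in closed form. Step 1 infers a quantity b (standing in for a branch length) from data under a wide step-1 prior. Step 2 infers a hyperparameter theta from b. The sequential estimate reuses the step-1 posterior samples, reweighting each by the step-2 density of b given theta divided by the step-1 prior density of b. All variable names and numeric values below are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_mean_sd(grid, weights):
    """Mean and standard deviation of an unnormalized density on a grid."""
    total = sum(weights)
    mean = sum(g * w for g, w in zip(grid, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


random.seed(1)

# Step 1 (hypothetical values): n observations y_j ~ Normal(b, sigma) with
# sample mean ybar, analysed under a wide step-1 prior b ~ Normal(0, tau1).
sigma, n, ybar = 2.0, 4, 1.5
tau1 = 10.0
prec1 = 1.0 / tau1**2 + n / sigma**2
b_mean = (n * ybar / sigma**2) / prec1
b_sd = math.sqrt(1.0 / prec1)
# Posterior samples of b, as a first analysis would write them to a log file.
b_samples = [random.gauss(b_mean, b_sd) for _ in range(2000)]

# Step 2 (hypothetical values): b | theta ~ Normal(theta, s) and
# theta ~ Normal(m0, s0). The target is the posterior of theta.
s, m0, s0 = 0.5, 0.0, 3.0
grid = [-6.0 + 0.04 * k for k in range(351)]  # theta from -6 to 8

# Sequential inference: approximate the step-2 likelihood of theta by the
# average over samples of p(b_i | theta) / p_step1_prior(b_i).
inv_prior1 = [1.0 / normal_pdf(b, 0.0, tau1) for b in b_samples]
seq_weights = []
for theta in grid:
    lik = sum(normal_pdf(b, theta, s) * w for b, w in zip(b_samples, inv_prior1))
    seq_weights.append(normal_pdf(theta, m0, s0) * lik / len(b_samples))
seq_mean, seq_sd = grid_mean_sd(grid, seq_weights)

# Joint inference (closed form): marginalizing b gives
# ybar | theta ~ Normal(theta, sqrt(s^2 + sigma^2 / n)).
marg_var = s**2 + sigma**2 / n
joint_prec = 1.0 / s0**2 + 1.0 / marg_var
joint_mean = (m0 / s0**2 + ybar / marg_var) / joint_prec
joint_sd = math.sqrt(1.0 / joint_prec)

# Point-estimate pipeline: treat the step-1 posterior mean as if b were
# observed exactly, discarding its uncertainty.
point_prec = 1.0 / s0**2 + 1.0 / s**2
point_mean = (m0 / s0**2 + b_mean / s**2) / point_prec
point_sd = math.sqrt(1.0 / point_prec)

print("joint      mean, sd:", round(joint_mean, 3), round(joint_sd, 3))
print("sequential mean, sd:", round(seq_mean, 3), round(seq_sd, 3))
print("point est. mean, sd:", round(point_mean, 3), round(point_sd, 3))
```

In this sketch the sequential estimate should track the joint posterior closely, while the point-estimate pipeline reports a much narrower posterior for theta, which is the kind of spurious confidence the abstract warns about. Dividing by the step-1 prior matters: without it, that prior would be counted in addition to the step-2 model for b.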
Keywords: Bayesian inference; RevBayes; divergence time estimation; joint posterior distribution; parameter uncertainty; phylogenetics.
© The Author(s) 2024. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.