Syst Biol. 2023 Nov 1;72(5):1199-1206.
doi: 10.1093/sysbio/syad045.

Online tree expansion could help solve the problem of scalability in Bayesian phylogenetics

Jakub Truszkowski et al. Syst Biol. 2023.

Abstract

Bayesian phylogenetics is now facing a critical point. Over the last 20 years, Bayesian methods have reshaped phylogenetic inference and gained widespread popularity due to their high accuracy, the ability to quantify the uncertainty of inferences and the possibility of accommodating multiple aspects of evolutionary processes in the models that are used. Unfortunately, Bayesian methods are computationally expensive, and typical applications involve at most a few hundred sequences. This is problematic in the age of rapidly expanding genomic data and increasing scope of evolutionary analyses, forcing researchers to resort to less accurate but faster methods, such as maximum parsimony and maximum likelihood. Does this spell doom for Bayesian methods? Not necessarily. Here, we discuss some recently proposed approaches that could help scale up Bayesian analyses of evolutionary problems considerably. We focus on two particular aspects: online phylogenetics, where new data sequences are added to existing analyses, and alternatives to Markov chain Monte Carlo (MCMC) for scalable Bayesian inference. We identify 5 specific challenges and discuss how they might be overcome. We believe that online phylogenetic approaches and Sequential Monte Carlo hold great promise and could potentially speed up tree inference by orders of magnitude. We call for collaborative efforts to speed up the development of methods for real-time tree expansion through online phylogenetics.

Keywords: Bayesian inference; MCMC; phylogeny; sequential Monte Carlo.


Figures

Figure 1
Current methods for online phylogenetic analyses. All methods start from a reasonable tree estimate, which then needs to be improved based on the information contained in the new sequence. a) Maximum likelihood phylogenetic placement. Maximum likelihood methods use the existing phylogeny (1) to compute the likelihood of attaching a new sequence onto every possible branch (2) and choose the attachment that maximises the likelihood of the resulting tree (3). This method does not update the tree beyond the added branch, so repeating it many times can significantly reduce the accuracy of the tree. b) MCMC iteration. An MCMC iteration consists of randomly proposing a small change to the topology or branch lengths of the current tree (1), which is then probabilistically accepted or rejected based on the posterior ratio between the proposed and the current tree (2). c) Online MCMC (Gill et al. 2020; Bouckaert et al. 2022). The starting point of an online MCMC analysis is informed by a previous MCMC run (1). As with maximum likelihood placement, this method adds one or more sequences to a tree sampled from a previous MCMC run (2). The resulting tree then serves as the starting point for another MCMC run, in which MCMC iterations as described in (b) are applied successively until a pre-defined convergence criterion is met (3). d) Online phylogenetic SMC (Fourment et al. 2018). SMC also samples a population of trees from the previous run (1); it then attaches a new sequence randomly to each of these trees based on the approximate posterior probability of possible attachments (2). Each tree is then assigned a weight proportional to the ratio of the likelihood of the tree with the sequence inserted to the likelihood of the tree without the sequence (3); see Equation (1). The trees are then resampled proportionally to their weights (4).
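The accept/reject step in panel (b) and the weighting and resampling steps in panel (d) can be illustrated with a minimal Python sketch. It follows only the wording of the caption: Equation (1) is not reproduced here, so the weight is taken to be the plain likelihood ratio, the MCMC proposal is assumed to be symmetric, and the resampling scheme is assumed to be multinomial. The function names, the string placeholders standing in for trees, and the log-likelihood values are all hypothetical; a real analysis would compute likelihoods from sequence data and a substitution model.

```python
import math
import random


def metropolis_accept(log_post_current, log_post_proposed, rng):
    """Panel (b), step 2: accept with probability min(1, posterior ratio).

    Assumes a symmetric proposal, so no Hastings correction is applied.
    """
    log_ratio = log_post_proposed - log_post_current
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def normalised_weights(log_lik_before, log_lik_after):
    """Panel (d), step 3: weight proportional to
    L(tree with new sequence) / L(tree without it), computed in log space."""
    log_w = [after - before for before, after in zip(log_lik_before, log_lik_after)]
    shift = max(log_w)  # subtract the maximum before exponentiating to avoid underflow
    w = [math.exp(x - shift) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def resample(particles, weights, rng):
    """Panel (d), step 4: draw a new population, with replacement,
    proportionally to the weights (multinomial resampling, an assumption)."""
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)

# Hypothetical population of three trees after the new sequence was attached.
particles = ["tree_a", "tree_b", "tree_c"]
log_lik_before = [-1000.0, -1002.0, -1001.0]  # made-up values, without the sequence
log_lik_after = [-1010.0, -1030.0, -1012.0]   # made-up values, with the sequence

weights = normalised_weights(log_lik_before, log_lik_after)
new_particles = resample(particles, weights, rng)
```

In this toy input the second tree loses far more likelihood than the others when the sequence is attached, so its weight is close to zero and it is very unlikely to survive resampling.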

References

    Andrieu C., Doucet A., Holenstein R. 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 72(3):269–342.
    Atteson K. 1999. The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica. 25(2):251–278.
    Ayres D.L., Cummings M.P., Baele G., Darling A.E., Lewis P.O., Swofford D.L., Huelsenbeck J.P., Lemey P., Rambaut A., Suchard M.A. 2019. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. Syst. Biol. 68(6):1052–1061.
    Balaban M., Sarmashghi S., Mirarab S. 2020. APPLES: scalable distance-based phylogenetic placement with or without alignments. Syst. Biol. 69(3):566–578.
    Balaban M., Jiang Y., Roush D., Zhu Q., Mirarab S. 2022. Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol. Ecol. Resour. 22(3):1213–1227.