PLoS Comput Biol. 2019 Apr 8;15(4):e1006650.

doi: 10.1371/journal.pcbi.1006650. eCollection 2019 Apr.

BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis

Remco Bouckaert^{1,2}, Timothy G Vaughan^{3,4}, Joëlle Barido-Sottani^{3,4}, Sebastián Duchêne⁵, Mathieu Fourment⁶, Alexandra Gavryushkina⁷, Joseph Heled⁸, Graham Jones⁹, Denise Kühnert², Nicola De Maio¹⁰, Michael Matschiner¹¹, Fábio K Mendes¹, Nicola F Müller^{3,4}, Huw A Ogilvie¹², Louis du Plessis¹³, Alex Popinga¹, Andrew Rambaut¹⁴, David Rasmussen¹⁵, Igor Siveroni¹⁶, Marc A Suchard¹⁷, Chieh-Hsi Wu¹⁸, Dong Xie¹, Chi Zhang¹⁹, Tanja Stadler^{3,4}, Alexei J Drummond¹

Affiliations

¹ Centre of Computational Evolution, University of Auckland, Auckland, New Zealand.
² Max Planck Institute for the Science of Human History, Jena, Germany.
³ ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland.
⁴ Swiss Institute of Bioinformatics, Lausanne, Switzerland.
⁵ Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia.
⁶ ithree institute, University of Technology Sydney, Sydney, Australia.
⁷ Department of Biochemistry, University of Otago, Dunedin 9016, New Zealand.
⁸ Independent researcher, Auckland, New Zealand.
⁹ Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE 405 30 Göteborg, Sweden.
¹⁰ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, UK.
¹¹ Department of Environmental Sciences, University of Basel, 4051 Basel, Switzerland.
¹² Department of Computer Science, Rice University, Houston, TX 77005-1892, USA.
¹³ Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK.
¹⁴ Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK.
¹⁵ Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC 27695, USA.
¹⁶ Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, W2 1PG, UK.
¹⁷ Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
¹⁸ Department of Statistics, University of Oxford, OX1 3LB, UK.
¹⁹ Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China.

PMID: 30958812
PMCID: PMC6472827
DOI: 10.1371/journal.pcbi.1006650

Remco Bouckaert et al. PLoS Comput Biol. 2019.

Abstract

Elaboration of Bayesian phylogenetic inference methods has continued apace in recent years, with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information, among others. Including all relevant data in a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models, which can be composed into a full model hierarchy, have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, ranging from practical software design, development and engineering to statistical and conceptual modelling. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Phylogenetic structures available in BEAST 2.**
(a) A tip-dated time tree, with leaf times as boundary conditions but not data (generally a coalescent prior is applied in this setting). (b) A species tree with one or more embedded gene trees. (c) A multi-type time tree has measured types at the leaves, and the type changes that paint the ancestral lineages in the tree are sampled as latent variables by MCMC. (d) A sampled ancestor tree, with two types of sampling events: extinct species (red) and extant species (blue). Extinct species can be leaves or, if they are the direct ancestor of another sample, degree-2 sampled ancestor nodes. (e) An ancestral gene conversion graph is composed of a clonal frame (solid time tree) and an extra edge and gene boundaries for each gene conversion event. (f) A species network with one or more embedded gene trees.

**Fig 2. bModelTest analysis for 36 mammalian species [50].**
a) Posterior distribution of substitution models. Each circle represents a substitution model, indicated by a six-digit number corresponding to the six rates of reversible substitution models. In alphabetical order, these rates are A→C, A→G, A→T, C→G, C→T, and G→T, and they can be shared in groups. The six-digit numbers indicate these groupings; for example, 121121 indicates the HKY model, which has one shared rate for transitions and one shared rate for transversions. Only reversible models that do not share transition and transversion rates are considered here (with the exception of the JC69 and F81 models). Other substitution model sets are available. Links between substitution models indicate possible jumps during the MCMC chain from simpler (tail of arrow) to more complex (head of arrow) models and back. There is no single preferred substitution model for these data, as the posterior probability is spread over a number of alternative substitution models. Blue circles indicate the eight models contained in the 95% credible set, models with red circles are outside of this set, and models without circles have negligible support. b) Posterior tree distribution resulting from the bModelTest analysis.
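
The six-digit encoding described in this caption can be illustrated with a minimal Python sketch. It is not part of bModelTest or BEAST 2; the names `RATE_ORDER`, `rate_groups` and `group_count` are hypothetical and chosen only for illustration. The sketch assumes nothing beyond the rate order given in the caption and shows which rates share a group for a given code.

```python
# Rate order as listed in the caption: A→C, A→G, A→T, C→G, C→T, G→T.
RATE_ORDER = ("AC", "AG", "AT", "CG", "CT", "GT")


def rate_groups(code):
    """Map each group label in a six-digit model code to the rates sharing it."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit model code such as '121121'")
    groups = {}
    for digit, rate in zip(code, RATE_ORDER):
        groups.setdefault(digit, []).append(rate)
    return groups


def group_count(code):
    """Number of distinct rate groups named by the code."""
    return len(rate_groups(code))


# HKY example from the caption: transitions (A→G, C→T) form one group,
# transversions form the other.
hky = rate_groups("121121")
```

For `"121121"`, group `2` contains the two transitions (`AG`, `CT`) and group `1` contains the four transversions, which matches the caption's description of HKY. A code in which all six digits are equal puts every rate in a single group.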

**Fig 3. Birth-death skyline (bdsky) analysis of the 2013–2016 West African Ebola virus disease epidemic.**
(a) The maximum clade credibility tree of the 811 sequences used in the analysis. (b) The median posterior estimate of the effective reproductive number (R_e) over time is shown in orange, with the 95% highest posterior density (HPD) interval in orange shading. The red dotted line indicates the epidemic threshold (R_e = 1). If R_e is below this threshold the epidemic has reached a turning point and is no longer spreading. The posterior distribution of the origin time of the epidemic (t₀) is shown in green. The number of laboratory-confirmed cases per week is shown in blue. Red arrows indicate weeks with fewer than 10 confirmed cases. The dotted line at A indicates the onset of symptoms in the suspected index case (see text for details). The dotted lines at B and C indicate the dates at which the WHO declared an Ebola virus disease outbreak in Guinea and a Public Health Emergency of International Concern (PHEIC), respectively. The dotted line at D indicates the first time that one of the three countries with intense transmission (Liberia) was declared Ebola-free, following 42 days without any new infections being reported (new cases were subsequently detected in Liberia in June 2015). (c) The median posterior estimate of the monthly sampling proportion is shown in purple, with the 95% HPD interval in purple shading. The red dashed line indicates the number of sampled sequences in the dataset, divided by the number of laboratory-confirmed cases, for each month in the analysis. This serves as an empirical estimate of the true sampling proportion. The posterior distributions and medians (dashed lines) of the infected period and the mean clock rate (truncated at the 95% HPD limits) are shown in panels (d) and (e).
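
The empirical sampling proportion in panel (c) is a simple ratio, and the threshold in panel (b) is a simple comparison. The Python sketch below illustrates both under stated assumptions: the function names are hypothetical, the counts in the usage lines are invented placeholders rather than data from the study, and months with no confirmed cases are returned as `None` because the ratio is undefined there.

```python
def empirical_sampling_proportion(sequences_per_month, cases_per_month):
    """Sampled sequences divided by laboratory-confirmed cases, per month.

    Returns None for months with zero confirmed cases (ratio undefined).
    """
    if len(sequences_per_month) != len(cases_per_month):
        raise ValueError("both series must cover the same months")
    return [
        seqs / cases if cases > 0 else None
        for seqs, cases in zip(sequences_per_month, cases_per_month)
    ]


def below_epidemic_threshold(r_e):
    """True when R_e is below the epidemic threshold of 1."""
    return r_e < 1.0


# Placeholder counts for illustration only (not taken from the analysis).
proportions = empirical_sampling_proportion([2, 3, 0], [8, 0, 4])
```

This is only the empirical comparison line from the figure; the posterior estimate of the sampling proportion comes from the birth-death skyline analysis itself.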

**Fig 4. The multispecies coalescent (MSC) model with three species and a single gene tree.**
A separate coalescent process applies to each of the five branches in the tree: the branches for the extant species A (red), B (green) and C (blue), the ancestral branch of A and B (yellow), and the root branch (grey). Several individuals have been sampled per species. In this example the ancestral lineage of individual b₄ does not coalesce in species B or ancestral species 4. In ancestral species 5, it coalesces with the ancestral lineage of species C. This leads to incomplete lineage sorting and enables gene tree discordance: in this example b₄ is a sister taxon to individuals from species C, rather than to individuals from its own species or from sister species A. If b₄ were the representative individual for its species, then this gene would exhibit gene tree discordance. Other individuals which show concordance at this locus are expected to show discordance at other unlinked loci when populations are large or speciation times are recent.

**Fig 5. AIM analysis of 100 nuclear gene alignments for the five Princess cichlid species.**
Species are *Neolamprologus marunguensis*, *N. gracilis*, *N. brichardi*, *N. olivaceous*, *N. pulcher*, as well as the outgroup *Metriaclima zebra*. a) to d) show the best-supported tree topologies. Arrows show directions of gene flow that are supported with a Bayes Factor of more than 10. Trees a) and c) only differ in the timing of the speciation events; however, AIM differentiates between differently ranked topologies, since these have to be characterized by using different parameters.

**Fig 6. Posterior predictive distributions for two phylodynamic models.**
The left column shows the trajectories of the reproductive number over time for a set of 100 publicly available genomes from the 2009 H1N1 influenza pandemic in North America using stochastic (birth-death SIR [28]) and deterministic (deterministic coalescent SIR [27]) models. Each blue line is a trajectory sampled from the posterior distribution. The models make different inferences of when the reproductive number falls below 1 (vertical dotted line; the horizontal dashed line is for R = 1), indicating that the pandemic is past its infectious peak. The right column shows the posterior predictive distributions of the root height for both models (grey histograms) and the value for the empirical data (orange vertical lines). Trees simulated from the stochastic model are more consistent with the empirical tree than those simulated from the deterministic model, suggesting that stochasticity may play an important role in the early stages of the pandemic (samples were collected up to June 2009).

References

1. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS computational biology. 2014;10(4):e1003537. doi: 10.1371/journal.pcbi.1003537
2. Drummond AJ, Bouckaert RR. Bayesian evolutionary analysis with BEAST. Cambridge University Press; 2015.
3. Bouckaert R, Heled J. DensiTree 2: Seeing trees through the forest. bioRxiv. 2014; p. 012401.
4. Vaughan TG, Drummond AJ. A stochastic simulator of birth–death master equations with application to phylodynamics. Molecular biology and evolution. 2013;30(6):1480–1493. doi: 10.1093/molbev/mst057
5. Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics. 2014;30(16):2272–2279. doi: 10.1093/bioinformatics/btu201

Grants and funding

U01 GM110749/GM/NIGMS NIH HHS/United States
