StarBEAST2 Brings Faster Species Tree Inference and Accurate Estimates of Substitution Rates

Mol Biol Evol. 2017 Aug 1;34(8):2101-2114. doi: 10.1093/molbev/msx126.

Huw A Ogilvie^{1,2}, Remco R Bouckaert^{2,3}, Alexei J Drummond^{2,3}

Affiliations

¹ Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia.
² Centre for Computational Evolution, University of Auckland, Auckland, New Zealand.
³ Department of Computer Science, University of Auckland, Auckland, New Zealand.

PMID: 28431121
PMCID: PMC5850801
DOI: 10.1093/molbev/msx126

Huw A Ogilvie et al. Mol Biol Evol. 2017.

Abstract

Fully Bayesian multispecies coalescent (MSC) methods like *BEAST estimate species trees from multiple sequence alignments. Today thousands of genes can be sequenced for a given study, but using that many genes with *BEAST is intractably slow. An alternative is to use heuristic methods which compromise accuracy or completeness in return for speed. A common heuristic is concatenation, which assumes that the evolutionary history of each gene tree is identical to the species tree. This is an inconsistent estimator of species tree topology, a worse estimator of divergence times, and induces spurious substitution rate variation when incomplete lineage sorting is present. Another class of heuristics directly motivated by the MSC avoids many of the pitfalls of concatenation but cannot be used to estimate divergence times. To enable fuller use of available data and more accurate inference of species tree topologies, divergence times, and substitution rates, we have developed a new version of *BEAST called StarBEAST2. To improve convergence rates we add analytical integration of population sizes, novel MCMC operators and other optimizations. Computational performance improved by 13.5× and 13.8× respectively when analyzing two empirical data sets, and an average of 33.1× across 30 simulated data sets. To enable accurate estimates of per-species substitution rates, we introduce species tree relaxed clocks, and show that StarBEAST2 is a more powerful and robust estimator of rate variation than concatenation. StarBEAST2 is available through the BEAUTi package manager in BEAST 2.4 and above.

Keywords: concatenation; incomplete lineage sorting; multispecies coalescent; phylogenetic methods; relaxed clocks; species trees.

Figures

**Fig. 1.**
Two-species phylogeny used to illustrate species tree relaxed clocks. There are two extant species “A” and “B”, and one ancestral species “AB.” Within the species tree there is a single gene tree with extant individuals “a” and “b.” The single speciation event occurs at time T1, and the single coalescence event occurs at time t1. Gene tree rates are computed according to table 1.
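Table 1 is not reproduced in this record, so the following is only a minimal sketch of how a gene tree branch rate could be derived for the two-species example. It assumes that each gene tree branch takes the length-weighted average of the rates of the species tree branches it passes through, scaled by a per-gene clock rate. The function name, the `gene_clock_rate` parameter, and all numeric values are hypothetical.

```python
def gene_branch_rate(segments, gene_clock_rate=1.0):
    """Length-weighted mean of species branch rates along one gene tree branch.

    segments: list of (species_branch_rate, time_spent_in_that_species_branch).
    Assumption: this weighting scheme stands in for table 1, which is not shown here.
    """
    total_time = sum(duration for _, duration in segments)
    if total_time <= 0:
        raise ValueError("gene branch must have positive length")
    weighted = sum(rate * duration for rate, duration in segments)
    return gene_clock_rate * weighted / total_time


# Hypothetical values for the two-species example (not taken from the paper).
T1, t1 = 1.0, 1.5                        # speciation time and coalescence time, t1 > T1
rate_A, rate_B, rate_AB = 0.8, 1.2, 1.0  # species tree branch rates

# Lineage "a" spends T1 in species A, then (t1 - T1) in ancestral species AB.
rate_a = gene_branch_rate([(rate_A, T1), (rate_AB, t1 - T1)])
# Lineage "b" spends T1 in species B, then (t1 - T1) in ancestral species AB.
rate_b = gene_branch_rate([(rate_B, T1), (rate_AB, t1 - T1)])
```

Under this assumption, a gene branch that coalesces far above the speciation event is dominated by the ancestral rate, while one that coalesces just above T1 mostly reflects the rate of its extant species.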

**Fig. 2.**
Accuracy of branch substitution rates and lengths inferred by BEAST concatenation and StarBEAST2. Deviation is the difference of each estimated rate and length from the true value. Estimated rates and lengths are the posterior expectation of the overall substitution rate and length for each species tree branch. Black crosses in each panel indicate the point of perfect accuracy. Each panel shows the distributions for the labeled extant or ancestral branch. N = 96.

**Fig. 3.**
Impact of operators, population size integration and clock models on convergence. The estimated sample size (ESS) per hour for a given replicate used the smallest ESS out of all recorded statistics. Topology refers to the replacement of naïve nearest-neighbor interchange and subtree prune and regraft operators with coordinated operators. Height refers to the addition of operators which make coordinated changes to node heights. Uncorrelated log-normal relaxed clocks were applied to each gene tree (GT-UCLN) or to the species tree (ST-UCLN). N = 30.
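The convergence measure described in this caption can be sketched as follows: take the ESS of every recorded statistic, keep the smallest, and divide by the run time in hours. The statistic names and numbers below are hypothetical, and the ESS values are assumed to have been computed already by a log analysis tool.

```python
def ess_per_hour(ess_by_statistic, runtime_seconds):
    """Smallest ESS across all recorded statistics, divided by run time in hours."""
    if not ess_by_statistic:
        raise ValueError("at least one statistic is required")
    if runtime_seconds <= 0:
        raise ValueError("runtime must be positive")
    return min(ess_by_statistic.values()) / (runtime_seconds / 3600.0)


# Hypothetical ESS values from one replicate that ran for two hours.
example = {"posterior": 850.0, "species_tree_height": 420.0, "clock_rate_sd": 210.0}
rate = ess_per_hour(example, runtime_seconds=7200)
```

Using the minimum makes the measure conservative: a chain counts as converged only as fast as its worst-mixing statistic.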

**Fig. 4.**
Convergence of different methods applied to simulated and empirical data sets. The estimated sample size (ESS) per hour for a given replicate used the slowest ESS rate out of all recorded statistics. Methods are BEAST concatenation, *BEAST, and StarBEAST2 with uncorrelated log-normal relaxed clocks applied to each gene tree (GT-UCLN) or to the species tree (ST-UCLN). Two *Pseudacris* *BEAST outliers with ESS rates below 0.1 are not shown. N = 30.

**Fig. 5.**
Coverage and accuracy of species branch lengths using different methods. Methods are StarBEAST2, *BEAST, and BEAST concatenation with uncorrelated log-normal relaxed clocks applied to each gene tree (GT-UCLN) or to the species tree (ST-UCLN). (A, B) The percentages of true branch lengths present within the corresponding 95% highest posterior density (HPD) credible intervals. (C, D) The difference between the sum of estimated branch lengths and the sum of true branch lengths as a percentage of the sum of true branch lengths. (E, F) The sum of absolute differences between estimated and simulated branch lengths as a percentage of true tree length. N = 30.
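The three branch length summaries in this caption can be written out as a short sketch. It assumes branches are matched by position across the true lengths, the estimated lengths, and the HPD intervals. The function names and example values are hypothetical.

```python
def coverage_percent(true_lengths, hpd_intervals):
    """Panels A, B: percentage of true branch lengths inside their 95% HPD interval."""
    inside = sum(
        1 for truth, (low, high) in zip(true_lengths, hpd_intervals)
        if low <= truth <= high
    )
    return 100.0 * inside / len(true_lengths)


def tree_length_bias_percent(true_lengths, estimated_lengths):
    """Panels C, D: (sum of estimates - sum of truth) as a percentage of true tree length."""
    true_total = sum(true_lengths)
    return 100.0 * (sum(estimated_lengths) - true_total) / true_total


def branch_length_error_percent(true_lengths, estimated_lengths):
    """Panels E, F: sum of absolute per-branch differences as a percentage of true tree length."""
    abs_error = sum(abs(est - truth) for truth, est in zip(true_lengths, estimated_lengths))
    return 100.0 * abs_error / sum(true_lengths)


# Hypothetical four-branch example.
true_lengths = [1.0, 2.0, 1.0, 4.0]
estimated = [1.5, 1.5, 1.0, 4.0]
hpds = [(0.5, 2.0), (1.0, 1.75), (0.5, 1.5), (3.0, 5.0)]

coverage = coverage_percent(true_lengths, hpds)
bias = tree_length_bias_percent(true_lengths, estimated)
error = branch_length_error_percent(true_lengths, estimated)
```

In this example the over- and underestimates cancel, so the tree length bias is zero while the absolute error is not, which is why the two measures are reported separately.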

**Fig. 6.**
Coverage and accuracy of species tree topologies using different methods. Methods are StarBEAST2, *BEAST, and BEAST concatenation with uncorrelated log-normal relaxed clocks applied to each gene tree (GT-UCLN) or to the species tree (ST-UCLN). (A) The percentage of true species tree topologies within the 95% credible set of topologies. (B) The average rooted Robinson–Foulds (RF) distance between the maximum clade credibility (MCC) species tree topology and the simulated true topology. Error bars are 95% confidence intervals calculated by bootstrapping. N = 30.
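As a rough illustration of the quantities in panel B, the sketch below computes a rooted Robinson–Foulds distance as the number of clades found in one tree but not the other, and a percentile bootstrap interval for a mean. Trees are represented as nested tuples of tip labels. Whether the paper uses this exact RF normalization or this bootstrap variant is an assumption, and the example trees and distances are hypothetical.

```python
import random


def clades(tree):
    """Set of clades (frozensets of tip labels) in a rooted tree given as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def rooted_rf(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    low = means[int((alpha / 2) * n_boot)]
    high = means[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical four-species example: the estimate swaps B and C.
true_tree = ((("A", "B"), "C"), "D")
mcc_tree = ((("A", "C"), "B"), "D")
distance = rooted_rf(true_tree, mcc_tree)

# Hypothetical RF distances from several replicates.
replicate_distances = [0, 2, 2, 4, 0, 2]
ci_low, ci_high = bootstrap_mean_ci(replicate_distances)
```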

**Fig. 7.**
Estimates of species tree branch rates using BEAST concatenation versus StarBEAST2. Estimated rates are the posterior expectations of each branch rate from each replicate. Root branch rates, which were fixed at 1, were excluded. In blue are simple linear regression lines of best fit, and in red are the $y = x$ lines showing a perfect relationship between estimates and truth. N = 30.
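The regression lines in this figure are ordinary least-squares fits of estimated rate against true rate. A minimal sketch, with hypothetical rate values and root branches assumed to be already excluded:

```python
def simple_linear_regression(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical branch rates: the estimates are shrunk toward 1.
true_rates = [0.5, 1.0, 1.5, 2.0]
estimated_rates = [0.75, 1.0, 1.25, 1.5]
slope, intercept = simple_linear_regression(true_rates, estimated_rates)
```

A fitted line that coincides with $y = x$ has slope 1 and intercept 0. In this made-up example the slope is below 1, meaning the estimates understate the true spread of rates.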
