Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods
- PMID: 20833951
- DOI: 10.1093/sysbio/syq047
Abstract
Discord among the gene trees estimated from different loci can be attributed both to the process of mutation and to incomplete lineage sorting. Effectively modeling these two sources of variation (mutational and coalescent variance) poses two distinct challenges for phylogenetic studies. Despite extensive investigation of mutational models for gene-tree estimation over the past two decades, and recent attention to modeling the coalescent process for phylogenetic estimation, the effects of these two variances have yet to be evaluated simultaneously. Here, we partition the effects of mutational and coalescent processes on phylogenetic accuracy by comparing the accuracy of species trees estimated from gene trees (i.e., the actual coalescent genealogies) with that of species trees estimated from estimated gene trees (i.e., trees estimated from nucleotide sequences, which contain both coalescent and mutational variance). Not only do both mutational and coalescent variance contribute significantly to errors in species-tree estimates, but the relative magnitude of their effects on the accuracy of species-tree estimation also differs systematically depending on 1) the timing of divergence, 2) the sampling design, and 3) the method used for species-tree estimation. These findings explain why using more of the information contained in gene trees (e.g., topology and branch lengths as opposed to topology alone) does not necessarily translate into pronounced gains in accuracy, and they highlight the strengths and limits of different methods for species-tree estimation. Differences in accuracy scores between methods under different sampling regimes also emphasize that it would be a mistake to assume that more computationally intensive species-tree estimation procedures will always provide better estimates of species trees.
To the contrary, the performance of a method depends not only on the method per se but also on the compatibility between the input genetic data and the method, as determined by the relative impact of mutational and coalescent variance.
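The partitioning described above reduces to simple arithmetic on replicate accuracy scores. The Python sketch below is illustrative only: it assumes accuracy is scored as the fraction of simulation replicates in which the estimated species-tree topology is correct (the abstract does not define the score), treats the shortfall from perfect accuracy when using the actual gene trees as coalescent error, and treats the further drop when using estimated gene trees as mutational error. The helper `gene_tree_concordance_3taxa` is not from this paper; it encodes the standard three-taxon coalescent result P = 1 - (2/3)e^(-t), included to show why coalescent variance grows as divergences become more rapid (shorter internal branch t, in coalescent units). All function names are hypothetical.

```python
import math


def gene_tree_concordance_3taxa(t: float) -> float:
    """Probability that one gene tree matches the species tree for three taxa
    under the multispecies coalescent (standard result, not from this paper).

    t is the internal branch length of the species tree in coalescent units;
    shorter t (more rapid divergence) means more incomplete lineage sorting.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def accuracy(correct: list[bool]) -> float:
    """Fraction of replicates whose estimated species tree was correct
    (assumed scoring scheme)."""
    if not correct:
        raise ValueError("need at least one replicate")
    return sum(1 for c in correct if c) / len(correct)


def partition_error(correct_from_true: list[bool],
                    correct_from_estimated: list[bool]) -> dict[str, float]:
    """Split species-tree error into coalescent and mutational components.

    correct_from_true: per-replicate outcomes when the species tree is inferred
        from the actual coalescent genealogies (coalescent variance only).
    correct_from_estimated: per-replicate outcomes when it is inferred from gene
        trees estimated from sequences (coalescent + mutational variance).
    """
    if len(correct_from_true) != len(correct_from_estimated):
        raise ValueError("both conditions need the same number of replicates")
    acc_true = accuracy(correct_from_true)
    acc_est = accuracy(correct_from_estimated)
    return {
        "accuracy_true_gene_trees": acc_true,
        "accuracy_estimated_gene_trees": acc_est,
        # Error that remains even when gene trees are known without error.
        "coalescent_error": 1.0 - acc_true,
        # Additional error introduced by estimating gene trees from sequences;
        # can come out slightly negative from sampling noise.
        "mutational_error": acc_true - acc_est,
    }
```

Because both accuracies come from a finite number of replicates, the mutational component is itself an estimate; a fuller analysis would report it with replicate-level uncertainty rather than as a single difference.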
Similar articles
- Maximum likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the divergence history and sampling design. Syst Biol. 2009 Oct;58(5):501-8. doi: 10.1093/sysbio/syp045. Epub 2009 Aug 20. PMID: 20525604
- Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst Biol. 2007 Jun;56(3):400-11. doi: 10.1080/10635150701405560. PMID: 17520504
- What is the danger of the anomaly zone for empirical phylogenetics? Syst Biol. 2009 Oct;58(5):527-36. doi: 10.1093/sysbio/syp047. Epub 2009 Aug 26. PMID: 20525606
- Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol. 2009 Oct;53(1):320-8. doi: 10.1016/j.ympev.2009.05.033. Epub 2009 Jun 6. PMID: 19501178. Review.
- Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci. 2015 Dec;1360:36-53. doi: 10.1111/nyas.12747. Epub 2015 Apr 14. PMID: 25873435. Review.
Cited by
- Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent. J Math Biol. 2017 Jan;74(1-2):447-467. doi: 10.1007/s00285-016-1034-0. Epub 2016 Jun 10. PMID: 27287395
- Concatenation and Species Tree Methods Exhibit Statistically Indistinguishable Accuracy under a Range of Simulated Conditions. PLoS Curr. 2015 Mar 9;7:ecurrents.tol.34260cc27551a527b124ec5f6334b6be. doi: 10.1371/currents.tol.34260cc27551a527b124ec5f6334b6be. PMID: 25901289
- An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines. BMC Evol Biol. 2014 Mar 29;14:67. doi: 10.1186/1471-2148-14-67. PMID: 24678701
- A framework phylogeny of the American oak clade based on sequenced RAD data. PLoS One. 2014 Apr 4;9(4):e93975. doi: 10.1371/journal.pone.0093975. eCollection 2014. PMID: 24705617
- Evaluating multiple criteria for species delimitation: an empirical example using Hawaiian palms (Arecaceae: Pritchardia). BMC Evol Biol. 2012 Feb 22;12:23. doi: 10.1186/1471-2148-12-23. PMID: 22353848