Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
- PMID: 27927902
- PMCID: PMC5161269
- DOI: 10.1534/genetics.116.190173
Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. 
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
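The gene tree-species tree conflict mentioned in the abstract can be illustrated for the simplest case of three species. The sketch below is not taken from the article; it assumes a species tree ((A,B),C), one sampled sequence per species, gene trees known without error, and an internal branch of length T measured in coalescent units. Under those assumptions the standard MSC result is that a gene tree matches the species tree with probability 1 - (2/3)e^(-T). The function names `p_match` and `simulate_match_fraction` are hypothetical and chosen only for illustration.

```python
import math
import random


def p_match(T):
    """Probability that a three-taxon gene tree matches the species tree
    ((A,B),C), given internal branch length T in coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


def simulate_match_fraction(T, n_loci, seed=0):
    """Simulate gene tree topologies at n_loci independent loci and return
    the fraction that match the species tree. Assumes one sequence per
    species and no gene tree estimation error."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_loci):
        # Waiting time for the A and B lineages to coalesce in their
        # common ancestral population (rate 1 in coalescent units).
        if rng.expovariate(1.0) < T:
            matches += 1  # coalesced before reaching the root population
        elif rng.randrange(3) == 0:
            # All three lineages reach the root population; each of the
            # three pairs is equally likely to coalesce first.
            matches += 1
    return matches / n_loci


if __name__ == "__main__":
    for T in (0.01, 0.5, 2.0):
        print(T, round(p_match(T), 4), round(simulate_match_fraction(T, 100000), 4))
```

As T approaches zero the matching probability approaches 1/3, so the three topologies become nearly equally frequent and many loci are needed to tell them apart. This sketch ignores sequence-level uncertainty in the gene trees, which is the aspect the abstract identifies as a weakness of two-step summary methods.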
Keywords: BPP; anomaly zone; concatenation; gene trees; incomplete lineage sorting; maximum likelihood; multispecies coalescent; species trees.
Copyright © 2016 by the Genetics Society of America.
Similar articles
- Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol. 2019;1910:211-239. doi: 10.1007/978-1-4939-9074-0_7. PMID: 31278666
- To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol. 2018 Mar 1;67(2):285-303. doi: 10.1093/sysbio/syx077. PMID: 29029338
- Coalescent-Based Analyses of Genomic Sequence Data Provide a Robust Resolution of Phylogenetic Relationships among Major Groups of Gibbons. Mol Biol Evol. 2018 Jan 1;35(1):159-179. doi: 10.1093/molbev/msx277. PMID: 29087487 Free PMC article.
- Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol. 2009 Oct;53(1):320-8. doi: 10.1016/j.ympev.2009.05.033. Epub 2009 Jun 6. PMID: 19501178 Review.
- Estimating phylogenetic trees from genome-scale data. Ann N Y Acad Sci. 2015 Dec;1360:36-53. doi: 10.1111/nyas.12747. Epub 2015 Apr 14. PMID: 25873435 Review.
Cited by
- Editorial: Evolutionary Feedbacks Between Population Biology and Genome Architecture. Front Genet. 2018 Aug 21;9:329. doi: 10.3389/fgene.2018.00329. eCollection 2018. PMID: 30186309 Free PMC article. No abstract available.
- Phylogenetic tree building in the genomic age. Nat Rev Genet. 2020 Jul;21(7):428-444. doi: 10.1038/s41576-020-0233-0. Epub 2020 May 18. PMID: 32424311 Review.
- Effect of Different Types of Sequence Data on Palaeognath Phylogeny. Genome Biol Evol. 2023 Jun 1;15(6):evad092. doi: 10.1093/gbe/evad092. PMID: 37227001 Free PMC article.
- Resolving Recalcitrant Clades in the Pantropical Ochnaceae: Insights From Comparative Phylogenomics of Plastome and Nuclear Genomic Data Derived From Targeted Sequencing. Front Plant Sci. 2021 Feb 4;12:638650. doi: 10.3389/fpls.2021.638650. eCollection 2021. PMID: 33613613 Free PMC article.
- An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times. Syst Biol. 2018 Jan 1;67(1):61-77. doi: 10.1093/sysbio/syx061. PMID: 29029343 Free PMC article.