Fast and consistent estimation of species trees using supermatrix rooted triples
- PMID: 19833741
- PMCID: PMC2877557
- DOI: 10.1093/molbev/msp250
Abstract
Concatenated sequence alignments are often used to infer species-level relationships. Previous studies have shown that analysis of concatenated data using maximum likelihood (ML) can produce misleading results when loci have differing gene tree topologies due to incomplete lineage sorting. Here, we develop a polynomial time method that utilizes the modified mincut supertree algorithm to construct an estimated species tree from inferred rooted triples of concatenated alignments. We term this method SuperMatrix Rooted Triple (SMRT) and use the notation SMRT-ML when rooted triples are inferred by ML. We use simulations to investigate the performance of SMRT-ML under Jukes-Cantor and general time-reversible substitution models for four- and five-taxon species trees and also apply the method to an empirical data set of yeast genes. We find that SMRT-ML converges to the correct species tree in many cases in which ML on the full concatenated data set fails to do so. SMRT-ML can be conservative in that its output tree is often partially unresolved for problematic clades. We show analytically that when the species tree is clocklike and mutations occur under the Cavender-Farris-Neyman substitution model, as the number of genes increases, SMRT-ML is increasingly likely to infer the correct species tree even when the most likely gene tree does not match the species tree. SMRT-ML is therefore a computationally efficient and statistically consistent estimator of the species tree when gene trees are distributed according to the multispecies coalescent model.
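The triple-to-tree step described in the abstract can be illustrated with a small sketch. It is not the SMRT-ML implementation. It uses the simpler BUILD algorithm of Aho et al. (1981) in place of the modified mincut supertree algorithm, so it only handles a compatible set of rooted triples and returns None otherwise. The function names, the nested-tuple tree encoding, and the hard-coded triples (standing in for triples that would be inferred by ML from three-taxon alignments) are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def three_taxon_subsets(taxa):
    """Every three-taxon subset; each would get its own ML triple inference."""
    return list(combinations(sorted(taxa), 3))


def build(taxa, triples):
    """BUILD (Aho et al. 1981), used here instead of modified mincut.

    A triple ((a, b), c) means a and b are closer to each other than to c.
    Returns a nested-tuple tree, or None if the triples are incompatible.
    """
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]

    # Union-find: taxa a and b of every triple ab|c must share a child clade.
    parent = {t: t for t in taxa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    members = set(taxa)
    relevant = [((a, b), c) for (a, b), c in triples
                if {a, b, c} <= members]
    for (a, b), _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for t in taxa:
        components.setdefault(find(t), []).append(t)
    if len(components) == 1:
        return None  # no valid split at the root: triples conflict

    subtrees = []
    for comp in components.values():
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


# Hypothetical triples for a four-taxon species tree (((A,B),C),D).
taxa = ["A", "B", "C", "D"]
triples = [(("A", "B"), "C"), (("A", "B"), "D"),
           (("A", "C"), "D"), (("B", "C"), "D")]
print(len(three_taxon_subsets(taxa)), comb(len(taxa), 3))  # 4 4
print(build(taxa, triples))  # ((('A', 'B'), 'C'), 'D')
```

With fewer triples the returned tree has nodes with more than two children, which loosely mirrors the partially unresolved output mentioned in the abstract. Handling conflicting triples, as a mincut-based method does, is outside the scope of this sketch.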
Figures
all alignments of three species, which are then fed through PAUP* to infer a total of rooted triples. These rooted triples are used as input to supertree (Page 2002) to infer a species tree (SMRT-ML). The dashed gray box represents the part of the procedure that is SMRT-ML.
