BMC Genomics. 2015;16 Suppl 10(Suppl 10):S2.
doi: 10.1186/1471-2164-16-S10-S2. Epub 2015 Oct 2.

A comparative study of SVDquartets and other coalescent-based species tree estimation methods

Jed Chou et al. BMC Genomics. 2015.

Abstract

Background: Species tree estimation is challenging in the presence of incomplete lineage sorting (ILS), which can make gene trees differ from the species tree. Because ILS is expected to occur, and because the standard concatenation approach can return incorrect trees with high support in the presence of ILS, "coalescent-based" summary methods (which first estimate gene trees and then combine them into a species tree) have been developed with theoretical guarantees of robustness to arbitrarily high amounts of ILS. Some studies have suggested that summary methods should be used only on "c-genes" (i.e., recombination-free loci), which can be extremely short (sometimes fewer than 100 sites). However, gene trees estimated on short alignments can have high estimation error, and summary methods tend to have high error on short c-genes. To address this problem, Chifman and Kubatko introduced SVDquartets, a new coalescent-based method. SVDquartets takes multi-locus unlinked single-site data, infers the quartet trees for all subsets of four species, and then combines the set of quartet trees into a species tree using a quartet amalgamation heuristic. Yet the accuracy of SVDquartets relative to leading coalescent-based methods has not been assessed.

Results: We compared SVDquartets to two leading coalescent-based methods (ASTRAL-2 and NJst), and to concatenation using maximum likelihood. We used a collection of simulated datasets, varying ILS levels, numbers of taxa, and number of sites per locus. Although SVDquartets was sometimes more accurate than ASTRAL-2 and NJst, most often the best results were obtained using ASTRAL-2, even on the shortest gene sequence alignments we explored (with only 10 sites per locus). Finally, concatenation was the most accurate of all methods under low ILS conditions.

Conclusions: ASTRAL-2 generally had the best accuracy under higher ILS conditions, and concatenation had the best accuracy under the lowest ILS conditions. However, SVDquartets was competitive with the best methods under conditions with low ILS and small numbers of sites per locus. The good performance under many conditions of ASTRAL-2 in comparison to SVDquartets is surprising given the known vulnerability of ASTRAL-2 and similar methods to short gene sequences.
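
To give a sense of the quartet step described in the Background, the following is a minimal Python sketch that enumerates every subset of four species and lists the three possible unrooted topologies for each. It does not implement the SVDquartets scoring or the quartet amalgamation heuristic; the function names `all_quartets` and `quartet_topologies` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb


def all_quartets(taxa):
    """Return every subset of four taxa (illustrative helper, not from the paper)."""
    return list(combinations(sorted(taxa), 4))


def quartet_topologies(a, b, c, d):
    """Return the three possible unrooted topologies on four taxa, as pairs of pairs."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


if __name__ == "__main__":
    # Taxon counts taken from the simulated datasets described in the figure captions.
    for n in (11, 15, 37):
        print(n, "taxa ->", comb(n, 4), "quartets")
```

For the 11-, 15-, and 37-taxon datasets this gives 330, 1365, and 66045 four-species subsets respectively, each of which has three candidate quartet trees to choose among.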

Figures

Figure 1
Results on the 11-taxon simulated datasets. We show mean RF rates with standard error bars for 50 replicates using all four methods (RAxML shows concatenation). The rows are for four 11-taxon (10 ingroup taxa and one outgroup taxon) model conditions with varying ILS levels, ranging from very low (M1) to very high (M4). The columns are for the different numbers of genes. Within a subfigure, we show results with changing numbers of sites per locus (10-200). Note that the y-axis range changes for the fourth row, due to the much higher error rates under the highest ILS model condition. Sequence evolution on these datasets deviates from the strict molecular clock.
Figure 2
Results on the 15-taxon simulated datasets. We show mean RF error rates with standard error bars over 10 replicates, for 100 to 1000 genes. Within a subfigure, we show results with changing numbers of sites per locus (10-200). The 15-taxon model tree is a caterpillar (pectinate tree) with very short internal branches, and these datasets have very high ILS. Sequence evolution on these datasets is under the strict molecular clock.
Figure 3
Results on the low ILS mammalian simulated datasets. We show mean RF error rates with standard error bars over 20 replicates, for 50 to 200 genes. Within a subfigure, we show results with changing numbers of sites per locus (10-200). The mammalian simulation is for 37 species, and is based on an MP-EST analysis of a biological dataset with reduced ILS (produced by doubling the species tree branch lengths). Sequence evolution in this simulation deviates from the strict molecular clock.
Figure 4
The SVDquartets+PAUP* tree on the 37-taxon 424-gene mammalian dataset from Song et al. [40]. This tree has one branch with very low support, and so does not resolve the relationship between Cetartiodactyla, Chiroptera, and the clade ((Felis catus, Canis familiaris), Equus caballus). Labels on branches indicate bootstrap support, but support values of 100% are not shown.
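
The captions for Figures 1 to 3 report mean RF error rates with standard error bars over replicates. As a rough illustration of how such a summary could be computed, the Python sketch below compares two trees by their non-trivial bipartitions and then averages the resulting rates. It makes several assumptions not stated in the source: trees are written as nested tuples of taxon names, the RF rate is normalized by the total number of non-trivial bipartitions in both trees, and the names `bipartitions`, `rf_rate`, and `mean_and_se` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bipartitions(tree, taxa):
    """Return the non-trivial bipartitions of a tree given as nested tuples."""
    taxa = frozenset(taxa)
    anchor = min(taxa)  # fixed taxon used to pick one canonical side of each split
    splits = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        # Store the side of the split that does not contain the anchor taxon.
        side = taxa - below if anchor in below else below
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return below

    leaves(tree)
    return splits


def rf_rate(estimated, reference, taxa):
    """Normalized Robinson-Foulds distance (assumed normalization, see lead-in)."""
    a = bipartitions(estimated, taxa)
    b = bipartitions(reference, taxa)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


def mean_and_se(values):
    """Mean and standard error of the mean over replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


# Toy five-taxon example (made-up trees, not data from the study).
taxa = ["A", "B", "C", "D", "E"]
true_tree = ((("A", "B"), "C"), ("D", "E"))
est_tree = ((("A", "C"), "B"), ("D", "E"))
```

Here `rf_rate(est_tree, true_tree, taxa)` evaluates to 0.5, because one of the two non-trivial bipartitions in each tree is missing from the other. Calling `mean_and_se` on a list of per-replicate rates would give the kind of point and error bar plotted in the figures.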
