BMC Bioinformatics. 2011 Oct 5;12 Suppl 9(Suppl 9):S4.
doi: 10.1186/1471-2105-12-S9-S4.

Fast and accurate methods for phylogenomic analyses

Jimmy Yang et al. BMC Bioinformatics.

Abstract

Background: Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another) due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing in the algorithmic technique used and in the biological model used to explain differences between species and gene trees. Little is known about how these methods compare in performance.

Results: We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions), substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical techniques and are computationally intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct.

Conclusions: Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.

Figures

Figure 1
Running time on 100-taxon non-ILS datasets. Average running time (y-axis in log scale) of methods on 100-taxon non-ILS datasets with MAFFT alignments of (A) 25 and (B) 50 genes. MrBayes performed two runs of 1M MCMC iterations in each analysis, with an average running time of 25 hours per sequence alignment. At the end of its analysis, MrBayes reported an average standard deviation of bipartitions of 0.065, indicating that it was far from convergence. BUCKy used 15K trees per gene for the full MrBayes distribution and 2K trees per gene for the sparse one. RAxML performed 100 bootstrap replicates under GTRCAT. BUCKy analyses on RAxML used 100 trees per gene.
Figure 2
Missing branch rates on 17-taxon 32-gene datasets with ILS. Average missing branch rates of methods on 25 17-taxon 32-gene datasets with incomplete lineage sorting. (A) shows results for the slow methods, (B) shows results for the fast methods, and (C) shows representative methods of both types. Bars indicate standard error.
Figure 3
Missing branch rates on true alignments of 100-taxon 25-gene datasets with ILS. Average missing branch rate of methods on ten (10) 100-taxon 25-gene datasets with incomplete lineage sorting on true alignments. (A) shows results for the slow methods, (B) shows results for the fast methods, and (C) shows results for representative methods of both types. Bars indicate standard error.
Figure 4
Missing branch rates of fast methods on 100-taxon datasets without ILS. Average missing branch rate of fast methods on 120 100-taxon datasets without incomplete lineage sorting for 25 and 50 genes. (A) shows results for the true alignments and (B) shows results for the MAFFT alignments. Bars indicate standard error.
Figure 5
Missing branch rates on true alignments for 100-taxon datasets without ILS. Average missing branch rate of methods on six (6) 100-taxon datasets without incomplete lineage sorting on true alignments for 25 and 50 genes. (A) shows results for the slow methods, (B) shows results for the fast methods, and (C) shows results for representative methods of both types. Bars indicate standard error.
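
The captions above report average missing branch rates with standard-error bars, but this page does not spell out how those quantities are computed. The following is a minimal sketch, assuming the common definition of missing branch rate (the fraction of non-trivial bipartitions of the true tree that are absent from the estimated tree) and the standard error of the mean across replicate datasets. The nested-tuple tree representation and all function names are illustrative assumptions and are not taken from the paper's software.

```python
from math import sqrt
from statistics import mean, stdev


def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if len(side) >= 2 and len(all_leaves) - len(side) >= 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def missing_branch_rate(true_tree, estimated_tree):
    """Fraction of true-tree bipartitions missing from the estimated tree."""
    true_splits = bipartitions(true_tree)
    if not true_splits:
        return 0.0
    missing = true_splits - bipartitions(estimated_tree)
    return len(missing) / len(true_splits)


def mean_and_standard_error(values):
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical five-taxon example: the estimate misplaces taxon C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated_tree = ((("A", "C"), "B"), ("D", "E"))
rate = missing_branch_rate(true_tree, estimated_tree)
```

In this toy example the true tree has two internal branches and the estimate recovers only one of them. Averaging such rates over replicate datasets with `mean_and_standard_error` gives a bar height and error bar of the kind the captions describe, under the assumptions stated above.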
