BMC Evol Biol. 2009 Feb 11:9:37.
doi: 10.1186/1471-2148-9-37.

Mega-phylogeny approach for comparative biology: an alternative to supertree and supermatrix approaches

Stephen A Smith et al. BMC Evol Biol. 2009.

Abstract

Background: Biologists increasingly recognize the need to build and use larger phylogenies to address broad evolutionary questions. Large phylogenies have facilitated the discovery of differential rates of molecular evolution between trees and herbs. They have helped us understand the diversification patterns of mammals as well as the patterns of seed evolution. In addition to these broad evolutionary questions, there is increasing awareness of the importance of large phylogenies for addressing conservation issues such as biodiversity hotspots and responses to global change. Two major classes of methods have been employed to accomplish the large tree-building task: supertrees and supermatrices. Although these methods are continually being developed, they have yet to be made fully accessible to comparative biologists, making extremely large trees rare.

Results: Here we describe and demonstrate a modified supermatrix method termed mega-phylogeny that uses databased sequences as well as taxonomic hierarchies to make extremely large trees with denser matrices than supermatrices. The two major challenges facing large-scale supermatrix phylogenetics are assembling large data matrices from databases and reconstructing trees from those datasets. The mega-phylogeny approach addresses the former; the latter is accomplished by employing recently developed methods that have greatly reduced the run time of large phylogeny construction. We present an algorithm that requires relatively little human intervention. The implemented algorithm is demonstrated with a dataset and phylogeny for Asterales (within Campanulidae) containing 4954 species and 12,033 sites, and with an rbcL matrix for green plants (Viridiplantae) containing 13,533 species and 1,401 sites.

Conclusion: Examining much larger phylogenies reveals patterns that were otherwise unseen. The phylogeny of Viridiplantae successfully reconstructs major relationships of vascular plants that previously required many more genes. These demonstrations underscore the importance of using large phylogenies to uncover important evolutionary patterns, and we present a fast and simple method for constructing these phylogenies.

Figures

Figure 1
Simulation exploring the behavior of MAD in relation to alternative measures of dispersion. Each panel is a simulation of sequence data on a balanced phylogeny of 20 tips (A, C, E, and G) or 100 tips (B, D, F, and H). A and B: total tree length scaled to 0.10. C and D: total tree length scaled to 0.25. E and F: total tree length scaled to 0.50. G and H: total tree length scaled to 2.00. Saturation was assessed by descriptors of dispersion on the one-dimensional Euclidean distance between the raw pairwise sequence distances (uncorrected distance) and those corrected according to a Jukes-Cantor model of molecular substitution (corrected distance). Our simulations demonstrated that the non-parametric median absolute deviation (MAD) had several advantages in detecting saturation over alternative measures of dispersion based on the sample mean (i.e., mean square error, MSE; root mean square error, RMSE).
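
As a rough illustration of the quantities named in the Figure 1 caption, the Python sketch below computes uncorrected pairwise distances, their Jukes-Cantor corrections, and three summaries of the gap between the two. It is not the authors' implementation: the function names, the toy alignment, and the exact definitions used here (MAD as the median absolute deviation of the gaps around their median, MSE as the mean of the squared gaps) are assumptions made for illustration only.

```python
import math
from itertools import combinations
from statistics import mean, median


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an uncorrected distance p."""
    if p >= 0.75:
        # The correction is undefined at or beyond p = 0.75 (full saturation).
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dispersion_summary(seqs):
    """Summarize gaps between corrected and uncorrected pairwise distances.

    Requires at least two sequences. The definitions of MAD and MSE used
    here are illustrative assumptions, not taken from the paper.
    """
    gaps = []
    for a, b in combinations(seqs, 2):
        p = p_distance(a, b)
        gaps.append(abs(jc_distance(p) - p))
    center = median(gaps)
    mad = median([abs(g - center) for g in gaps])
    mse = mean([g * g for g in gaps])
    return {"mad": mad, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical toy alignment (three sequences, eight sites).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTACCA"]
summary = dispersion_summary(toy_alignment)
print(summary)
```

In this toy alignment two of the three pairs differ at one site and the third pair differs at two, so the mean-based summaries respond to the single larger gap while the median-based MAD does not. Real alignments would also need handling for gaps, ambiguous bases, and pairs at or beyond the saturation limit, which this sketch simply rejects.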
Figure 2
Maximum-likelihood phylogeny for 4954 species of Asterales. The data matrix was constructed using the mega-phylogeny method and includes DNA sequences for five genes: rbcL, matK, trnL-F, ndhF, and ITS. Each of the 12 major families of Asterales is labeled. We also note the placement of the "Doronicum" clade in relation to the tribe Senecioneae; although we assumed a sister relationship a priori, the phylogenetic analysis overruled this assumption, indicating that the two clades may be more distantly related. Pentaphragma, Pentaphragmataceae; Alseu, Alseuosmiaceae; Argo, Argophyllaceae; Phel, Phellinaceae.
Figure 3
Maximum-likelihood phylogeny for 13,533 species of green plants based on rbcL DNA sequences. The data matrix was constructed using the mega-phylogeny method; major clades are labeled and denoted with a star.
