FastTree: computing large minimum evolution trees with profiles instead of a distance matrix
- PMID: 19377059
- PMCID: PMC2693737
- DOI: 10.1093/molbev/msp077
Abstract
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and an alphabet of a distinct characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N√N) memory and O(N√N log(N) La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
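The idea of replacing a distance matrix with profiles can be illustrated with a minimal sketch. It is not FastTree's implementation. It assumes an ungapped DNA alignment, a simple mismatch (uncorrected) distance, and a profile taken as the plain average of a group's sequences. All function names are hypothetical. Under those assumptions, the distance between two profiles equals the mean pairwise distance between the two groups, so the pairwise values never need to be stored. The last function is a back-of-envelope estimate of matrix storage, assuming 4-byte entries and only the upper triangle.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: ungapped DNA, so a = 4


def profile(seqs):
    """Per-site character frequencies for a group of aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    return [
        {c: sum(s[i] == c for s in seqs) / n for c in ALPHABET}
        for i in range(length)
    ]


def profile_distance(p, q):
    """Expected mismatch fraction between two profiles, in O(L * a) time."""
    total = 0.0
    for fp, fq in zip(p, q):
        # Probability that characters drawn from each profile differ at this site.
        total += 1.0 - sum(fp[c] * fq[c] for c in ALPHABET)
    return total / len(p)


def mean_pairwise_distance(group_a, group_b):
    """Brute-force average mismatch fraction over all cross-group pairs."""
    def mismatch(s, t):
        return sum(x != y for x, y in zip(s, t)) / len(s)

    pairs = list(product(group_a, group_b))
    return sum(mismatch(s, t) for s, t in pairs) / len(pairs)


def distance_matrix_bytes(n, bytes_per_entry=4):
    """Storage for the upper triangle of an n x n distance matrix.

    Assumption: one 4-byte float per unordered pair of sequences.
    """
    return n * (n - 1) // 2 * bytes_per_entry


group_a = ["ACGT", "ACGA"]
group_b = ["TCGT", "ACCT", "ACGT"]
d_profile = profile_distance(profile(group_a), profile(group_b))
d_pairs = mean_pairwise_distance(group_a, group_b)
print(d_profile, d_pairs)  # the two values agree up to rounding
print(distance_matrix_bytes(158_022) / 1e9, "GB")
```

With the stated storage assumptions, the estimate for 158,022 sequences comes to roughly 49.9 GB, in line with the 50 GB quoted in the abstract. A profile costs O(La) numbers per node, which is where the O(NLa) memory term comes from.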