Assessment of protein distance measures and tree-building methods for phylogenetic tree reconstruction
- PMID: 16049194
- DOI: 10.1093/molbev/msi224
Abstract
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. Here we present a large-scale assessment of how accurately the most popular algorithms reconstruct the correct tree topology. The programs BIONJ, FastME, Weighbor, and standard NJ were each run with 12 distance estimators, producing 48 combinations of tree-building and distance estimation methods. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program under three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the best overall results, although the average accuracy differed little between the tree-building methods (normally by less than 1%). A noticeable trend was that FastME performed worse than the others on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed between distance estimators. Protein-adapted Jukes-Cantor and Kimura distance corrections produced clearly poorer results than the other methods, worse even than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed as well as more complex methods.
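The abstract contrasts uncorrected distances with the protein-adapted Jukes-Cantor and Kimura corrections. The following is a minimal Python sketch of these three estimators using their standard textbook forms; it is not code from the paper, the function names are hypothetical, and it assumes pre-aligned sequences with '-' as the gap character, skipping gapped columns. The corrections are d = -(19/20) ln(1 - (20/19) p) for Jukes-Cantor with 20 amino-acid states and d = -ln(1 - p - 0.2 p^2) for Kimura's empirical approximation, where p is the uncorrected (p-)distance.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected distance: fraction of differing residues over
    aligned columns where neither sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def jukes_cantor_protein(p: float) -> float:
    """Jukes-Cantor correction for 20 states: d = -(19/20) ln(1 - (20/19) p)."""
    b = 19.0 / 20.0
    arg = 1.0 - p / b
    if arg <= 0.0:
        return math.inf  # saturated: the correction is undefined here
    return -b * math.log(arg)


def kimura_protein(p: float) -> float:
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return math.inf  # saturated: the correction is undefined here
    return -math.log(arg)


if __name__ == "__main__":
    p = p_distance("MKV-LAGHT", "MKIALAGQT")
    print(p, jukes_cantor_protein(p), kimura_protein(p))
```

For example, the two toy sequences above share 8 ungapped positions, 2 of which differ, giving p = 0.25; the Jukes-Cantor and Kimura corrections return slightly larger values (about 0.290 and 0.304), reflecting the unobserved multiple substitutions they try to account for. Because both corrections diverge as p approaches saturation, the sketch returns infinity rather than raising an error there.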
Similar articles
- On the quality of tree-based protein classification. Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12. PMID: 15647305
- Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108. PMID: 15857510. Free PMC article.
- Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol Biol Evol. 2004 Mar;21(3):587-98. doi: 10.1093/molbev/msh049. Epub 2003 Dec 23. PMID: 14694080
- Neighbor-joining revealed. Mol Biol Evol. 2006 Nov;23(11):1997-2000. doi: 10.1093/molbev/msl072. Epub 2006 Jul 28. PMID: 16877499. Review.
- Distance measures in terms of substitution processes. Theor Popul Biol. 1999 Apr;55(2):166-75. doi: 10.1006/tpbi.1998.1395. PMID: 10329516. Review.
Cited by
- A pore-forming protein drives macropinocytosis to facilitate toad water maintaining. Commun Biol. 2022 Jul 22;5(1):730. doi: 10.1038/s42003-022-03686-1. PMID: 35869260. Free PMC article.
- A novel method for protein-protein interaction site prediction using phylogenetic substitution models. Proteins. 2012 Jan;80(1):126-41. doi: 10.1002/prot.23169. Epub 2011 Oct 12. PMID: 21989996. Free PMC article.
- Genomic Diversity and Evolution of Quasispecies in Newcastle Disease Virus Infections. Viruses. 2020 Nov 14;12(11):1305. doi: 10.3390/v12111305. PMID: 33202558. Free PMC article.
- PhyloBench: A Benchmark for Evaluating Phylogenetic Programs. Mol Biol Evol. 2024 Jun 1;41(6):msae084. doi: 10.1093/molbev/msae084. PMID: 38860506. Free PMC article.
- A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep. 2023 Nov 20;13(1):20304. doi: 10.1038/s41598-023-47496-9. PMID: 37985846. Free PMC article.