Impact of taxon sampling on the estimation of rates of evolution at sites
- PMID: 15590908
- DOI: 10.1093/molbev/msi065
Erratum in
- Mol Biol Evol. 2005 Apr;22(4):1160
Abstract
The function of individual sites within a protein influences their rates of accepted point mutation. During the computation of phylogenetic likelihoods, rate heterogeneity can be modeled on a site-per-site basis, with relative rates drawn from a discretized Gamma distribution. Site-rate estimates (e.g., the rate of highest posterior probability given the data at a site) can then be used as a measure of the evolutionary constraints imposed by function. However, if only a few sequences are available, the estimation of rates is subject to sampling error. This article presents a simulation study that evaluates the robustness of evolutionary site-rate estimates for both small and phylogenetically unbalanced samples. The sampling error on rate estimates was first evaluated for alignments of 5-45 sequences, sampled by jackknifing from a master alignment containing 968 sequences. We observed that the potentially enhanced resolution among site rates gained by including a larger number of rate categories is negated by the difficulty of correctly estimating intermediate rates. This effect is marked for data sets with fewer than 30 sequences. Although the likelihood computation theoretically accounts for phylogenetic distances through branch lengths, the introduction of a single long-branch outlier sequence had a significant negative effect on site-rate estimates. Finally, a shift in rates of evolution between related lineages can be diagnostic of a gain or loss of function within a protein family. Our analyses indicate that detecting these rate shifts is a harder problem than estimating rates. This is partly because a difference in rates depends on two rate estimates, each with its own intrinsic uncertainty. The performance of four methods for detecting these site-rate shifts is evaluated and compared. Guidelines are suggested for preparing data sets that minimize the error introduced by sequence sampling.
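The abstract relies on two computations: discretizing a Gamma distribution of relative rates into K categories, and taking, at each site, the rate of the category with the highest posterior probability. The following is a minimal, standard-library Python sketch of both. It is not the authors' implementation. It assumes Yang's (1994) mean-of-category discretization with K equiprobable categories and a Gamma distribution with shape alpha and mean 1. The per-category site likelihoods are hypothetical inputs that a real analysis would obtain from a phylogenetic likelihood program. All function names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`, `site_rate_posterior`) are illustrative.

```python
import math


def reg_lower_gamma(s: float, x: float) -> float:
    """Regularized lower incomplete gamma P(s, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = s * math.log(x) - x - math.lgamma(s)
    if x < s + 1.0:
        # Power series: P = x^s e^-x / Gamma(s) * sum_n x^n / (s (s+1) ... (s+n)).
        term = 1.0 / s
        total = term
        n = 0
        while term > 1e-15 * total:
            n += 1
            term *= x / (s + n)
            total += term
        return math.exp(log_prefactor) * total
    # Continued fraction for Q = 1 - P (modified Lentz method).
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p: float, alpha: float) -> float:
    """Quantile of Gamma(shape=alpha, rate=alpha), i.e. mean 1, by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:  # bracket the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """Mean relative rate of each of k equiprobable Gamma categories."""
    bounds = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    rates = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Integral of x f(x) over [lo, hi] equals a Gamma(alpha + 1) CDF difference.
        upper = 1.0 if math.isinf(hi) else reg_lower_gamma(alpha + 1.0, alpha * hi)
        lower = reg_lower_gamma(alpha + 1.0, alpha * lo)
        rates.append(k * (upper - lower))
    return rates


def site_rate_posterior(site_likelihoods: list[float], rates: list[float]):
    """Posterior over categories (prior 1/k each), MAP rate and posterior-mean rate."""
    k = len(rates)
    joint = [lik / k for lik in site_likelihoods]
    total = sum(joint)
    post = [j / total for j in joint]
    map_rate = rates[max(range(k), key=post.__getitem__)]
    mean_rate = sum(r * p for r, p in zip(rates, post))
    return post, map_rate, mean_rate


if __name__ == "__main__":
    rates = discrete_gamma_rates(alpha=0.5, k=4)
    # Hypothetical P(data at one site | category c), for illustration only.
    post, map_rate, mean_rate = site_rate_posterior([1e-6, 4e-6, 2e-6, 5e-7], rates)
    print(rates, post, map_rate, mean_rate)
```

With alpha = 0.5 and k = 4, the category rates should come out close to the commonly tabulated values 0.0334, 0.2519, 0.8203 and 2.8944, which average to 1. The sketch returns both the highest-posterior rate and the posterior-mean rate. When the data at a site are uninformative, the posterior spreads across neighbouring categories. In that case the highest-posterior category can flip with small changes in the likelihoods, while the posterior mean changes smoothly.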
Similar articles
- A gamma mixture model better accounts for among site rate heterogeneity. Bioinformatics. 2005 Sep 1;21 Suppl 2:ii151-8. doi: 10.1093/bioinformatics/bti1125. PMID: 16204095
- A molecular timescale for galliform birds accounting for uncertainty in time estimates and heterogeneity of rates of DNA substitutions across lineages and sites. Mol Phylogenet Evol. 2006 Feb;38(2):499-509. doi: 10.1016/j.ympev.2005.07.007. Epub 2005 Aug 19. PMID: 16112881
- On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled. Mol Biol Evol. 2004 Sep;21(9):1629-42. doi: 10.1093/molbev/msh159. Epub 2004 May 21. PMID: 15155796
- Distance measures in terms of substitution processes. Theor Popul Biol. 1999 Apr;55(2):166-75. doi: 10.1006/tpbi.1998.1395. PMID: 10329516. Review.
- Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci. 2021 Oct;30(10):2009-2028. doi: 10.1002/pro.4161. Epub 2021 Aug 12. PMID: 34322924. Free PMC article. Review.
Cited by
- Framing the Salmonidae family phylogenetic portrait: a more complete picture from increased taxon sampling. PLoS One. 2012;7(10):e46662. doi: 10.1371/journal.pone.0046662. Epub 2012 Oct 5. PMID: 23071608. Free PMC article.
- The nature of protein domain evolution: shaping the interaction network. Curr Genomics. 2010 Aug;11(5):368-76. doi: 10.2174/138920210791616725. PMID: 21286315. Free PMC article.
- Are rates of molecular evolution in mammals substantially accelerated in warmer environments? Proc Biol Sci. 2011 May 7;278(1710):1291-3; discussion 1294-7. doi: 10.1098/rspb.2010.0388. Epub 2011 Feb 2. PMID: 21288954. Free PMC article.
- libcov: a C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny. BMC Bioinformatics. 2005 Jun 6;6:138. doi: 10.1186/1471-2105-6-138. PMID: 15938750. Free PMC article.
- Linking fold, function and phylogeny: a comparative genomics view on protein (domain) evolution. Curr Genomics. 2008 Apr;9(2):88-96. doi: 10.2174/138920208784139537. PMID: 19440449. Free PMC article.