Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior
- PMID: 15201400
- DOI: 10.1093/molbev/msh194
Abstract
The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of the protein. When using probabilistic methods for site-specific rate inference, few alternatives are possible. In this study we use simulations to compare the maximum-likelihood and Bayesian paradigms. We study the dependence of inference accuracy on such parameters as number of sequences, branch lengths, the shape of the rate distribution, and sequence length. We also study the possibility of simultaneously estimating branch lengths and site-specific rates. Our results show that a Bayesian approach is superior to maximum-likelihood under a wide range of conditions, indicating that the prior that is incorporated into the Bayesian computation significantly improves performance. We show that when branch lengths are unknown, it is better first to estimate branch lengths and then to estimate site-specific rates. This procedure was found to be superior to estimating both the branch lengths and site-specific rates simultaneously. Finally, we illustrate the difference between maximum-likelihood and Bayesian methods when analyzing site-conservation for the apoptosis regulator protein Bcl-xL.
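To make the contrast between the two paradigms concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a Poisson (equal-rates) amino-acid model with uniform equilibrium frequencies, a hypothetical four-sequence tree whose branch lengths are treated as already known (in the spirit of estimating branch lengths before site rates), and a gamma prior on rates with mean 1 and a fixed shape α = 0.5. The prior is integrated on a log-spaced grid, which is a substitution for, e.g., discrete-gamma rate categories. In a genuine empirical Bayes analysis α would itself be estimated from the full alignment. All names (`TREE`, `CONSERVED`, `VARIABLE`, `ml_rate`, `bayes_rate`, etc.) are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N = len(AMINO_ACIDS)

# Hypothetical toy data: four sequences on a fixed tree with known branch lengths.
# A leaf is a sequence name; an internal node is a tuple of (child, branch_length) pairs.
TREE = (((("s1", 0.1), ("s2", 0.1)), 0.05), ((("s3", 0.1), ("s4", 0.1)), 0.05))
CONSERVED = {"s1": "A", "s2": "A", "s3": "A", "s4": "A"}  # invariant column
VARIABLE = {"s1": "A", "s2": "C", "s3": "D", "s4": "E"}   # every residue differs


def transition_probs(t):
    """Assumed Poisson model: (P_same, P_diff) after t expected substitutions per site."""
    e = math.exp(-N * t / (N - 1))
    return 1.0 / N + (N - 1) / N * e, (1.0 - e) / N


def conditional(node, column, rate):
    """Felsenstein pruning: likelihood of the data below `node` for each state at `node`."""
    if isinstance(node, str):  # leaf: indicator vector of the observed residue
        return [1.0 if aa == column[node] else 0.0 for aa in AMINO_ACIDS]
    result = [1.0] * N
    for child, length in node:
        child_l = conditional(child, column, rate)
        p_same, p_diff = transition_probs(rate * length)  # site rate scales every branch
        total = sum(child_l)
        for s in range(N):
            # sum_c P(s -> c) * L_c, using the fact that all off-diagonal entries are equal
            result[s] *= p_diff * total + (p_same - p_diff) * child_l[s]
    return result


def site_likelihood(tree, column, rate):
    """P(column | rate), averaging over a uniformly distributed root state."""
    return sum(conditional(tree, column, rate)) / N


def rate_grid(lo=1e-4, hi=20.0, points=400):
    """Log-spaced grid of candidate rates and its step size in log(rate)."""
    step = (math.log(hi) - math.log(lo)) / (points - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(points)], step


def ml_rate(tree, column, rates):
    """Maximum-likelihood site rate, restricted to the grid."""
    return max(rates, key=lambda r: site_likelihood(tree, column, r))


def bayes_rate(tree, column, rates, step, alpha):
    """Posterior mean rate under a Gamma(shape=alpha, rate=alpha) prior (prior mean 1)."""
    num = den = 0.0
    for r in rates:
        log_prior = (alpha * math.log(alpha) + (alpha - 1) * math.log(r)
                     - alpha * r - math.lgamma(alpha))
        weight = math.exp(log_prior) * r * step  # dr = r * d(log r) on a log grid
        post = weight * site_likelihood(tree, column, r)  # unnormalized posterior mass
        num += r * post
        den += post
    return num / den


if __name__ == "__main__":
    rates, step = rate_grid()
    for name, col in [("conserved", CONSERVED), ("variable", VARIABLE)]:
        ml = ml_rate(TREE, col, rates)
        eb = bayes_rate(TREE, col, rates, step, alpha=0.5)
        print(f"{name}: ML rate = {ml:.4g}, posterior mean rate = {eb:.4g}")
```

For the invariant column the ML estimate is driven to the lower edge of the grid, and for the column in which every sequence differs it is driven to the upper edge, whereas the posterior mean is pulled toward the prior mean of 1. This shrinkage illustrates one way a prior can stabilize site-rate estimates when only a few sequences are available.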
Similar articles
- A Bayesian model comparison approach to inferring positive selection. Mol Biol Evol. 2005 Dec;22(12):2531-40. doi: 10.1093/molbev/msi250. Epub 2005 Aug 24. PMID: 16120799
- Effects of branch length uncertainty on Bayesian posterior probabilities for phylogenetic hypotheses. Mol Biol Evol. 2007 Sep;24(9):2108-18. doi: 10.1093/molbev/msm141. Epub 2007 Jul 17. PMID: 17636043
- Accuracy of rate estimation using relaxed-clock models with a critical focus on the early metazoan radiation. Mol Biol Evol. 2005 May;22(5):1355-63. doi: 10.1093/molbev/msi125. Epub 2005 Mar 9. PMID: 15758207
- Impact of taxon sampling on the estimation of rates of evolution at sites. Mol Biol Evol. 2005 Mar;22(3):784-91. doi: 10.1093/molbev/msi065. Epub 2004 Dec 8. PMID: 15590908
- Using models of nucleotide evolution to build phylogenetic trees. Dev Comp Immunol. 2005;29(3):211-27. doi: 10.1016/j.dci.2004.07.007. PMID: 15572070. Review.
Cited by
- Structural plasticity enables evolution and innovation of RuBisCO assemblies. Sci Adv. 2022 Aug 26;8(34):eadc9440. doi: 10.1126/sciadv.adc9440. Epub 2022 Aug 26. PMID: 36026446. Free PMC article.
- Extent of structural asymmetry in homodimeric proteins: prevalence and relevance. PLoS One. 2012;7(5):e36688. doi: 10.1371/journal.pone.0036688. Epub 2012 May 22. PMID: 22629324. Free PMC article.
- Prediction of functional phosphorylation sites by incorporating evolutionary information. Protein Cell. 2012 Sep;3(9):675-90. doi: 10.1007/s13238-012-2048-z. Epub 2012 Jul 16. PMID: 22802047. Free PMC article.
- L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One. 2012;7(4):e35666. doi: 10.1371/journal.pone.0035666. Epub 2012 Apr 27. PMID: 22558194. Free PMC article.
- Predicting where small molecules bind at protein-protein interfaces. PLoS One. 2013;8(3):e58583. doi: 10.1371/journal.pone.0058583. Epub 2013 Mar 7. PMID: 23505538. Free PMC article.