Accuracy and power of Bayes prediction of amino acid sites under positive selection
- PMID: 12032251
- DOI: 10.1093/oxfordjournals.molbev.a004152
Abstract
Bayes prediction quantifies uncertainty by assigning posterior probabilities. It has been used to identify amino acid sites in a protein under recurrent diversifying selection, indicated by a higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rate, that is, by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. Bayes' theorem was then used to calculate the posterior probability that each site falls into each of these site classes. Here, we evaluate the performance of Bayes prediction of amino acid sites under positive selection by computer simulation. We measured accuracy as the proportion of predicted sites that were truly under selection, and power as the proportion of truly positively selected sites that were predicted by the method. Accuracy was slightly better for longer sequences, whereas power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for moderately or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when the data contained only a few highly similar sequences. However, sampling a large number of lineages improved performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positively selected sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
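The posterior calculation and the two performance measures described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the three-class example values, and the 0.95 posterior cutoff are assumptions chosen for illustration, and the per-class site likelihoods are taken as given rather than computed from a codon substitution model.

```python
from typing import Iterable, List, Sequence, Tuple


def site_class_posteriors(proportions: Sequence[float],
                          likelihoods: Sequence[float]) -> List[float]:
    """Posterior probability of each site class for one site (Bayes' theorem).

    proportions[k] is the prior proportion of sites in class k;
    likelihoods[k] is the probability of the data at this site given class k.
    Both are assumed to be supplied by a previously fitted model.
    """
    if len(proportions) != len(likelihoods):
        raise ValueError("proportions and likelihoods must have equal length")
    joint = [p * f for p, f in zip(proportions, likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every class")
    return [j / total for j in joint]


def predict_sites(posteriors_per_site: Sequence[Sequence[float]],
                  positive_class: int,
                  cutoff: float = 0.95) -> List[int]:
    """Indices of sites whose posterior for the omega > 1 class exceeds cutoff.

    The 0.95 default is a hypothetical threshold, not taken from the abstract.
    """
    return [i for i, post in enumerate(posteriors_per_site)
            if post[positive_class] > cutoff]


def accuracy_and_power(predicted: Iterable[int],
                       truth: Iterable[int]) -> Tuple[float, float]:
    """Accuracy = fraction of predicted sites that are truly selected;
    power = fraction of truly selected sites that were predicted."""
    predicted_set, truth_set = set(predicted), set(truth)
    hits = len(predicted_set & truth_set)
    accuracy = hits / len(predicted_set) if predicted_set else float("nan")
    power = hits / len(truth_set) if truth_set else float("nan")
    return accuracy, power


if __name__ == "__main__":
    # Hypothetical three-class model; the last class has omega > 1.
    post = site_class_posteriors([0.5, 0.3, 0.2], [0.01, 0.02, 0.08])
    print(post)
    # In a simulation the true selected sites are known, so both measures
    # can be computed directly.
    print(accuracy_and_power(predicted=[1, 2, 3, 4], truth=[2, 3, 5]))
```

In a simulation study of the kind described, `truth` would be the set of sites simulated with omega > 1, and `predicted` would come from applying `predict_sites` to the posteriors of every site in the alignment.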
Similar articles
- Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005 Apr;22(4):1107-18. doi: 10.1093/molbev/msi097. Epub 2005 Feb 2. PMID: 15689528
- Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000 May;155(1):431-49. doi: 10.1093/genetics/155.1.431. PMID: 10790415 Free PMC article.
- Identifying sites under positive selection with uncertain parameter estimates. Genome. 2006 Jul;49(7):767-76. doi: 10.1139/g06-038. PMID: 16936785
- Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol Biol Evol. 2002 Jan;19(1):49-57. doi: 10.1093/oxfordjournals.molbev.a003981. PMID: 11752189
- Smoothed Bootstrap Aggregation for Assessing Selection Pressure at Amino Acid Sites. Mol Biol Evol. 2016 Nov;33(11):2976-2989. doi: 10.1093/molbev/msw160. Epub 2016 Aug 2. PMID: 27486222
Cited by
- DGINN, an automated and highly-flexible pipeline for the detection of genetic innovations on protein-coding genes. Nucleic Acids Res. 2020 Oct 9;48(18):e103. doi: 10.1093/nar/gkaa680. PMID: 32941639 Free PMC article.
- Mammalian NPC1 genes may undergo positive selection and human polymorphisms associate with type 2 diabetes. BMC Med. 2012 Nov 15;10:140. doi: 10.1186/1741-7015-10-140. PMID: 23153210 Free PMC article.
- Genus-Wide Comparative Genome Analyses of Colletotrichum Species Reveal Specific Gene Family Losses and Gains during Adaptation to Specific Infection Lifestyles. Genome Biol Evol. 2016 May 22;8(5):1467-81. doi: 10.1093/gbe/evw089. PMID: 27189990 Free PMC article.
- Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evol Biol. 2010 Feb 11;10:39. doi: 10.1186/1471-2148-10-39. PMID: 20149245 Free PMC article.
- Functional conservation and divergence in plant-specific GRF gene family revealed by sequences and expression analysis. Open Life Sci. 2022 Mar 11;17(1):155-171. doi: 10.1515/biol-2022-0018. eCollection 2022. PMID: 35350448 Free PMC article.