Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information
- PMID: 15746281
- DOI: 10.1093/bioinformatics/bti365
Abstract
Motivation: There has been great expectation that the knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs from a large number of neutral nsSNPs, it is important to develop computational tools to predict the phenotypic effects of nsSNPs.
Results: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the homologous 3D structure, as well as evolutionary information derived from the multiple sequence alignment, were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable with that of the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (<10 homologous sequences), our method significantly outperforms the SIFT algorithm. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available.
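The stratified comparison described above (nsSNPs with at least 10 homologous sequences versus fewer than 10) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, the toy records, and the use of plain accuracy as the metric are all assumptions made for illustration. Only the threshold of 10 homologous sequences comes from the abstract.

```python
from collections import defaultdict

# Threshold quoted in the abstract: fewer than 10 homologous sequences
# counts as insufficient evolutionary information.
MIN_HOMOLOGS = 10


def stratum(n_homologs):
    """Assign an nsSNP to a stratum by its number of homologous sequences."""
    return "sufficient" if n_homologs >= MIN_HOMOLOGS else "insufficient"


def stratified_accuracy(records):
    """Compute accuracy separately for each stratum.

    records: iterable of (n_homologs, true_label, predicted_label).
    Accuracy is an assumed metric; the abstract does not name one.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for n_homologs, truth, predicted in records:
        key = stratum(n_homologs)
        total[key] += 1
        correct[key] += int(truth == predicted)
    return {key: correct[key] / total[key] for key in total}


# Hypothetical toy records, invented purely to exercise the function.
toy_records = [
    (25, "disease", "disease"),
    (12, "neutral", "neutral"),
    (10, "neutral", "disease"),
    (3, "disease", "disease"),
    (7, "neutral", "disease"),
]

result = stratified_accuracy(toy_records)
for name, accuracy in sorted(result.items()):
    print(f"{name}: {accuracy:.2f}")
```

Running the same function on the predictions of two methods over the same records would give a per-stratum comparison of the kind the abstract reports, although the actual study may have used different metrics and evaluation procedures.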
Availability: The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/