Improving the prediction of disease-related variants using protein three-dimensional structure
- PMID: 21992054
- PMCID: PMC3194195
- DOI: 10.1186/1471-2105-12-S4-S3
Abstract
Background: Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability. Non-synonymous SNPs occurring in coding regions result in single amino acid polymorphisms (SAPs) that may affect protein function and lead to pathology. Several methods attempt to estimate the impact of SAPs using different sources of information. Although sequence-based predictors have shown good performance, the quality of these predictions can be further improved by introducing new features derived from three-dimensional protein structures.
Results: In this paper, we present a structure-based machine learning approach for predicting disease-related SAPs. We trained a Support Vector Machine (SVM) on a set of 3,342 disease-related mutations and 1,644 neutral polymorphisms from 784 protein chains. The SVM input features are derived from the protein's sequence, structure, and function. After dataset balancing, the structure-based method (SVM-3D) reaches an overall accuracy of 85%, a correlation coefficient of 0.70, and an area under the receiver operating characteristic curve (AUC) of 0.92. Compared with a similar sequence-based predictor, SVM-3D increases the overall accuracy and AUC by 3% and the correlation coefficient by 0.06. The robustness of this improvement was tested on different datasets; in all cases SVM-3D performs better than previously developed methods, even when compared with PolyPhen2, which explicitly takes protein structure information as input.
Conclusion: This work demonstrates that structural information can increase the accuracy of disease-related SAP identification. Our results also quantify the magnitude of the improvement on a large dataset. This improvement agrees with previously observed results, in which structural information enhanced the prediction of protein stability changes upon mutation. Although the limited structural information available in the Protein Data Bank restricts both the applicability and the performance of our structure-based method, we expect SVM-3D to reach higher accuracy as more structural data become available.
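The three figures reported in the Results (overall accuracy, correlation coefficient, and AUC) can be illustrated with a minimal Python sketch. It assumes that the correlation coefficient is the Matthews correlation coefficient, which the abstract does not state explicitly, and that class 1 denotes disease-related and class 0 neutral variants. The function names and the toy labels and scores are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = disease, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three disease-related and three neutral variants.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

counts = confusion_counts(labels, preds)
print(accuracy(*counts), matthews_cc(*counts), roc_auc(labels, scores))
```

Accuracy alone can be misleading when one class dominates, which is presumably why the abstract reports results after dataset balancing and quotes the correlation coefficient and AUC alongside it.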
Similar articles
- WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics. 2013;14 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2164-14-S3-S6. Epub 2013 May 28. PMID: 23819482. Free PMC article.
- A new disease-specific machine learning approach for the prediction of cancer-causing missense variants. Genomics. 2011 Oct;98(4):310-7. doi: 10.1016/j.ygeno.2011.06.010. Epub 2011 Jul 7. PMID: 21763417. Free PMC article.
- [Application of support vector machine in predicting in-hospital mortality risk of patients with acute kidney injury in ICU]. Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):239-244. PMID: 29643521. Chinese.
- Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information. BMC Bioinformatics. 2008 Jun 27;9:297. doi: 10.1186/1471-2105-9-297. PMID: 18588693. Free PMC article.
- Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information. Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3. PMID: 15746281.
Cited by
- Computational methods and resources for the interpretation of genomic variants in cancer. BMC Genomics. 2015;16 Suppl 8(Suppl 8):S7. doi: 10.1186/1471-2164-16-S8-S7. Epub 2015 Jun 18. PMID: 26111056. Free PMC article. Review.
- The Loss and Gain of Functional Amino Acid Residues Is a Common Mechanism Causing Human Inherited Disease. PLoS Comput Biol. 2016 Aug 26;12(8):e1005091. doi: 10.1371/journal.pcbi.1005091. eCollection 2016 Aug. PMID: 27564311. Free PMC article.
- Novel Genetic Markers for Early Detection of Elevated Breast Cancer Risk in Women. Int J Mol Sci. 2019 Sep 28;20(19):4828. doi: 10.3390/ijms20194828. PMID: 31569399. Free PMC article.
- Single nucleotide variations: biological impact and theoretical interpretation. Protein Sci. 2014 Dec;23(12):1650-66. doi: 10.1002/pro.2552. Epub 2014 Oct 20. PMID: 25234433. Free PMC article. Review.
- Functional consequences of somatic mutations in cancer using protein pocket-based prioritization approach. Genome Med. 2014 Oct 14;6(10):81. doi: 10.1186/s13073-014-0081-7. eCollection 2014. PMID: 25360158. Free PMC article.