PLoS One. 2010 Jul 30;5(7):e11900.

doi: 10.1371/journal.pone.0011900.

Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties

Tao Huang¹, Ping Wang, Zhi-Qiang Ye, Heng Xu, Zhisong He, Kai-Yan Feng, Lele Hu, Weiren Cui, Kai Wang, Xiao Dong, Lu Xie, Xiangyin Kong, Yu-Dong Cai, Yixue Li

PMID: 20689580
PMCID: PMC2912763
DOI: 10.1371/journal.pone.0011900

Tao Huang et al. PLoS One. 2010.

Affiliation

¹ Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China.

Abstract

Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs), account for the majority of human inherited diseases, so it is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods for classifying SAPs rely on sequence or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We propose a better rationale for deleterious SAP prediction: if a SAP lies in a protein with important functions and severely changes the protein's sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features, comprising sequence features, structural features and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the Nearest Neighbor Algorithm (NNA) was used as the prediction model. In jackknife cross-validation, 83.27% of SAPs were correctly predicted using the optimized set of 263 features. The optimized predictor was also tested on an independent dataset, where its accuracy was 80.00%. In contrast, SIFT, a widely used sequence-based predictor of deleterious SAPs, achieved an accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and significantly improved prediction performance. Our results suggest that the protein interaction context can provide important clues to the functional associations of SAPs. This research will facilitate post-genome-wide association studies.
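The feature selection and evaluation loop described in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking is already available (in the paper this comes from mRMR, which is not reproduced here), uses Euclidean distance for the nearest-neighbor step (an assumption; the abstract does not state the distance measure), and runs on a made-up toy dataset. All function and variable names are hypothetical.

```python
import math

def nn_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x.
    Euclidean distance is an assumption made for this sketch."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def jackknife_accuracy(X, y):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if nn_predict(rest_X, rest_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

def incremental_feature_selection(X, y, ranked):
    """Evaluate nested feature sets S_1, S_2, ... taken from a ranked list
    and return the full curve plus the best (size, accuracy) pair."""
    curve = []
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        Xk = [[row[c] for c in cols] for row in X]
        curve.append((k, jackknife_accuracy(Xk, y)))
    best_k, best_acc = max(curve, key=lambda item: item[1])
    return curve, best_k, best_acc

# Toy data (invented): feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0.0, 5.0, 1.0],
    [0.1, 0.0, 9.0],
    [0.2, 9.0, 4.0],
    [1.0, 4.5, 8.5],
    [1.1, 0.5, 1.5],
    [1.2, 8.5, 4.5],
]
y = [0, 0, 0, 1, 1, 1]
ranked = [0, 2, 1]  # hypothetical ranking, standing in for mRMR output

curve, best_k, best_acc = incremental_feature_selection(X, y, ranked)
print(curve, best_k, best_acc)
```

On this toy data the informative feature alone classifies every sample correctly, while adding the noise features lowers the jackknife accuracy, which is the behavior IFS is designed to detect.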

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The curve of IFS.**
(A) The IFS curve with a step width of 5. The highest accuracy was achieved with 261 features, which suggests that the optimal feature set should contain more than 256 and fewer than 266 features. (B) The IFS curve between index 256 and 265. The accuracy around *S₂₆₁* was refined by calculating accuracies for the feature sets *S₂₅₆*, *S₂₅₇*, …, *S₂₆₅*. The highest IFS accuracy was 83.27%, obtained with 263 features; these 263 features formed the optimal feature set.
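The two-stage search in Figure 1 (a coarse scan with step width 5, then a fine scan around the coarse optimum) can be sketched as follows. The accuracy function here is a hypothetical stand-in that peaks at 263 features; it is not the paper's data, and the function names are illustrative only.

```python
def coarse_to_fine_ifs(accuracy_of, n_features, step=5):
    """Scan feature-set sizes 1, 1+step, 1+2*step, ... and then refine
    around the best coarse size with a step of 1."""
    coarse = {k: accuracy_of(k) for k in range(1, n_features + 1, step)}
    k_coarse = max(coarse, key=coarse.get)

    # Refine over [k_coarse - step, k_coarse + step - 1]; for k_coarse = 261
    # and step = 5 this is the 256..265 window from the caption.
    lo = max(1, k_coarse - step)
    hi = min(n_features, k_coarse + step - 1)
    fine = {k: accuracy_of(k) for k in range(lo, hi + 1)}
    k_best = max(fine, key=fine.get)
    return k_coarse, k_best, fine[k_best]

def toy_accuracy(k):
    """Hypothetical accuracy curve peaking at k = 263 (not real results)."""
    return 0.8327 - 0.0001 * abs(k - 263)

result = coarse_to_fine_ifs(toy_accuracy, n_features=472, step=5)
print(result)
```

The coarse scan saves most of the evaluations, but it only finds the overall optimum when the accuracy curve has no sharper peak hidden between two coarse grid points.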

**Figure 2. The number of each type of features in the optimal feature set.**
The feature type with the largest contribution is the KEGG enrichment score, one kind of network feature. Another kind of network feature, betweenness, was also important. This suggests that if a protein does not interact with biologically important proteins, then its mutation may not cause severe damage.
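For readers unfamiliar with betweenness, the sketch below computes the standard (unnormalized) betweenness centrality of each node in a small undirected, unweighted graph using Brandes' algorithm. The page does not say how the authors computed this feature, so this is only the textbook definition applied to an invented toy network; the node names are placeholders, not real proteins.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted
    graph given as {node: [neighbors]} (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: c / 2 for v, c in score.items()}

# Toy interaction network (invented): H is a hub, C bridges the hub to D.
network = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}
bc = betweenness(network)
print(bc)
```

Nodes that sit on many shortest paths between other nodes, such as the hub and the bridge in this toy network, receive high scores, which is the intuition behind using betweenness as a proxy for a protein's importance in the interaction network.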

**Figure 3. The number of each type of AAFactor features in the optimal feature set.**
Factor 3 is the most important one; it relates to molecular size or volume, with high factor coefficients for bulkiness, residue volume, average volume of a buried residue, side chain volume, and molecular weight.

Cited by

Improved classification of lung cancer tumors based on structural and physicochemical properties of proteins using data mining models.
Ramani RG, Jacob SG. PLoS One. 2013;8(3):e58772. doi: 10.1371/journal.pone.0058772. Epub 2013 Mar 7. PMID: 23505559. Free PMC article.
Single nucleotide polymorphisms in microRNA binding sites of oncogenes: implications in cancer and pharmacogenomics.
Manikandan M, Munirajan AK. OMICS. 2014 Feb;18(2):142-54. doi: 10.1089/omi.2013.0098. Epub 2013 Nov 28. PMID: 24286505. Free PMC article.
Systems pharmacology: network analysis to identify multiscale mechanisms of drug action.
Zhao S, Iyengar R. Annu Rev Pharmacol Toxicol. 2012;52:505-21. doi: 10.1146/annurev-pharmtox-010611-134520. PMID: 22235860. Free PMC article. Review.
SySAP: a system-level predictor of deleterious single amino acid polymorphisms.
Huang T, Wang C, Zhang G, Xie L, Li Y. Protein Cell. 2012 Jan;3(1):38-43. doi: 10.1007/s13238-011-1130-2. Epub 2011 Dec 19. PMID: 22183811. Free PMC article.
IDRMutPred: predicting disease-associated germline nonsynonymous single nucleotide variants (nsSNVs) in intrinsically disordered regions.
Zhou JB, Xiong Y, An K, Ye ZQ, Wu YD. Bioinformatics. 2020 Dec 22;36(20):4977-4983. doi: 10.1093/bioinformatics/btaa618. PMID: 32756939. Free PMC article.
