PLoS One. 2010 Jul 30;5(7):e11900.
doi: 10.1371/journal.pone.0011900.

Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties

Tao Huang et al. PLoS One.

Abstract

Non-synonymous SNPs (nsSNPs), also known as single amino acid polymorphisms (SAPs), account for the majority of human inherited diseases. It is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods for classifying SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe a better rationale for deleterious SAP prediction is this: if a SAP lies in a protein with important functions and severely changes the protein's sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features, comprising sequential, structural and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the prediction model was the Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested on an independent dataset, where its accuracy was 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, had a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and significantly improved prediction performance. Our results suggest that the protein interaction context can provide important clues that help explain a SAP's functional association. This research will facilitate post-genome-wide association studies.
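
The evaluation loop described in the abstract (a ranked feature list, nested feature sets, a nearest neighbor classifier, and jackknife cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy data, and the squared Euclidean distance are assumptions made for illustration, and the feature ranking is taken as given rather than computed by mRMR.

```python
import math

def nearest_neighbor_label(query, samples, labels, feature_idx):
    """Return the label of the sample closest to `query`.

    Distance is squared Euclidean over the selected features
    (an assumption; the abstract does not state the metric).
    """
    best_label, best_dist = None, math.inf
    for x, y in zip(samples, labels):
        d = sum((query[j] - x[j]) ** 2 for j in feature_idx)
        if d < best_dist:
            best_label, best_dist = y, d
    return best_label

def jackknife_accuracy(samples, labels, feature_idx):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = nearest_neighbor_label(samples[i], train_x, train_y, feature_idx)
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)

def incremental_feature_selection(samples, labels, ranked_features):
    """Score the nested sets S1, S2, ... built from a ranked feature list.

    Returns (best_size, best_accuracy, curve), where curve[k - 1] is the
    jackknife accuracy obtained with the top-k features.
    """
    curve = [
        jackknife_accuracy(samples, labels, ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best_accuracy = max(curve)
    best_size = curve.index(best_accuracy) + 1
    return best_size, best_accuracy, curve

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
toy_samples = [
    [0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
    [1.0, 5.1], [1.1, 0.9], [1.2, 9.1],
]
toy_labels = ["neutral"] * 3 + ["deleterious"] * 3
toy_ranking = [0, 1]  # stands in for an mRMR ranking, which is not computed here

best_size, best_accuracy, curve = incremental_feature_selection(
    toy_samples, toy_labels, toy_ranking
)
```

On this toy data the informative feature alone classifies every sample correctly, while adding the noisy feature misleads the nearest neighbor search. That is the kind of effect IFS is meant to detect when it picks the size of the feature set.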

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. The curve of IFS.
(A) The IFS curve with a step width of 5. The highest accuracy was achieved with 261 features, which suggests that the optimal feature set should have more than 256 and fewer than 266 features. (B) The IFS curve between index 256 and 265. The accuracy around S261 was refined by calculating accuracies for feature sets S256, S257, …, S265. The highest IFS accuracy was 83.27%, obtained with 263 features. These 263 features formed the optimal feature set.
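
The two-stage search in this caption, a coarse pass with a step width of 5 followed by a fine pass around the coarse optimum, can be sketched as follows. This is a hedged illustration rather than the authors' code. The name `coarse_to_fine_search`, the window bounds (inferred from the S256 to S265 range in the caption), and the synthetic accuracy function are all assumptions.

```python
def coarse_to_fine_search(accuracy_of, n_features, step=5):
    """Find a good feature-set size with a coarse pass and then a fine pass.

    `accuracy_of(k)` is assumed to return the accuracy obtained with the
    top-k ranked features. Returns (coarse_best, best_size, best_accuracy).
    """
    # Coarse pass: evaluate S1, S(1 + step), S(1 + 2 * step), ...
    coarse = {k: accuracy_of(k) for k in range(1, n_features + 1, step)}
    coarse_best = max(coarse, key=coarse.get)

    # Fine pass: evaluate every size in a window around the coarse optimum.
    # With step=5 and coarse_best=261 this window is S256 .. S265.
    low = max(1, coarse_best - step)
    high = min(n_features, coarse_best + step - 1)
    fine = {k: accuracy_of(k) for k in range(low, high + 1)}
    best_size = max(fine, key=fine.get)
    return coarse_best, best_size, fine[best_size]

def synthetic_accuracy(k):
    """Made-up accuracy curve with a single peak at 263 features (toy only)."""
    return 1.0 - abs(k - 263) / 472

coarse_best, best_size, best_accuracy = coarse_to_fine_search(
    synthetic_accuracy, n_features=472
)
```

With the synthetic curve, the coarse pass lands on 261 and the fine pass moves the optimum to 263, mirroring the sequence in panels (A) and (B). The coarse pass avoids running a full jackknife evaluation for every one of the 472 nested feature sets.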
Figure 2. The number of each type of features in the optimal feature set.
The feature type with the largest contribution is the KEGG enrichment score, one kind of network feature. Another kind of network feature, betweenness, was also important. This suggests that if a protein does not interact with biologically important proteins, then its mutation may not cause severe damage.
Figure 3. The number of each type of AAFactor features in the optimal feature set.
Factor 3 is the most important; it relates to molecular size or volume, with high factor coefficients for bulkiness, residue volume, average volume of a buried residue, side chain volume, and molecular weight.
