Hum Mutat. 2017 Sep;38(9):1064-1071.
doi: 10.1002/humu.23179. Epub 2017 May 2.

Blind prediction of deleterious amino acid variations with SNPs&GO



Emidio Capriotti et al. Hum Mutat. 2017 Sep.

Abstract

SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) with disease, taking protein functional annotation into account. The method is a binary classifier that implements a support vector machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from the protein sequence with functional annotation encoded by Gene Ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve of 0.88, with a low false-positive rate. In almost all editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper, we summarize the best results obtained by SNPs&GO on disease-related variations in four CAGI challenges involving the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013), and NAGLU (CAGI 2016). The evaluation of the results provides insights into the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs.
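The abstract does not say how GO terms are turned into classifier input. As one hedged illustration, the sketch below scores each GO term by a smoothed log-odds of appearing in disease-related versus neutral training examples, sums those scores for a protein, and appends the sum to a vector of sequence-derived features. The functions `go_log_odds_table`, `go_score`, and `feature_vector`, the pseudocount, and the toy data are all hypothetical; this is not presented as the SNPs&GO encoding.

```python
import math
from collections import Counter


def go_log_odds_table(training, pseudocount=1.0):
    """Return a hypothetical per-GO-term log-odds table.

    training: list of (go_terms, label) pairs, label 1 = disease-related, 0 = neutral.
    Each term's score is log(P(term | disease) / P(term | neutral)), with
    Laplace-style smoothing so a term absent from one class does not give log(0).
    """
    disease_counts, neutral_counts = Counter(), Counter()
    n_disease = n_neutral = 0
    for go_terms, label in training:
        if label == 1:
            n_disease += 1
            disease_counts.update(set(go_terms))
        else:
            n_neutral += 1
            neutral_counts.update(set(go_terms))
    table = {}
    for term in set(disease_counts) | set(neutral_counts):
        p_d = (disease_counts[term] + pseudocount) / (n_disease + 2 * pseudocount)
        p_n = (neutral_counts[term] + pseudocount) / (n_neutral + 2 * pseudocount)
        table[term] = math.log(p_d / p_n)
    return table


def go_score(go_terms, table):
    """Sum the log-odds of a protein's GO terms; unseen terms contribute 0."""
    return sum(table.get(term, 0.0) for term in set(go_terms))


def feature_vector(sequence_features, go_terms, table):
    """Append the GO summary score to (hypothetical) sequence-derived features."""
    return list(sequence_features) + [go_score(go_terms, table)]


# Toy training set (illustrative only, not SwissVar data).
training = [
    (["GO:0006281", "GO:0005634"], 1),
    (["GO:0006281"], 1),
    (["GO:0005737"], 0),
    (["GO:0005737", "GO:0005634"], 0),
]
table = go_log_odds_table(training)
print(feature_vector([0.1, 0.2], ["GO:0006281", "GO:0005634"], table))
```

A positive GO score pushes a variant toward the disease-related class and a negative one toward neutral; a term seen equally often in both classes contributes nothing. The resulting vector could then be passed to any binary classifier, such as an SVM.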

Keywords: disease-related variation; gene ontology; genome interpretation; machine learning; protein function; single amino acid variation; variant annotation.


Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Figure 1
Comparison between predicted and experimental Relative Proliferation (RelPro) rates for the p16 challenge. Linear regression for SPARK-LAB (A), SNPs&GO13 (B), and Dr.Cancer (C) predictions. r and r° are the Pearson correlation coefficients computed with and without the amino acid variation p.Gly23Ala, respectively.
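The r versus r° comparison in Figure 1 amounts to computing the Pearson correlation between predicted and experimental values twice, once on all variants and once with a single variant left out, alongside an ordinary least-squares line. The minimal sketch below shows that calculation; the helper names and the numbers are hypothetical and are not the CAGI p16 data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_with_and_without(pairs, excluded):
    """pairs maps variant -> (predicted, experimental); returns (r, r without `excluded`)."""
    predicted = [p for p, _ in pairs.values()]
    experimental = [e for _, e in pairs.values()]
    kept = [v for k, v in pairs.items() if k != excluded]
    r_all = pearson(predicted, experimental)
    r_without = pearson([p for p, _ in kept], [e for _, e in kept])
    return r_all, r_without


# Hypothetical (predicted, experimental) values, not the published p16 results.
pairs = {
    "var_a": (1.0, 1.1),
    "var_b": (2.0, 1.9),
    "var_c": (3.0, 3.2),
    "p.Gly23Ala": (1.0, 3.0),
}
r, r_wo = r_with_and_without(pairs, "p.Gly23Ala")
print(f"r = {r:.3f}, r without p.Gly23Ala = {r_wo:.3f}")
```

With a single poorly predicted point, dropping it can raise the correlation sharply, which is why reporting both coefficients helps show how much one variant drives the fit.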
Figure 2
Comparison between the binary classification performance of SNPs&GO13 (black) and MutPred2* (gray) on the NAGLU dataset.
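Figure 2 compares binary classification performance, and the abstract reports overall accuracy, area under the ROC curve, and false-positive rate. The sketch below shows how these three quantities are commonly defined for disease-related (1) versus neutral (0) labels. The 0.5 decision threshold and the toy scores are assumptions for illustration, and AUC is computed through its rank interpretation rather than by tracing the curve.

```python
def confusion_counts(labels, preds):
    """Return (tp, fp, tn, fn) for binary labels/predictions (1 = disease-related)."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, preds):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(labels, preds):
    """Fraction of variants classified correctly."""
    tp, fp, tn, fn = confusion_counts(labels, preds)
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(labels, preds):
    """Fraction of neutral variants wrongly predicted as disease-related."""
    _, fp, tn, _ = confusion_counts(labels, preds)
    return fp / (fp + tn)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of examples; fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = disease-related, 0 = neutral; the 0.5 threshold is an assumption.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(labels, preds), false_positive_rate(labels, preds), roc_auc(labels, scores))
```

Accuracy and false-positive rate depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, so the two kinds of metric can disagree when comparing predictors.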
