Am J Hum Genet. 2018 Oct 4;103(4):474-483.
doi: 10.1016/j.ajhg.2018.08.005. Epub 2018 Sep 13.

ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants

Najmeh Alirezaie et al. Am J Hum Genet.

Abstract

Advances in high-throughput DNA sequencing have revolutionized the discovery of variants in the human genome; however, interpreting the phenotypic effects of those variants is still a challenge. While several computational approaches to predict variant impact are available, their accuracy is limited and further improvement is needed. Here, we introduce ClinPred, an efficient tool for identifying disease-relevant nonsynonymous variants. Our predictor incorporates two machine learning algorithms that use existing pathogenicity scores and, notably, benefits from inclusion of normal population allele frequency from the gnomAD database as an input feature. Another major strength of our approach is the use of ClinVar, a rapidly growing database that allows selection of confidently annotated disease-causing variants, as a training set. Compared to other methods, ClinPred showed superior accuracy for predicting pathogenicity, achieving the highest area under the curve (AUC) score and increasing both the specificity and sensitivity in different test datasets. It also obtained the best performance according to various other metrics. Moreover, ClinPred performance remained robust with respect to disease type (cancer or rare disease) and mechanism (gain or loss of function). Importantly, we observed that adding allele frequency as a predictive feature, as opposed to setting fixed allele frequency cutoffs, boosts the performance of prediction. We provide pre-computed ClinPred scores for all possible human missense variants in the exome to facilitate its use by the community.

Keywords: cancer; computational biology; diagnostic; machine learning; pathogenicity prediction; predictive modeling; rare disease; variant interpretation; whole-exome sequencing.

Figures

Figure 1
The Performance of ClinPred Was Compared to Seven Recently Developed Tools Using ClinVarTest Data. (A) ClinPred showed increased sensitivity and specificity compared to other methods. (B) Our models had the best specificity at the cutoff required to achieve 95% sensitivity. AUC, error percent, and specificity at 95% sensitivity were calculated for 5-fold cross-validation, and the mean score is shown.
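
The "specificity at 95% sensitivity" metric named in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' code: the function name, the toy scores, and the convention that a variant is called damaging when its score is at or above the cutoff are all assumptions made for illustration.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (cutoff, specificity) at the cutoff that reaches the target sensitivity.

    labels: 1 = pathogenic, 0 = benign. Higher scores mean "more likely pathogenic".
    Assumption: a variant is called damaging when score >= cutoff.
    """
    pathogenic = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Smallest number of pathogenic variants that must be called damaging.
    needed = max(1, math.ceil(target_sensitivity * len(pathogenic)))
    # The score of the last pathogenic variant we must keep becomes the cutoff.
    cutoff = pathogenic[needed - 1]

    # Specificity = fraction of benign variants scoring below the cutoff.
    true_negatives = sum(1 for s in benign if s < cutoff)
    return cutoff, true_negatives / len(benign)


# Hypothetical toy data: four pathogenic and four benign variants.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.75, 0.6]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
cutoff, spec = specificity_at_sensitivity(toy_scores, toy_labels, target_sensitivity=0.75)
```

Fixing sensitivity first and then comparing specificity puts every tool on the same footing, since each tool's raw scores live on a different scale.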
Figure 2
Comparison of Raw Scores of ClinPred, M-CAP, REVEL, and MetaLR. Violin plots represent the full distribution of scores for pathogenic (pink) and benign (green) variants in different test data.
Figure 3
Comparison of ClinPred with Categorical Predictions Available from M-CAP, REVEL, and MetaLR. REVEL and ClinPred scores lower than 0.5 are defined as tolerated and scores greater than 0.5 as damaging. We show proportions of benign and pathogenic variants that were classified as tolerated (T, green) and damaging (D, pink). ClinPred had the best performance in finding as many pathogenic variants as possible while minimizing the number of benign variants predicted as damaging, both in ClinVarTest (A) and MouseVariSNP (B).
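
The categorical comparison in the Figure 3 caption can be sketched as follows. The 0.5 cutoff comes from the caption; the function name and toy data are hypothetical, and because the caption does not say how a score of exactly 0.5 is handled, this sketch assumes it is treated as tolerated.

```python
from collections import Counter


def categorical_proportions(scores, labels, cutoff=0.5):
    """Return {label: {"T": fraction, "D": fraction}} for each true label.

    Scores above the cutoff are called damaging ("D"), all others tolerated ("T").
    Assumption: a score exactly equal to the cutoff counts as tolerated.
    """
    counts = Counter()
    totals = Counter()
    for score, label in zip(scores, labels):
        call = "D" if score > cutoff else "T"
        counts[(label, call)] += 1
        totals[label] += 1

    return {
        label: {call: counts[(label, call)] / total for call in ("T", "D")}
        for label, total in totals.items()
    }


# Hypothetical toy data: three pathogenic and three benign variants.
toy_scores = [0.9, 0.6, 0.4, 0.1, 0.2, 0.7]
toy_labels = ["pathogenic"] * 3 + ["benign"] * 3
proportions = categorical_proportions(toy_scores, toy_labels)
```

In this toy case two of three pathogenic variants are called damaging and two of three benign variants are called tolerated, which are the two proportions each bar pair in such a figure would report.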
Figure 4
ClinPred Performance Remained Robust across Distinct Datasets Based on Different Genetic Models and Pathogenic Mechanisms. We show mean AUC and error bars for 5-fold cross-validation in all test datasets.
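
A mean AUC across cross-validation folds, as reported in the Figure 4 caption, could be computed along these lines. This is a sketch under stated assumptions: AUC is computed with the pairwise (Mann-Whitney) definition, the error bar is taken to be the sample standard deviation across folds (the caption does not specify what the error bars represent), and the function names and toy folds are hypothetical.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """AUC as the probability that a pathogenic variant outscores a benign one.

    labels: 1 = pathogenic, 0 = benign. Ties count as half a win.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum(
        1.0 if p > b else 0.5 if p == b else 0.0
        for p in pathogenic
        for b in benign
    )
    return wins / (len(pathogenic) * len(benign))


def summarize_folds(folds):
    """Return (mean AUC, sample standard deviation) over (scores, labels) folds."""
    aucs = [auc(scores, labels) for scores, labels in folds]
    return mean(aucs), stdev(aucs)


# Hypothetical toy folds; a real 5-fold run would supply five held-out test sets.
toy_folds = [
    ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    ([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]),
]
mean_auc, auc_sd = summarize_folds(toy_folds)
```

The pairwise definition is quadratic in the number of variants, which is fine for a small illustration; a rank-based implementation would be preferable for large test sets.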
Figure 5
Illustration of Performance of ClinPred as Compared to Other Tools on Real-Life Clinical Samples from Solved FORGE Canada and Care4Rare Canada Projects. (A) ClinPred reduced the number of nonsynonymous variants predicted as pathogenic while retaining high sensitivity. (B) Raw scores from MetaLR, M-CAP, REVEL, and ClinPred for any causative variant in these 31 solved FORGE Canada and Care4Rare Canada project cases are shown.

