ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants

Najmeh Alirezaie¹, Kristin D Kernohan², Taila Hartley², Jacek Majewski³, Toby Dylan Hocking⁴

Affiliations

¹ Department of Human Genetics, McGill University, Montreal, QC H3A 0G1, Canada. Electronic address: najmeh.alirezaie@mail.mcgill.ca.
² Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON K1H 5B2, Canada.
³ Department of Human Genetics, McGill University, Montreal, QC H3A 0G1, Canada. Electronic address: jacek.majewski@mcgill.ca.
⁴ Department of Human Genetics, McGill University, Montreal, QC H3A 0G1, Canada.

PMID: 30220433
PMCID: PMC6174354
DOI: 10.1016/j.ajhg.2018.08.005

Najmeh Alirezaie et al. Am J Hum Genet. 2018 Oct 4;103(4):474-483. Epub 2018 Sep 13.

Abstract

Advances in high-throughput DNA sequencing have revolutionized the discovery of variants in the human genome; however, interpreting the phenotypic effects of those variants is still a challenge. While several computational approaches to predict variant impact are available, their accuracy is limited and further improvement is needed. Here, we introduce ClinPred, an efficient tool for identifying disease-relevant nonsynonymous variants. Our predictor incorporates two machine learning algorithms that use existing pathogenicity scores and, notably, benefits from inclusion of normal population allele frequency from the gnomAD database as an input feature. Another major strength of our approach is the use of ClinVar, a rapidly growing database that allows selection of confidently annotated disease-causing variants, as a training set. Compared to other methods, ClinPred showed superior accuracy for predicting pathogenicity, achieving the highest area under the curve (AUC) score and increasing both the specificity and sensitivity in different test datasets. It also obtained the best performance according to various other metrics. Moreover, ClinPred performance remained robust with respect to disease type (cancer or rare disease) and mechanism (gain or loss of function). Importantly, we observed that adding allele frequency as a predictive feature, as opposed to setting fixed allele frequency cutoffs, boosts the performance of prediction. We provide pre-computed ClinPred scores for all possible human missense variants in the exome to facilitate its use by the community.

Keywords: cancer; computational biology; diagnostic; machine learning; pathogenicity prediction; predictive modeling; rare disease; variant interpretation; whole-exome sequencing.

Figures

**Figure 1**
The Performance of ClinPred Was Compared to Seven Recently Developed Tools Using ClinVarTest Data. (A) ClinPred showed increased sensitivity and specificity compared to other methods. (B) Our models had the best specificity at the cutoff required to achieve 95% sensitivity. AUC, error percent, and specificity at 95% sensitivity were calculated for 5-fold cross validation, and the mean score is shown.
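The two headline metrics in this caption can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `auc` and `specificity_at_sensitivity` and the toy scores are invented for the example. AUC is computed as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one (ties count one half), and specificity is read off at the highest threshold that still calls at least 95% of pathogenic variants damaging.

```python
import math

def auc(pathogenic_scores, benign_scores):
    """AUC as P(pathogenic score > benign score), with ties counted as 0.5."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))

def specificity_at_sensitivity(pathogenic_scores, benign_scores, target=0.95):
    """Return (threshold, specificity) at the highest threshold that reaches
    the target sensitivity. A variant is called damaging when its score is
    greater than or equal to the threshold (an assumption of this sketch).
    """
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of pathogenic variants that must be called damaging; the small
    # epsilon guards against floating-point round-up in target * n.
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    threshold = ranked[needed - 1]
    true_negatives = sum(1 for b in benign_scores if b < threshold)
    return threshold, true_negatives / len(benign_scores)

# Toy scores, invented for illustration only (not data from the paper).
toy_pathogenic = [0.9, 0.8, 0.7, 0.4]
toy_benign = [0.1, 0.2, 0.5, 0.3]
print(auc(toy_pathogenic, toy_benign))
print(specificity_at_sensitivity(toy_pathogenic, toy_benign))
```

The pairwise AUC is quadratic in the number of variants, which is fine for a sketch; a rank-based computation would be the usual choice for large test sets.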

**Figure 2**
Comparison of Raw Scores of ClinPred, M-CAP, REVEL, and MetaLR. Violin plots represent the full distribution of scores for pathogenic (pink) and benign (green) variants in different test data.

**Figure 3**
Comparison of ClinPred with Categorical Predictions Available from M-CAP, REVEL, and MetaLR. REVEL and ClinPred scores lower than 0.5 are defined as tolerated and greater than 0.5 as damaging. We show proportions of benign and pathogenic variants that were classified as tolerated (T, green) and damaging (D, pink). ClinPred had the best performance in finding as many pathogenic variants as possible while minimizing the number of benign variants predicted as damaging, both in ClinVarTest (A) and MouseVariSNP (B).
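The categorical call described in this caption amounts to thresholding a raw score at 0.5 and tabulating the share of T and D calls per variant class. The sketch below is illustrative only: `classify` and `call_proportions` are hypothetical names, the example scores are invented, and a score of exactly 0.5 is treated as tolerated because the caption does not say how that boundary case is handled.

```python
def classify(score, cutoff=0.5):
    """Return 'D' (damaging) for scores above the cutoff, else 'T' (tolerated).

    Assumption: a score exactly equal to the cutoff is called 'T'.
    """
    return "D" if score > cutoff else "T"

def call_proportions(scores, cutoff=0.5):
    """Fraction of variants called tolerated and damaging at the cutoff."""
    calls = [classify(s, cutoff) for s in scores]
    n = len(calls)
    return {"T": calls.count("T") / n, "D": calls.count("D") / n}

# Invented example scores, not data from the paper.
print(call_proportions([0.9, 0.7, 0.6, 0.2]))  # a toy "pathogenic" set
print(call_proportions([0.1, 0.2, 0.3, 0.8]))  # a toy "benign" set
```

For a pathogenic set the D fraction is the sensitivity at the cutoff; for a benign set the T fraction is the specificity.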

**Figure 4**
ClinPred Performance Remained Robust across Distinct Datasets Based on Different Genetic Models and Pathogenic Mechanisms. We show mean AUC and error bars for 5-fold cross validation in all test datasets.
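A rough sketch of how a per-dataset mean and error bar could be obtained from 5-fold cross validation follows. Everything here is an assumption made for illustration: the helper names `k_fold_indices` and `summarize` are hypothetical, the folds are contiguous rather than shuffled or stratified, and the error bar is taken to be the sample standard deviation across folds, which the caption does not specify.

```python
import statistics

def k_fold_indices(n_items, k=5):
    """Split range(n_items) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def summarize(fold_scores):
    """Mean and sample standard deviation of one metric across folds."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)

# Each fold serves once as the held-out set; the rest would train the model.
for held_out in k_fold_indices(12, k=5):
    print(held_out)
```

In practice the per-fold metric passed to `summarize` would be the AUC measured on each held-out fold.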

**Figure 5**
Illustration of Performance of ClinPred as Compared to Other Tools on Real-Life Clinical Samples from Solved FORGE Canada and Care4Rare Canada Projects. (A) ClinPred reduced the number of nonsynonymous variants predicted as pathogenic and retained high sensitivity. (B) Raw scores from MetaLR, M-CAP, REVEL, and ClinPred for any causative variant in these 31 solved FORGE Canada and Care4Rare Canada project cases are shown.
