ClinPred: Prediction Tool to Identify Disease-Relevant Nonsynonymous Single-Nucleotide Variants
- PMID: 30220433
- PMCID: PMC6174354
- DOI: 10.1016/j.ajhg.2018.08.005
Abstract
Advances in high-throughput DNA sequencing have revolutionized the discovery of variants in the human genome; however, interpreting the phenotypic effects of those variants is still a challenge. While several computational approaches to predict variant impact are available, their accuracy is limited and further improvement is needed. Here, we introduce ClinPred, an efficient tool for identifying disease-relevant nonsynonymous variants. Our predictor incorporates two machine learning algorithms that use existing pathogenicity scores and, notably, benefits from inclusion of normal population allele frequency from the gnomAD database as an input feature. Another major strength of our approach is the use of ClinVar, a rapidly growing database that allows selection of confidently annotated disease-causing variants, as a training set. Compared to other methods, ClinPred showed superior accuracy for predicting pathogenicity, achieving the highest area under the curve (AUC) score and increasing both the specificity and sensitivity in different test datasets. It also obtained the best performance according to various other metrics. Moreover, ClinPred performance remained robust with respect to disease type (cancer or rare disease) and mechanism (gain or loss of function). Importantly, we observed that adding allele frequency as a predictive feature, as opposed to setting fixed allele frequency cutoffs, boosts the performance of prediction. We provide pre-computed ClinPred scores for all possible human missense variants in the exome to facilitate its use by the community.
Keywords: cancer; computational biology; diagnostic; machine learning; pathogenicity prediction; predictive modeling; rare disease; variant interpretation; whole-exome sequencing.
Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
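Illustrative sketch (not part of the published abstract): the contrast the abstract draws between a fixed allele frequency cutoff and allele frequency as an input feature, together with the AUC metric, can be made concrete with a small toy example. Everything below is an assumption made for illustration: the function names, the hand-picked logistic weights, the 0.01 cutoff, and the six invented variants are hypothetical and do not come from ClinPred, ClinVar, or gnomAD. The toy data were chosen so that the difference is visible, so the printed numbers say nothing about real-world performance.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen pathogenic variant
    (label 1) scores higher than a randomly chosen benign one (label 0);
    tied pairs count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def score_with_cutoff(base_score, allele_freq, cutoff=0.01):
    """Fixed-cutoff strategy: any variant more common than `cutoff` is
    called benign outright; otherwise the existing score is kept."""
    return 0.0 if allele_freq > cutoff else base_score


def score_with_af_feature(base_score, allele_freq,
                          w_score=4.0, w_af=1.0, bias=-5.0):
    """AF-as-feature strategy: rarity enters the model as a continuous
    input. The weights are hand-picked for illustration, not fitted."""
    # Floor the frequency so variants never seen in the population
    # (allele_freq == 0) do not trigger log10(0).
    rarity = -math.log10(max(allele_freq, 1e-6))
    z = w_score * base_score + w_af * rarity + bias
    return 1.0 / (1.0 + math.exp(-z))


# Invented toy variants:
# (existing pathogenicity score, population allele frequency, label)
toy_variants = [
    (0.90, 0.0,     1),
    (0.70, 0.00001, 1),
    (0.55, 0.0001,  1),
    (0.60, 0.005,   0),
    (0.80, 0.02,    0),
    (0.30, 0.2,     0),
]
labels = [y for _, _, y in toy_variants]
cutoff_scores = [score_with_cutoff(s, af) for s, af, _ in toy_variants]
feature_scores = [score_with_af_feature(s, af) for s, af, _ in toy_variants]

print("AUC, fixed cutoff:", auc(cutoff_scores, labels))
print("AUC, AF as feature:", auc(feature_scores, labels))
```

In this toy set, the benign variant at frequency 0.005 falls below the cutoff and keeps its fairly high score, so the cutoff strategy misranks it against a rarer pathogenic variant. The feature-based score lets frequency lower the prediction gradually instead of acting as an all-or-nothing filter. A real predictor would learn that relationship from labeled training data rather than use fixed weights.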