Hum Genet. 2025 Mar;144(2-3):281-293.
doi: 10.1007/s00439-025-02732-2. Epub 2025 Mar 21.

Critical assessment of missense variant effect predictors on disease-relevant variant data

Ruchir Rastogi et al. Hum Genet. 2025 Mar.

Abstract

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission deadline. Here we assess predictors submitted to the CAGI 6 Annotate-All-Missense challenge, predictors commonly used in clinical genetics, and recently developed deep learning methods. We examine performance across a range of settings relevant for clinical and research applications, focusing on different subsets of the evaluation data as well as high-specificity and high-sensitivity regimes. Our evaluations reveal notable advances in current methods relative to older, well-cited tools in the field. While meta-predictors tend to outperform their constituent individual predictors, several newer individual predictors perform comparably to commonly used meta-predictors. Predictor performance varies between high-specificity and high-sensitivity regimes, highlighting that different methods may be optimal for different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors trained on pathogenicity labels from curated variant databases often inherit gene-level label imbalances. Our findings help illuminate the clinical and research utility of modern missense variant effect predictors and identify potential areas for future development.

Conflict of interest statement

Declarations. Conflict of interest: The authors declare no conflicts of interest.

Figures

Fig. 1
Full ROC curve performance. We show the ROC curves and AUROCs for meta-predictors (left) and individual predictors (right) on the full evaluation dataset. Predictors marked by diamonds use allele frequency as a feature. The black dashed lines at 5% FPR and 95% TPR demarcate the boundaries of the high-specificity and high-sensitivity regions, respectively, which are enlarged in Fig. 2.
Fig. 2
Performance in high-specificity and high-sensitivity regimes. We show enlarged portions of the ROC curves from Fig. 1 to focus on (A) the high-specificity region (FPR ≤ 5%) and (B) the high-sensitivity region (TPR ≥ 95%) for meta-predictors (left) and individual predictors (right). We also show the normalized area under the curve in these regions (normalized such that a perfect classifier gets a score of 1 and a random classifier gets a score of 0.5). Predictors marked by diamonds use allele frequency as a feature.
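
The normalization described in the Fig. 2 caption (perfect classifier scores 1, random classifier scores 0.5) matches the McClish standardization of a partial AUC. The sketch below assumes that this is the normalization intended; the caption does not name it, and the function names are hypothetical. It also assumes binary labels (1 = pathogenic, 0 = benign) with both classes present and higher scores meaning more pathogenic. The high-sensitivity region is handled by swapping the classes and negating the scores, which maps TPR ≥ 95% onto FPR ≤ 5% of the mirrored curve.

```python
def roc_points(labels, scores):
    """Return the ROC curve as (fpr, tpr) lists; tied scores share one point."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every variant that shares the current score threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        i = j
    return fpr, tpr


def normalized_partial_auc(labels, scores, max_fpr=0.05):
    """McClish-standardized area under the ROC curve for FPR in [0, max_fpr]."""
    fpr, tpr = roc_points(labels, scores)
    area = 0.0
    for k in range(1, len(fpr)):
        x0, x1, y0, y1 = fpr[k - 1], fpr[k], tpr[k - 1], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the curve at the region boundary.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    min_area = max_fpr ** 2 / 2  # area under the chance diagonal
    max_area = max_fpr           # area under a perfect classifier
    return 0.5 * (1 + (area - min_area) / (max_area - min_area))


def normalized_partial_auc_high_sensitivity(labels, scores, min_tpr=0.95):
    """Same normalization for the TPR >= min_tpr region, via the mirrored curve."""
    flipped_labels = [1 - y for y in labels]
    flipped_scores = [-s for s in scores]
    return normalized_partial_auc(flipped_labels, flipped_scores, 1 - min_tpr)
```

For the high-specificity case, scikit-learn's `roc_auc_score` with its `max_fpr` argument applies the same standardization; the pure-Python version here is only meant to make the arithmetic explicit.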
Fig. 3
Allele frequency bias. Top-performing predictors are evaluated for distinguishing benign variants in different allele frequency bins from pathogenic variants. All 6103 pathogenic variants were used in each evaluation, and benign variants were stratified by their allele frequencies obtained from the control cohort exomes in gnomAD v2.1.1 (Karczewski et al. 2020). Predictors marked by diamonds use allele frequency as a feature.
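
The stratified evaluation in the Fig. 3 caption can be sketched as follows: the pathogenic set is held fixed, and AUROC is recomputed against the benign variants in each allele frequency bin. This is an illustrative sketch only. The function names, the half-open bin convention, and the bin edges in the example are assumptions, not the bins used in the study.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auroc_by_af_bin(pathogenic_scores, benign, bin_edges):
    """Compute AUROC of all pathogenic variants against benign variants per bin.

    benign: list of (score, allele_frequency) tuples.
    bin_edges: ascending edges; each bin is the half-open interval [lo, hi).
    Bins that contain no benign variants are skipped.
    """
    results = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [score for score, af in benign if lo <= af < hi]
        if in_bin:
            results[(lo, hi)] = auroc(pathogenic_scores, in_bin)
    return results


# Toy data with hypothetical bin edges: rare benign variants score higher here,
# so they are harder to separate from the pathogenic set.
pathogenic = [0.9, 0.8, 0.7]
benign = [(0.1, 0.2), (0.2, 0.05), (0.75, 1e-5), (0.85, 2e-5)]
per_bin = auroc_by_af_bin(pathogenic, benign, [0.0, 1e-4, 1e-2, 1.0])
```

A predictor that leans on allele frequency would be expected to show a lower AUROC in the rarest benign bin than in the common bins, which is the pattern the caption describes.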
Fig. 4
Gene label balancing. We constructed a gene label-balanced subset of our evaluation dataset containing an equal number of pathogenic and benign variants per gene. This label-balanced dataset consists of 2140 variants from 504 genes. Performance on the label-balanced dataset (y-axis) is compared to performance on the full dataset from Fig. 1 (x-axis) for meta-predictors (left) and individual predictors (right). Predictors marked by diamonds use allele frequency as a feature.
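
One way to build a subset with an equal number of pathogenic and benign variants per gene is to downsample the larger class within each gene. The sketch below assumes random downsampling with a fixed seed; the caption does not say how variants were selected, and the function name and tuple layout are hypothetical.

```python
import random
from collections import defaultdict


def balance_labels_per_gene(variants, seed=0):
    """Return a subset with equal pathogenic and benign counts in every gene.

    variants: list of (gene, label, variant_id) tuples, label 1 = pathogenic,
    0 = benign. Genes that lack one of the two classes contribute nothing.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_gene = defaultdict(lambda: {0: [], 1: []})
    for variant in variants:
        gene, label = variant[0], variant[1]
        by_gene[gene][label].append(variant)

    balanced = []
    for gene in sorted(by_gene):
        benign, pathogenic = by_gene[gene][0], by_gene[gene][1]
        k = min(len(benign), len(pathogenic))
        # Downsample each class to the size of the smaller one (assumed strategy).
        balanced.extend(rng.sample(benign, k))
        balanced.extend(rng.sample(pathogenic, k))
    return balanced


# Toy example: gene A is pathogenic-heavy, gene B has only benign variants.
toy = [
    ("A", 1, "a1"), ("A", 1, "a2"), ("A", 1, "a3"), ("A", 0, "a4"),
    ("B", 0, "b1"), ("B", 0, "b2"),
    ("C", 1, "c1"), ("C", 1, "c2"), ("C", 0, "c3"), ("C", 0, "c4"),
]
subset = balance_labels_per_gene(toy)
```

Because every retained gene contributes the same number of variants to each class, a predictor cannot gain on this subset simply by learning which genes are mostly pathogenic or mostly benign.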
