Genome Biol. 2025 Apr 22;26(1):104. doi: 10.1186/s13059-025-03575-w.

Variant effect predictor correlation with functional assays is reflective of clinical classification performance



Benjamin J Livesey et al. Genome Biol. 2025.

Abstract

Background: Understanding the relationship between protein sequence and function is crucial for accurate classification of missense variants. Variant effect predictors (VEPs) play a vital role in deciphering this complex relationship, yet evaluating their performance remains challenging for several reasons, including data circularity, where the same or related data is used for training and assessment. High-throughput experimental strategies like deep mutational scanning (DMS) offer a promising solution.

Results: In this study, we extend our previous benchmarking approach, assessing the performance of 97 VEPs using missense DMS measurements from 36 different human proteins. In addition, a new pairwise, VEP-centric approach mitigates the impact of missing predictions on overall performance comparisons. We observe a strong correspondence between VEP performance in DMS-based benchmarks and clinical variant classification, especially for predictors that have not been directly trained on human clinical variants.
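The pairwise, VEP-centric comparison described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the data structures (`preds`, `dms`), the restriction to variants scored by both predictors in a pair, and the use of absolute Spearman correlation (to remain agnostic to each predictor's sign convention) are assumptions for the sketch.

```python
from itertools import combinations

def ranks(xs):
    # Assign average ranks (1-based), giving tied values the mean of their ranks.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank vectors.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def win_rates(preds, dms):
    # preds: {vep_name: {variant: score}}; dms: {dataset: {variant: measurement}}.
    # For each VEP pair and each DMS dataset, compare Spearman correlations
    # computed only on variants both VEPs score, so missing predictions do
    # not penalise a method; a VEP's win rate is its fraction of wins.
    wins = {v: [] for v in preds}
    for a, b in combinations(preds, 2):
        for assay in dms.values():
            shared = sorted(set(preds[a]) & set(preds[b]) & set(assay))
            if len(shared) < 2:
                continue  # no informative overlap for this dataset
            truth = [assay[v] for v in shared]
            ra = abs(spearman([preds[a][v] for v in shared], truth))
            rb = abs(spearman([preds[b][v] for v in shared], truth))
            if ra > rb:
                wins[a].append(1); wins[b].append(0)
            elif rb > ra:
                wins[b].append(1); wins[a].append(0)
    return {v: sum(w) / len(w) for v, w in wins.items() if w}
```

With a toy benchmark of one DMS dataset and two hypothetical VEPs, a predictor perfectly rank-correlated with the assay ends up with a win rate of 1.0 against a noisier one.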

Conclusions: Our results suggest that comparing VEP performance against diverse functional assays represents a reliable strategy for assessing their relative performance in clinical variant classification. However, major challenges in clinical interpretation of VEP scores persist, highlighting the need for further research to fully leverage computational predictors for genetic diagnosis. We also address practical considerations for end users in terms of choice of methodology.

Keywords: ACMG/AMP; Benchmark; Circularity; DMS; MAVE; Multiplexed assay of variant effect; VEP; Variant effect predictor.


Conflict of interest statement

Declarations

Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Correlation between variant effect scores from VEPs and DMS experiments. The Spearman’s correlation between all VEPs and every selected DMS dataset. VEPs are split into “population-free,” “clinical-trained,” and “population-tuned” methods based on the usage of human clinical and population variants during training. DMS datasets are classified as “direct” if they directly measure the ability of the target protein to carry out one or more functions, with all others classified as “indirect.” The VEP with the highest correlation is noted for every DMS dataset
Fig. 2
The top 30 out of 97 tested VEPs ranked based on performance against the DMS benchmark. VEPs are ranked according to their average win rate against all other VEPs in pairwise Spearman’s correlation comparisons across all DMS datasets. The number of proteins for which each VEP had scores included is indicated in the right column of the plot. Those indicated with * had some DMS datasets excluded to avoid circularity concerns. Error bars represent the standard deviation in the rank score across 1000 bootstrap permutations of the benchmarking DMS datasets. The full ranking of all VEPs and all pairwise win rates are available in Additional file 2: Table S3
Fig. 3
The top 30 out of 83 tested VEPs in terms of clinical variant classification performance. VEPs are ranked according to their average win rate against all other VEPs in pairwise AUROC comparisons across all human proteins with at least 10 pathogenic and 10 putatively benign missense variants. The number of proteins that met this condition for each predictor is indicated on the right of the plot. Some VEPs from the DMS benchmark could not be included here because predictions were not available for enough genes. Error bars represent the standard error across all comparisons with other VEPs. The full ranking of all VEPs and all pairwise win rates are available in Additional file 2: Table S8
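The per-gene AUROC underlying these clinical comparisons can be illustrated with a minimal sketch. The rank-sum (Mann-Whitney) formulation below is a standard equivalent way to compute AUROC; the function and variable names are assumptions for illustration, not the authors' implementation.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    probability that a randomly chosen pathogenic variant (label 1)
    receives a higher score than a randomly chosen putatively benign
    variant (label 0), counting tied scores as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In the benchmark described here, such an AUROC would be computed per gene (only genes with at least 10 pathogenic and 10 putatively benign missense variants qualify), and two VEPs would be compared pairwise on each qualifying gene to produce win rates analogous to the DMS benchmark.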
Fig. 4
Strong correspondence in relative performance of VEPs on the DMS vs clinical benchmarks. Average pairwise win rates in the DMS vs clinical benchmarks are plotted. Population-free and population-tuned VEPs show extremely strong correlations. In contrast, the clinical-trained VEPs show a much weaker correlation overall. The tendency for some clinical-trained VEPs to show large rightward shifts, reflecting relatively increased performance on the clinical benchmark, likely reflects circularity arising from training on variants and genes present in the pathogenic and putatively benign datasets

