Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017 May 16;11(1):10.
doi: 10.1186/s40246-017-0104-8.

Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

Affiliations

Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

Khalid Mahmood et al. Hum Genomics. .

Abstract

Background: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools.

Results: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets.

Conclusions: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.

Keywords: Benchmarking; Functional assays; Functional datasets; Genomic screening; Mutation assessment; Pathogenicity prediction; Protein function; Variant effect prediction.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
Venn diagram of datasets used in this study showing the overlaps among the deleterious and benign variants observed in these datasets. Humsavar displays a relatively high degree of overlap with the ClinvarHC and Swissvar datasets. The remaining datasets overlap to relatively small extents
Fig. 2
Fig. 2
Histogram depicting apparent accuracies of in silico variant effect predictors based on ROC curve AUCs for the benchmarking datasets used in this study
Fig. 3
Fig. 3
Apparent prediction accuracies of variant effect prediction tools when assessed using ClinvarHC versus functional mutation derived datasets, reported as AUCs derived from ROC curves

References

    1. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. - DOI - PMC - PubMed
    1. Spurdle AB, Healey S, Devereau A, Hogervorst FBL, Monteiro ANA, Nathanson KL, et al. ENIGMA—evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes. Hum Mutat. 2012;33:2–7. doi: 10.1002/humu.21628. - DOI - PMC - PubMed
    1. Thompson BA, Spurdle AB, Plazzer J-P, Greenblatt MS, Akagi K, Al-Mulla F, et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet Nature Research. 2013;46:107–115. doi: 10.1038/ng.2854. - DOI - PMC - PubMed
    1. Chandler MR, Bilgili EP, Merner ND. A review of whole-exome sequencing efforts toward hereditary breast cancer susceptibility gene discovery. Hum Mutat. 2016;37:835–846. doi: 10.1002/humu.23017. - DOI - PubMed
    1. Sullivan PF, Daly MJ, O’Donovan M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet. 2012;13:537–551. doi: 10.1038/nrg3240. - DOI - PMC - PubMed

Publication types