J Med Genet. 2021 Aug;58(8):547-555.

doi: 10.1136/jmedgenet-2020-107003. Epub 2020 Aug 25.

Assessing performance of pathogenicity predictors using clinically relevant variant datasets

Adam C Gunning^#¹,², Verity Fryer^#², James Fasham¹, Andrew H Crosby¹, Sian Ellard², Emma L Baple¹, Caroline F Wright³

Affiliations

¹ College of Medicine and Health, University of Exeter Medical School Institute of Biomedical and Clinical Science, Exeter, Devon, UK.
² Exeter Genomics Laboratory, Royal Devon & Exeter NHS Foundation Trust, Exeter, UK.
³ College of Medicine and Health, University of Exeter Medical School Institute of Biomedical and Clinical Science, Exeter, Devon, UK. Caroline.Wright@exeter.ac.uk

^# Contributed equally.

PMID: 32843488
PMCID: PMC8327323
DOI: 10.1136/jmedgenet-2020-107003

Adam C Gunning et al. J Med Genet. 2021 Aug.

Abstract

Background: Pathogenicity predictors are integral to genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken.

Methods: We derive two validation datasets: an 'open' dataset containing variants extracted from publicly available databases, similar to those commonly applied in previous benchmarking exercises, and a 'clinically representative' dataset containing variants identified through research/diagnostic exome and panel sequencing. Using these datasets, we evaluate the performance of three recent meta-predictors, REVEL, GAVIN and ClinPred, and compare their performance against two commonly used in silico tools, SIFT and PolyPhen-2.

Results: Although the newer meta-predictors outperform the older tools, all pathogenicity predictors perform substantially worse on the clinically representative dataset. Using our clinically relevant dataset, REVEL performed best, with an area under the receiver operating characteristic curve of 0.82. A concordance-based approach, which relies on a consensus of multiple tools, reduces performance because of both discordance between tools and false concordance, where tools make the same misclassification. Analysis of tool feature usage may give insight into tool performance and misclassification.

Conclusion: Our results support the adoption of meta-predictors over traditional in silico tools, but do not support a consensus-based approach as in current practice.

Keywords: genetic testing; genetic variation; genetics; genomics; human genetics.

Conflict of interest statement

Competing interests: None declared.

Figures

**Figure 1**
Flow diagram of selection and filtering steps used for the generation of the open (A) and clinical (B) datasets. Oval—variant source; box—selection criteria; rounded box—dataset. Red text (right) shows the number of pathogenic variants, green text (left) shows the number of benign variants. MAF, minor allele frequency.

**Figure 2**
In silico pathogenicity predictor feature usage and source. Shading indicates that a category of evidence is used by the tool. Codes within each box indicate that the feature is inherited from another tool. Feature lists were taken from the tools' original publications, supplementary materials and available online material. C, CADD; D, DANN; F, FATHMM; FC, FitCons; MP, MutPred; MT, MutationTaster; P, PolyPhen-2; S, SIFT; V, VEST. An extended version is shown in online supplementary figure S1.

**Figure 3**
Violin plot showing variant scores for SIFT, PolyPhen-2, REVEL and ClinPred using two datasets. Open dataset—blue; clinical dataset—red; pathogenic variants—filled; benign variants—unfilled. Plot was generated in R using the 'vioplot' function in the 'vioplot' library. For ease of comparison, SIFT scores have been inverted.

**Figure 4**
Receiver operating characteristic (ROC) curves for SIFT, PolyPhen-2, REVEL and ClinPred using two datasets. Open dataset—blue; clinical dataset—red. Generated in R using the ‘roc’ and ‘plot.roc’ functions in the ‘pROC’ library. Area under the ROC curve (AUC) was calculated in R using the ‘roc’ function. For ease of comparison, SIFT scores have been inverted.
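As a rough illustration of the AUC comparison described in this caption, the sketch below computes the area under the ROC curve as the fraction of pathogenic/benign pairs in which the pathogenic variant receives the higher score, with ties counted as half. It is not the pROC code used by the authors. The scores and labels are invented toy values, the function names `auc` and `invert_sift` are hypothetical, and the inversion is assumed to be `1 - score`, since low SIFT scores indicate a damaging prediction.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison: label 1 = pathogenic, 0 = benign."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pathogenic variant ranked above benign
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def invert_sift(scores: Sequence[float]) -> List[float]:
    """Assumed inversion: flip SIFT so that higher means more damaging."""
    return [1.0 - s for s in scores]


# Invented toy data, not taken from the paper.
labels = [1, 1, 1, 0, 0, 0]
sift = [0.00, 0.02, 0.40, 0.03, 0.60, 1.00]
revel = [0.90, 0.75, 0.45, 0.40, 0.10, 0.05]

sift_auc = auc(invert_sift(sift), labels)
revel_auc = auc(revel, labels)
print(f"SIFT (inverted) AUC: {sift_auc:.2f}")
print(f"REVEL AUC: {revel_auc:.2f}")
```

Inverting SIFT puts all tools on the same orientation, so that a higher AUC consistently means better separation of pathogenic from benign variants.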

**Figure 5**
Concordance between tools separated by dataset and classification (pathogenic and benign). Open dataset—blue; clinical dataset—red; pathogenic variants—top graph; benign variants—bottom graph. True concordance indicates that the tools agree and were correct. False concordance indicates that the tools agree but were incorrect. Discordance indicates that the tools disagreed on the classification.
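The three categories defined in this caption can be sketched as a small classification function. This is a minimal sketch under stated assumptions: each tool's score is taken to have already been converted into a binary call ('pathogenic' or 'benign'), and the thresholds for that conversion are not given here. The function name `concordance` and the example variants are hypothetical.

```python
from collections import Counter
from typing import Mapping


def concordance(calls: Mapping[str, str], truth: str) -> str:
    """Classify one variant from per-tool calls and its known class."""
    if not calls:
        raise ValueError("at least one tool call is required")
    distinct = set(calls.values())
    if len(distinct) > 1:
        return "discordance"          # tools disagree with each other
    if distinct == {truth}:
        return "true concordance"     # tools agree and are correct
    return "false concordance"        # tools agree but are incorrect


# Invented example variants: (per-tool calls, known classification).
variants = [
    ({"SIFT": "pathogenic", "PolyPhen-2": "pathogenic", "REVEL": "pathogenic"}, "pathogenic"),
    ({"SIFT": "pathogenic", "PolyPhen-2": "pathogenic", "REVEL": "pathogenic"}, "benign"),
    ({"SIFT": "benign", "PolyPhen-2": "pathogenic", "REVEL": "benign"}, "benign"),
    ({"SIFT": "benign", "PolyPhen-2": "benign", "REVEL": "benign"}, "benign"),
]

tally = Counter(concordance(calls, truth) for calls, truth in variants)
for category, count in tally.items():
    print(category, count)
```

The second toy variant shows why false concordance matters: unanimous agreement between tools gives no protection when they share the same misclassification.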

Grants and funding

200990/WT_/Wellcome Trust/United Kingdom
