The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity

Dominik G Grimm¹, Chloé-Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G MacArthur, Kaitlin E Samocha, David N Cooper, Peter D Stenson, Mark J Daly, Jordan W Smoller, Laramie E Duncan, Karsten M Borgwardt

Affiliation

¹ Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany; Zentrum für Bioinformatik, Eberhard Karls Universität Tübingen, Tübingen, Germany; Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

PMID: 25684150
PMCID: PMC4409520
DOI: 10.1002/humu.22768

Dominik G Grimm et al. Hum Mutat. 2015 May;36(5):513-23. doi: 10.1002/humu.22768. Epub 2015 Mar 26.

Abstract

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies for exploring complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. These arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluating these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.
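The two types of circularity described in the abstract can be illustrated with a minimal Python sketch. It assumes each variant is identified by a protein identifier, a position, and the reference and alternate amino acids; the `Variant` type, the `circularity_report` function, and the example identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: protein ID, position, reference and alternate residue."""
    protein: str
    position: int
    ref: str
    alt: str


def circularity_report(train: Iterable[Variant],
                       evaluation: Iterable[Variant]) -> Dict[str, List[Variant]]:
    """Split evaluation variants by their overlap with the training data.

    type1: the same variant also occurs in the training data.
    type2: a different variant from a protein that occurs in the training data.
    clean: the protein does not occur in the training data at all.
    """
    train_variants = set(train)
    train_proteins = {v.protein for v in train_variants}

    report: Dict[str, List[Variant]] = {"type1": [], "type2": [], "clean": []}
    for v in evaluation:
        if v in train_variants:
            report["type1"].append(v)
        elif v.protein in train_proteins:
            report["type2"].append(v)
        else:
            report["clean"].append(v)
    return report


if __name__ == "__main__":
    train = [Variant("P1", 10, "A", "G"), Variant("P2", 5, "L", "P")]
    evaluation = [
        Variant("P1", 10, "A", "G"),  # same variant as in training
        Variant("P1", 20, "R", "H"),  # same protein, different variant
        Variant("P3", 7, "K", "E"),   # protein unseen in training
    ]
    for kind, variants in circularity_report(train, evaluation).items():
        print(kind, len(variants))
```

Under these assumptions, only the variants in the `clean` group are free of both kinds of overlap with the training data.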

Keywords: exome sequencing; pathogenicity prediction tools.

Figures

**Figure 1**
Evaluation of the 10 different pathogenicity prediction tools (by AUC) over five datasets. The hatched bars indicate potentially biased results, due to the overlap (or possible overlap) between the evaluation data and the data used (by tool developers) for training the prediction tool. The dotted bars indicate that the tool is biased due to type 2 circularity. The protein MV predictor and the logistic regression (over the features used in the weighting scheme of FatHMM‐W) are discussed in the second part of the *Results* section.
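As a reminder of what the AUC values in this comparison measure, here is a minimal sketch that computes AUC as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen neutral one, with ties counted as one half. The function name `auc` and the example scores are illustrative assumptions; this is not the evaluation code used by the authors.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels[i] is True for a pathogenic variant and False for a neutral one;
    higher scores are assumed to mean "more likely pathogenic".
    """
    pathogenic = [s for s, y in zip(scores, labels) if y]
    neutral = [s for s, y in zip(scores, labels) if not y]
    if not pathogenic or not neutral:
        raise ValueError("AUC needs at least one variant of each class")

    wins = 0.0
    for p in pathogenic:
        for n in neutral:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pathogenic) * len(neutral))


if __name__ == "__main__":
    # Perfect separation of the two classes
    print(auc([0.9, 0.8, 0.3, 0.1], [True, True, False, False]))
    # Scores that carry no ranking information for these labels
    print(auc([0.9, 0.1, 0.8, 0.3], [True, True, False, False]))
```

The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable for large datasets.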

**Figure 2**
In the *VariBenchSelected* dataset, most SNPs are in genes with only neutral or only pathogenic variants. A: Protein perspective: proportion of proteins containing only neutral variants (“neutral‐only”), only pathogenic variants (“pathogenic‐only”), and both types of variants (“mixed”). Only 1.4% of the proteins are mixed. B: Variant perspective: proportions of variants in each of the three categories of proteins. Only 5.2% of variants are in mixed proteins. C: Fractions of variants in the *VariBenchSelected* dataset that belong to proteins with various ratios of pathogenic‐to‐neutral variants, binned into increasingly narrow bins, approaching balanced proteins. The open interval ]0.0, 1.0[ contains all mixed proteins (as in B). Only 0.7% of all variants belong to almost perfectly balanced proteins (closed interval [0.4, 0.6]).
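The protein categories in this figure can be sketched in a few lines of Python. This is an illustration only: it assumes that the per-protein "ratio" is the fraction of a protein's variants that are pathogenic (which is consistent with the intervals lying between 0.0 and 1.0), and the function name `protein_categories` and the input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def protein_categories(
    variants: Iterable[Tuple[str, bool]]
) -> Dict[str, Tuple[str, float]]:
    """Label each protein as neutral-only, pathogenic-only, or mixed.

    variants: (protein_id, is_pathogenic) pairs, one per variant.
    Returns protein_id -> (category, fraction of pathogenic variants).
    """
    counts = defaultdict(lambda: [0, 0])  # protein -> [pathogenic, neutral]
    for protein, is_pathogenic in variants:
        counts[protein][0 if is_pathogenic else 1] += 1

    categories = {}
    for protein, (n_pathogenic, n_neutral) in counts.items():
        fraction = n_pathogenic / (n_pathogenic + n_neutral)
        if n_pathogenic == 0:
            category = "neutral-only"
        elif n_neutral == 0:
            category = "pathogenic-only"
        else:
            category = "mixed"  # fraction lies in the open interval ]0.0, 1.0[
        categories[protein] = (category, fraction)
    return categories


if __name__ == "__main__":
    example = [("A", True), ("A", True), ("B", False), ("C", True), ("C", False)]
    for protein, (category, fraction) in protein_categories(example).items():
        print(protein, category, fraction)
```

Counting proteins per category gives the protein perspective of panel A, while counting variants per category gives the variant perspective of panel B.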

**Figure 3**
Performance of 10 pathogenicity prediction tools according to protein pathogenic‐to‐neutral variant ratio. Evaluation of tool performance on subsets of *VariBenchSelected*, *predictSNPSelected*, and *SwissVarSelected*, defined according to the relative proportions of pathogenic and neutral variants in the proteins they contain. “Pure” indicates variants belonging to proteins containing only one class of variant. An interval with endpoints x and y indicates variants belonging to mixed proteins with a ratio of pathogenic‐to‐neutral variants between x and y. ]0.0, 1.0[ therefore indicates all mixed proteins (the ratios of 0.0 and 1.0 being excluded by the reversed brackets). While FatHMM‐W performs well or excellently on variants belonging to pure proteins (*VariBenchSelected* and *predictSNPSelected*), it performs poorly on those belonging to mixed proteins.

**Figure 4**
Comparison of the performance of two metapredictors (Logit and Condel) and their component tools, across five datasets. Bar heights reflect AUC for each tool and tool combination. Logit and Condel are metapredictors combining MASS, PP2, and SIFT. The “+” versions of Logit and Condel also include FatHMM‐W. While effective in prediction, FatHMM‐W (alone and in the Logit+ and Condel+ metapredictors) is optimistically biased due to type 2 circularity (see *Results* section). In the “Selected” datasets, Logit provides the best unbiased performance. SIFT has the lowest performance in the *HumVar* and *ExoVar* datasets, but it is also the only predictor that is unbiased in these two datasets.
