Understanding the heterogeneous performance of variant effect predictors across human protein-coding genes
- PMID: 39478110
- PMCID: PMC11526010
- DOI: 10.1038/s41598-024-76202-6
Abstract
Variant effect predictors (VEPs) are computational tools developed to assess the impacts of genetic mutations, often in terms of likely pathogenicity, employing diverse algorithms and training data. Here, we investigate the performance of 35 VEPs in discriminating between pathogenic and putatively benign missense variants across 963 human protein-coding genes. We observe considerable gene-level heterogeneity as measured by the widely used area under the receiver operating characteristic curve (AUROC) metric. To investigate the origins of this heterogeneity and the extent to which gene-level VEP performance is predictable, we train random forest models for each VEP to predict its gene-level AUROC. We find that performance as measured by AUROC is related to factors such as gene function, protein structure, and evolutionary conservation. Notably, intrinsic disorder in proteins emerges as a significant factor influencing apparent VEP performance, often leading to inflated AUROC values because disordered regions are enriched in weakly conserved, putatively benign variants. Our results suggest that gene-level features may be useful for identifying genes where VEP predictions are likely to be more or less reliable. However, our work also shows that AUROC, despite being independent of class balance, still has crucial limitations when used for comparing VEP performance across different genes.
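The gene-level AUROC discussed above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, scores, and the helper functions `auroc` and `gene_level_auroc` are hypothetical, and the pairwise definition of AUROC (the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one) is used for clarity rather than speed. The last step shows, on toy data only, how adding easily separated benign variants can raise a gene's AUROC without any change in how well the pathogenic variants are scored.

```python
from collections import defaultdict


def auroc(scores, labels):
    """AUROC as the fraction of (pathogenic, benign) pairs in which the
    pathogenic variant has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined unless both classes are present
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gene_level_auroc(records):
    """records: iterable of (gene, score, label); label 1 = pathogenic,
    label 0 = putatively benign. Returns a dict mapping gene to AUROC."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, label in records:
        by_gene[gene][0].append(score)
        by_gene[gene][1].append(label)
    return {g: auroc(s, y) for g, (s, y) in by_gene.items()}


# Hypothetical toy data: higher score = predicted more damaging.
records = [
    ("GENE_A", 0.9, 1), ("GENE_A", 0.8, 1), ("GENE_A", 0.3, 0), ("GENE_A", 0.1, 0),
    ("GENE_B", 0.9, 1), ("GENE_B", 0.4, 1), ("GENE_B", 0.6, 0), ("GENE_B", 0.2, 0),
]
per_gene = gene_level_auroc(records)

# Assumption for illustration: GENE_B gains two low-scoring benign variants,
# standing in for weakly conserved variants in a disordered region.
inflated = gene_level_auroc(records + [("GENE_B", 0.05, 0), ("GENE_B", 0.02, 0)])

print(per_gene)            # AUROC per gene on the original toy data
print(inflated["GENE_B"])  # AUROC for GENE_B after adding easy benign variants
```

In this toy case the pathogenic scores for GENE_B are unchanged, yet its AUROC rises once the extra benign variants are included, which is the kind of cross-gene comparability problem the abstract describes.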
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.