Ultra-fast variant effect prediction using biophysical transcription factor binding models
- PMID: 41063341
- PMCID: PMC12507518
- DOI: 10.1093/nar/gkaf940
Abstract
Sequence variation within transcription factor (TF)-binding sites can significantly affect TF-DNA interactions, influencing gene expression and contributing to disease susceptibility or phenotypic traits. Despite recent progress in deep sequence-to-function models that predict functional output from sequence data, these methods perform inadequately on some variant effect prediction tasks, especially those involving common genetic variants. This limitation underscores the importance of leveraging biophysical models of TF binding to enhance the interpretability of variant effect scores and facilitate mechanistic insights. We introduce motifDiff, a novel computational tool designed to quantify variant effects using mono- and dinucleotide position weight matrices. motifDiff offers several key advantages: scalability to score millions of variants within minutes, implementation of a statistically rigorous normalization strategy that is critical for optimal performance, and support for both dinucleotide and mononucleotide models. We demonstrate motifDiff's efficacy by evaluating it across diverse ground-truth datasets that quantify the effects of common variants in vivo, thereby establishing robust benchmarks for the predictive value of variant effect calculations. Finally, we show that our tool provides unique insights when interpreting human accelerated regions. motifDiff is available as a standalone Python application at https://github.com/rezwanhosseini/MotifDiff.
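The general idea of scoring a variant with a position weight matrix can be illustrated with a minimal sketch. This is not motifDiff's implementation. It assumes log-odds matrices (shape (L, 4) for mononucleotide, (L-1, 16) for dinucleotide models). It scores a variant as the change in the best-scoring site, taking the maximum over positions and both strands, between the reference and alternative alleles. It also omits the normalization step described above. All function names (`mono_scores`, `di_scores`, `variant_effect`, `consensus_pwm`) are hypothetical.

```python
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}  # only A/C/G/T supported in this sketch


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mono_scores(seq, pwm):
    """Log-odds score of every motif-length window on the given strand.

    pwm: (L, 4) array of mononucleotide log-odds weights (assumed format).
    """
    L = pwm.shape[0]
    idx = np.array([IDX[b] for b in seq])
    n = len(seq) - L + 1  # assumes len(seq) >= L
    return np.array([pwm[np.arange(L), idx[i:i + L]].sum() for i in range(n)])


def di_scores(seq, dpwm):
    """Window scores under a dinucleotide model.

    dpwm: (L-1, 16) array; column 4*a + b weights the adjacent pair (a, b).
    """
    Lm1 = dpwm.shape[0]
    idx = np.array([IDX[b] for b in seq])
    pairs = 4 * idx[:-1] + idx[1:]  # encode each adjacent dinucleotide as 0..15
    n = len(pairs) - Lm1 + 1
    return np.array([dpwm[np.arange(Lm1), pairs[i:i + Lm1]].sum() for i in range(n)])


def best_score(seq, score_fn, matrix):
    """Best window score over both strands."""
    return float(max(score_fn(seq, matrix).max(),
                     score_fn(revcomp(seq), matrix).max()))


def variant_effect(ref_seq, pos, alt, score_fn, matrix):
    """Change in best binding score when ref_seq[pos] is replaced by alt."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(alt_seq, score_fn, matrix) - best_score(ref_seq, score_fn, matrix)


def consensus_pwm(consensus, match=1.0, mismatch=-1.0):
    """Toy log-odds PWM that rewards the consensus base at each position."""
    pwm = np.full((len(consensus), 4), mismatch)
    for j, base in enumerate(consensus):
        pwm[j, IDX[base]] = match
    return pwm


if __name__ == "__main__":
    pwm = consensus_pwm("ACG")
    # The C->T change at position 3 destroys the perfect ACG site.
    print(variant_effect("TTACGTT", 3, "T", mono_scores, pwm))  # -2.0
```

A negative value means the alternative allele weakens the best predicted site. In practice the input sequence would be a window around the variant, extending at least motif length minus one on each side, so that every site overlapping the variant is scored.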
© The Author(s) 2025. Published by Oxford University Press.
Conflict of interest statement
None declared.
