This is a preprint.
Characterizing uncertainty in predictions of genomic sequence-to-activity models
- PMID: 38187742
- PMCID: PMC10769392
- DOI: 10.1101/2023.12.21.572730
Abstract
Genomic sequence-to-activity models are increasingly utilized to understand gene regulatory syntax and probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited for predicting the effects of genetic variants, such as explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates. We characterize prediction consistency on four types of sequences: reference genome sequences, reference genome sequences perturbed with TF motifs, eQTLs, and personal genome sequences. We observe that models tend to make high-confidence predictions on reference sequences, even when incorrect, and low-confidence predictions on sequences with variants. For eQTLs and personal genome sequences, we find that model replicates make inconsistent predictions in >50% of cases. Our findings suggest strategies to improve performance of these models.
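One way to picture the replicate-consistency idea described above is the following minimal sketch. It is not the authors' method: the abstract does not give the exact consistency criterion, so the sketch assumes that a variant counts as "consistent" only when every replicate predicts the same direction of effect (alternate allele minus reference allele). The function name `sign_consistency`, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np

def sign_consistency(ref_preds, alt_preds):
    """Summarize agreement across ensemble replicates (illustrative only).

    ref_preds, alt_preds: arrays of shape (n_replicates, n_variants) holding
    each replicate's predicted activity for the reference and alternate allele.
    Assumption: "consistent" means all replicates agree on the sign of the
    predicted effect; the source abstract does not specify its criterion.
    """
    ref_preds = np.asarray(ref_preds, dtype=float)
    alt_preds = np.asarray(alt_preds, dtype=float)
    effects = alt_preds - ref_preds          # per-replicate predicted effect
    n_replicates = effects.shape[0]

    n_pos = (effects > 0).sum(axis=0)        # replicates predicting an increase
    n_neg = (effects < 0).sum(axis=0)        # replicates predicting a decrease

    # Fraction of replicates that share the majority direction.
    majority_frac = np.maximum(n_pos, n_neg) / n_replicates
    # Fully consistent only if every replicate agrees on the direction.
    consistent = (n_pos == n_replicates) | (n_neg == n_replicates)

    return effects.mean(axis=0), effects.std(axis=0), majority_frac, consistent


# Toy example: 3 replicates, 2 variants (made-up numbers).
ref = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
alt = [[1.5, 1.2], [1.4, 0.9], [1.6, 1.1]]
mean_effect, std_effect, majority_frac, consistent = sign_consistency(ref, alt)
print(consistent)  # first variant: all replicates agree; second: they do not
```

Under this assumed criterion, the share of inconsistent variants is simply `1 - consistent.mean()`, and the per-variant standard deviation gives a complementary, continuous view of ensemble disagreement.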