SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models
- PMID: 38796686
- PMCID: PMC11153836
- DOI: 10.1093/bioinformatics/btae340
Abstract
Summary: The growing number of sequence-based machine learning models has increased the demand for tools that manipulate sequences for use with them. However, existing approaches for editing and evaluating genome sequences with these models have limitations, such as incompatibility with structural variants, difficulty identifying the sequence perturbations responsible for an effect, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
Availability and implementation: SuPreMo is written in Python and can be run with a single line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are available on GitHub: https://github.com/ketringjoni/SuPreMo.
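The idea of pairing reference and perturbed sequences and comparing model predictions can be illustrated with a minimal sketch. This is not SuPreMo's code or API: every function name here (`apply_variant`, `fix_length`, `one_hot`, `toy_model`, `disruption_scores`) is hypothetical, the "model" is a stand-in that computes windowed GC content, and the two scores are only illustrative distance measures. Padding with `N` on the right to restore a fixed input length is also a simplifying assumption.

```python
import numpy as np

def apply_variant(ref_seq, pos, ref_allele, alt_allele):
    """Replace ref_allele at 0-based pos with alt_allele (hypothetical helper)."""
    if ref_seq[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence at pos")
    return ref_seq[:pos] + alt_allele + ref_seq[pos + len(ref_allele):]

def fix_length(seq, length):
    """Trim or right-pad with N so the sequence matches a fixed model input size.
    Assumption: real tools choose padding and shifting more carefully."""
    return seq[:length].ljust(length, "N")

def one_hot(seq):
    """One-hot encode DNA as a (len, 4) array; N and other symbols stay all-zero."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in index:
            encoded[i, index[base]] = 1.0
    return encoded

def toy_model(encoded, window=4):
    """Stand-in for a sequence-based predictor: mean GC content per sliding window."""
    gc = encoded[:, 1] + encoded[:, 2]
    return np.convolve(gc, np.ones(window) / window, mode="valid")

def disruption_scores(pred_ref, pred_alt):
    """Compare two prediction tracks of equal length (illustrative measures only)."""
    mse = float(np.mean((pred_ref - pred_alt) ** 2))
    corr = float(np.corrcoef(pred_ref, pred_alt)[0, 1])
    return {"mse": mse, "one_minus_corr": 1.0 - corr}

# Build a reference/perturbed pair for a single-nucleotide substitution and score it.
ref = "ACGTACGTAAAATTTTGGCC"
alt = fix_length(apply_variant(ref, 8, "A", "G"), len(ref))
scores = disruption_scores(toy_model(one_hot(ref)), toy_model(one_hot(alt)))
print(alt, scores)
```

Replacing `toy_model` with a trained predictor keeps the rest of the flow unchanged: build the pair, predict on both sequences, then summarize the difference.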
© The Author(s) 2024. Published by Oxford University Press.
Conflict of interest statement
None declared.
Update of
- SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. bioRxiv [Preprint]. 2023 Nov 5:2023.11.03.565556. doi: 10.1101/2023.11.03.565556. Update in: Bioinformatics. 2024 Jun 3;40(6):btae340. doi: 10.1093/bioinformatics/btae340. PMID: 37961123.