Assessing the reliability of point mutation as data augmentation for deep learning with genomic data
- PMID: 38689247
- PMCID: PMC11059627
- DOI: 10.1186/s12859-024-05787-6
Abstract
Background: Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data.
Results: Most genomic data possesses peculiar properties, and data augmentations may significantly alter the intrinsic properties of the data. In this work, we propose a novel data augmentation technique for genomic data inspired by biology: point mutations. By using point mutations to substitute codons, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection.
Conclusion: Silent and missense mutations are found to positively influence effectiveness, while nonsense mutations and random mutations in non-coding regions generally lead to degradation. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
Keywords: Data augmentation; Deep learning; Point mutations; Splicing; Translation initiation.
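The mutation classes named in the conclusion (silent, missense, nonsense) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `classify_point_mutation` and `augment`, the per-codon mutation `rate`, and the assumption that the input is an uppercase, in-frame coding sequence are all hypothetical choices made for illustration. Only the standard genetic code table is taken as fact.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, pos, new_base):
    """Label a single-base substitution inside one codon (hypothetical helper)."""
    if new_base == codon[pos]:
        raise ValueError("new_base must differ from the original base")
    mutated = codon[:pos] + new_base + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "silent"      # same amino acid (or stop stays stop)
    if after == "*":
        return "nonsense"    # introduces a premature stop codon
    return "missense"        # different amino acid (stop-loss is lumped in here)


def augment(seq, rate=0.05, allowed=("silent", "missense"), rng=None):
    """Return a mutated copy of an in-frame coding sequence.

    Assumptions (not from the source): each codon is considered for mutation
    with probability `rate`, and a proposed mutation is kept only if its class
    is listed in `allowed`.
    """
    rng = rng or random.Random()
    usable = len(seq) - len(seq) % 3
    out = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in CODON_TABLE and rng.random() < rate:
            pos = rng.randrange(3)
            new_base = rng.choice([b for b in BASES if b != codon[pos]])
            if classify_point_mutation(codon, pos, new_base) in allowed:
                codon = codon[:pos] + new_base + codon[pos + 1:]
        out.append(codon)
    out.append(seq[usable:])  # keep any trailing partial codon unchanged
    return "".join(out)
```

Calling `augment(seq, allowed=("silent",))` keeps the encoded protein unchanged, while the default also admits missense changes; nonsense mutations are excluded by default because the conclusion reports that they generally degrade performance.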
© 2024. The Author(s).
Conflict of interest statement
The authors declare that they have no conflict of interest.