A fast machine learning dataloader for epigenetic tracks from BigWig files
- PMID: 38175786
- PMCID: PMC10782802
- DOI: 10.1093/bioinformatics/btad767
Abstract
Summary: We created bigwig-loader, a data-loader for epigenetic profiles from BigWig files that decompresses and processes information for multiple intervals from multiple BigWig files in parallel. This access pattern is needed to create training batches for typical machine learning models on epigenetics data. Using a new codec, the decompression can be done on a graphics processing unit (GPU), making it fast enough to create the training batches during training and removing the need to save preprocessed training examples to disk.
Availability and implementation: The bigwig-loader installation instructions and source code can be accessed at https://github.com/pfizer-opensource/bigwig-loader.
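The access pattern described in the Summary (many intervals read from many tracks to form one training batch) can be illustrated with a minimal CPU-only sketch. This is not the bigwig-loader API: the function `make_batch`, the track names, and the in-memory NumPy arrays standing in for decoded BigWig tracks are all hypothetical, and no GPU decompression is performed.

```python
import numpy as np


def make_batch(tracks, intervals, window):
    """Assemble a (n_intervals, n_tracks, window) batch.

    tracks: dict mapping a track name to a dict of chromosome -> 1D array of
        per-base values (a hypothetical stand-in for decoded BigWig data).
    intervals: list of (chromosome, start) tuples.
    window: number of positions to read per interval.
    """
    names = sorted(tracks)
    batch = np.zeros((len(intervals), len(names), window), dtype=np.float32)
    for i, (chrom, start) in enumerate(intervals):
        for j, name in enumerate(names):
            values = tracks[name][chrom][start:start + window]
            # Intervals that run past the chromosome end stay zero-padded.
            batch[i, j, :len(values)] = values
    return batch


# Synthetic data: three tracks covering one 1000-base chromosome.
rng = np.random.default_rng(0)
tracks = {
    f"track_{k}": {"chr1": rng.random(1000, dtype=np.float32)}
    for k in range(3)
}
intervals = [("chr1", 0), ("chr1", 500), ("chr1", 990)]
batch = make_batch(tracks, intervals, window=64)
```

In this sketch every (interval, track) pair is read sequentially in Python; the Summary describes performing the equivalent reads and the decompression in parallel, which is what makes building batches during training practical.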
© The Author(s) 2024. Published by Oxford University Press.
