CRAM 3.1: advances in the CRAM file format
- PMID: 34999766
- PMCID: PMC8896640
- DOI: 10.1093/bioinformatics/btac010
Abstract
Motivation: CRAM has established itself as a high-compression alternative to the BAM file format for DNA sequencing data. We describe updates to the format that further improve compression on data from modern sequencing instruments.
Results: With Illumina data, CRAM 3.1 is 7-15% smaller than the equivalent CRAM 3.0 file and 50-70% smaller than the corresponding BAM file. Long-read technology shows more modest compression due to the presence of high-entropy signals.
Availability and implementation: The CRAM 3.0 specification is freely available from https://samtools.github.io/hts-specs/CRAMv3.pdf. The CRAM 3.1 improvements are available in a separate open-source HTScodecs library from https://github.com/samtools/htscodecs and have been incorporated into HTSlib.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press.
