Crumble: reference free lossy compression of sequence quality values
- PMID: 29992288
- PMCID: PMC6330002
- DOI: 10.1093/bioinformatics/bty608
Abstract
Motivation: Most of the space taken up by next-generation sequencing (NGS) CRAM files is per-base quality values. Most of these values are unnecessary for variant calling, which offers an opportunity to save space.
Results: On the Syndip test set, a 17-fold reduction in the quality storage portion of a CRAM file can be achieved while maintaining variant calling accuracy. The size reduction of an entire CRAM file varied from 2.2-fold to 7.4-fold, depending on the non-quality content of the original file (see Supplementary Material S6 for details).
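As a rough illustration of why the whole-file saving depends on the non-quality content, the arithmetic can be sketched as follows. This is not taken from the paper: it assumes, purely for illustration, that only the quality portion shrinks, that it shrinks by the same factor in every file, and that the rest of the file is unchanged. The function names and the quality fractions they produce are hypothetical and are not measurements from the Syndip data.

```python
def overall_fold(quality_fraction: float, quality_fold: float) -> float:
    """Whole-file size reduction when only the quality portion shrinks.

    quality_fraction: share of the original file occupied by quality values (0..1).
    quality_fold: reduction factor applied to that share (e.g. 17.0).
    Assumption: all non-quality data keeps its original size.
    """
    if not 0.0 <= quality_fraction <= 1.0:
        raise ValueError("quality_fraction must lie in [0, 1]")
    if quality_fold <= 0.0:
        raise ValueError("quality_fold must be positive")
    # New size relative to the original: untouched part plus shrunken quality part.
    remaining = (1.0 - quality_fraction) + quality_fraction / quality_fold
    return 1.0 / remaining


def implied_quality_fraction(whole_file_fold: float, quality_fold: float) -> float:
    """Invert overall_fold: the quality share that would yield a given whole-file fold."""
    if quality_fold <= 1.0 or not 1.0 <= whole_file_fold <= quality_fold:
        raise ValueError("need 1 <= whole_file_fold <= quality_fold and quality_fold > 1")
    return (1.0 - 1.0 / whole_file_fold) / (1.0 - 1.0 / quality_fold)


if __name__ == "__main__":
    # Illustrative only: quality shares that would reproduce the reported range
    # if the 17-fold figure applied uniformly (an assumption, not a reported fact).
    for fold in (2.2, 7.4):
        share = implied_quality_fraction(fold, 17.0)
        print(f"{fold}-fold overall -> quality share of about {share:.2f}")
```

Under these simplifying assumptions, a file would need roughly 58% of its bytes in quality values to reach a 2.2-fold overall reduction and roughly 92% to reach 7.4-fold, which is consistent with the statement that the overall saving is governed by how much non-quality data the file carries.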
Availability and implementation: Crumble is open source and can be obtained from https://github.com/jkbonfield/crumble.
Supplementary information: Supplementary data are available at Bioinformatics online.
