GABAC: an arithmetic coding solution for genomic data
- PMID: 31830243
- PMCID: PMC7141842
- DOI: 10.1093/bioinformatics/btz922
Abstract
Motivation: In an effort to provide a response to the ever-expanding generation of genomic data, the International Organization for Standardization (ISO) is designing a new solution for the representation, compression and management of genomic sequencing data: the Moving Picture Experts Group (MPEG)-G standard. This paper discusses the first implementation of an MPEG-G compliant entropy codec: GABAC. GABAC combines proven coding technologies, such as context-adaptive binary arithmetic coding, binarization schemes and transformations, into a straightforward solution for the compression of sequencing data.
Results: We demonstrate that GABAC outperforms well-established (entropy) codecs in a significant set of cases and thus can serve as an extension for existing genomic compression solutions, such as CRAM.
Availability and implementation: The GABAC library is written in C++. We also provide a command-line application that exercises all features provided by the library. GABAC can be downloaded from https://github.com/mitogen/gabac.
Supplementary information: Supplementary data are available at Bioinformatics online.
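Illustrative sketch: the Motivation paragraph names three building blocks (transformations, binarization schemes, and context-adaptive binary coding). The Python below is a minimal sketch of how such stages can be chained. It is not the GABAC API or its configuration. Every function and class name is hypothetical, and difference coding and unary binarization were chosen only for brevity. In place of a real arithmetic coder, the sketch sums the ideal code length (-log2 p) under an adaptive per-context bit model, which is the length a well-implemented arithmetic coder would approach.

```python
import math


def diff_transform(values):
    """Transformation: replace each value by its difference from the previous one."""
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def inverse_diff_transform(diffs):
    """Undo diff_transform by accumulating the differences."""
    total = 0
    out = []
    for d in diffs:
        total += d
        out.append(total)
    return out


def zigzag(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unary_binarize(n):
    """Binarization: n ones followed by a terminating zero."""
    return [1] * n + [0]


class AdaptiveBitModel:
    """Count-based estimate of P(bit); a simplified stand-in for one coding context."""

    def __init__(self):
        self.counts = [1, 1]  # pseudo-counts for bit 0 and bit 1

    def cost_and_update(self, bit):
        """Return the ideal cost of `bit` in bits, then adapt the model."""
        p = self.counts[bit] / (self.counts[0] + self.counts[1])
        self.counts[bit] += 1
        return -math.log2(p)


def estimated_bits(values, max_contexts=4):
    """Ideal code length; the context is selected by the bin position in a symbol."""
    contexts = [AdaptiveBitModel() for _ in range(max_contexts)]
    total = 0.0
    for d in diff_transform(values):
        for i, bit in enumerate(unary_binarize(zigzag(d))):
            total += contexts[min(i, max_contexts - 1)].cost_and_update(bit)
    return total


# Hypothetical input: evenly spaced, position-like integers.
positions = list(range(1000, 1400, 4))
print(f"{estimated_bits(positions):.1f} bits for {len(positions)} values")
```

The transform turns slowly growing values into small, repetitive differences, and the adaptive models then assign short codes to the bins that recur. A real codec would replace the cost estimate with an actual arithmetic encoder and decoder pair.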
© The Author(s) 2019. Published by Oxford University Press.