J Bioinform Comput Biol. 2021 Oct;19(5):2150026.
doi: 10.1142/S0219720021500268. Epub 2021 Sep 30.

Compression for population genetic data through finite-state entropy

Winfield Chen et al. J Bioinform Comput Biol. 2021 Oct.

Abstract

We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of samples in population-level genetic data. We identify conditional exchangeability in these data and recommend finite-state entropy algorithms as an arithmetic code naturally suited to compressing population genetic data. In computation and decompression tasks, we show speed and size improvements of between [Formula: see text] and [Formula: see text] over modern dictionary compression methods often used for population genetic data, such as Zstd and Zlib. We provide open-source prototype software for multi-phenotype GWAS with finite-state entropy compression, demonstrating significant space savings and speed comparable to the state of the art.

Keywords: Statistical genetics; big data; genome-wide association study; genotype compression; multi-phenotype analysis.
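
The contrast the abstract draws between entropy coding and dictionary methods can be illustrated with a small sketch. This is not the authors' software and it does not implement finite-state entropy; it only compares the order-0 Shannon bound, which an entropy coder aims to approach, against the size Zlib produces on the same data. The genotype simulation (independent samples with 0/1/2 allele counts at a fixed minor allele frequency) and the function names are assumptions made for illustration.

```python
import math
import random
import zlib
from collections import Counter


def shannon_bits_per_symbol(symbols):
    """Empirical order-0 Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def simulate_genotypes(n_samples, maf, seed=0):
    """Hypothetical data: allele counts (0, 1 or 2) for one variant.

    Assumes two independent allele draws per sample at minor allele
    frequency `maf`; real cohorts need not follow this model.
    """
    rng = random.Random(seed)
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_samples)]


genotypes = simulate_genotypes(100_000, maf=0.05)

# Size an ideal order-0 entropy coder would approach, in bytes.
entropy_bytes = shannon_bits_per_symbol(genotypes) * len(genotypes) / 8

# Size from a dictionary-based compressor on one byte per genotype.
zlib_bytes = len(zlib.compress(bytes(genotypes), 9))

print(f"order-0 entropy bound: {entropy_bytes:.0f} bytes")
print(f"zlib level 9:          {zlib_bytes} bytes")
```

When samples are exchangeable, their order carries no information, so there are few repeated substrings for a dictionary method to exploit, while the skewed genotype frequencies are exactly what an entropy coder uses.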
