Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013 Oct 15;29(20):2572-8.
doi: 10.1093/bioinformatics/btt460. Epub 2013 Aug 21.

Genome compression: a novel approach for large collections

Affiliations

Genome compression: a novel approach for large collections

Sebastian Deorowicz et al. Bioinformatics. .

Abstract

Motivation: Genomic repositories are rapidly growing, as witnessed by the 1000 Genomes or the UK10K projects. Hence, compression of multiple genomes of the same species has become an active research area in the past years. The well-known large redundancy in human sequences is not easy to exploit because of huge memory requirements from traditional compression algorithms.

Results: We show how to obtain several times higher compression ratio than of the best reported results, on two large genome collections (1092 human and 775 plant genomes). Our inputs are variant call format files restricted to their essential fields. More precisely, our novel Ziv-Lempel-style compression algorithm squeezes a single human genome to ∼400 KB. The key to high compression is to look for similarities across the whole collection, not just against one reference sequence, what is typical for existing solutions.

Availability: http://sun.aei.polsl.pl/tgc (also as Supplementary Material) under a free license. Supplementary data: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Publication types