GTC: how to maintain huge genotype collections in a compressed form

Agnieszka Danek¹, Sebastian Deorowicz¹

PMID: 29351600
DOI: 10.1093/bioinformatics/bty023

Agnieszka Danek et al. Bioinformatics. 2018 Jun 1;34(11):1834-1840. doi: 10.1093/bioinformatics/bty023.

Affiliation

¹ Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland.

Abstract

Motivation: Genome sequencing is now routine in many research centers. Projects such as the Haplotype Reference Consortium and the Exome Aggregation Consortium produce huge databases of genotypes for large populations. As these collections grow, fast and memory-frugal ways of representing and searching them become crucial.

Results: We present GTC (GenoType Compressor), a novel compressed data structure for representing huge collections of genetic variation data. It significantly outperforms existing solutions in compression ratio and in the time needed to answer various types of queries. We show that the largest publicly available database, comprising about 60 000 haplotypes at about 40 million SNPs, can be stored in <4 GB, while variant-related queries are answered in a fraction of a second.
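
To put the quoted storage figure in perspective, the following back-of-the-envelope sketch uses only the approximate numbers from the abstract (about 60 000 haplotypes, about 40 million SNPs, <4 GB). It is not part of GTC and does not reflect how GTC works internally. Two assumptions are ours: 1 GB is taken as 10^9 bytes, and the naive baseline stores one bit per haplotype per variant (biallelic sites, no missing data). The function names are hypothetical.

```python
# Approximate figures quoted in the abstract.
N_HAPLOTYPES = 60_000
N_VARIANTS = 40_000_000
# Upper bound "<4 GB"; assumption: 1 GB = 10**9 bytes.
COMPRESSED_BYTES = 4 * 10**9


def matrix_cells(n_haplotypes: int, n_variants: int) -> int:
    """Number of haplotype-by-variant cells in the genotype matrix."""
    return n_haplotypes * n_variants


def naive_bitmap_bytes(cells: int) -> int:
    """Size of a plain bitmap, assuming one bit per cell (our assumption)."""
    return cells // 8


def bits_per_cell(compressed_bytes: int, cells: int) -> float:
    """Average number of stored bits per matrix cell."""
    return compressed_bytes * 8 / cells


cells = matrix_cells(N_HAPLOTYPES, N_VARIANTS)
naive = naive_bitmap_bytes(cells)

print(f"cells:             {cells:.2e}")
print(f"naive bitmap:      {naive / 10**9:.0f} GB")
print(f"bits per cell:     {bits_per_cell(COMPRESSED_BYTES, cells):.4f}")
print(f"ratio vs. bitmap:  {naive / COMPRESSED_BYTES:.0f}x")
```

Under these assumptions the matrix has about 2.4 × 10^12 cells, a plain bitmap would need about 300 GB, and a 4 GB archive corresponds to roughly 0.013 bits per cell, or at least a 75-fold reduction relative to the bitmap.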

Availability and implementation: GTC can be downloaded from https://github.com/refresh-bio/GTC or http://sun.aei.polsl.pl/REFRESH/gtc.

Contact: sebastian.deorowicz@polsl.pl.

Supplementary information: Supplementary data are available at Bioinformatics online.
