Efficient genotype compression and analysis of large genetic-variation data sets
- PMID: 26550772
- PMCID: PMC4697868
- DOI: 10.1038/nmeth.3654
Abstract
Genotype Query Tools (GQT) is an indexing strategy that expedites analyses of genome-variation data sets in Variant Call Format based on sample genotypes, phenotypes and relationships. GQT's compressed genotype index minimizes decompression for analysis, and its performance relative to that of existing methods improves with cohort size. We show substantial (up to 443-fold) gains in performance over existing methods and demonstrate GQT's utility for exploring massive data sets involving thousands to millions of genomes. GQT can be accessed at https://github.com/ryanlayer/gqt.
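The idea of answering genotype queries from an index, rather than by parsing every variant record, can be illustrated with a small sketch. The abstract does not describe GQT's actual encoding, so everything here is an assumption made for illustration: the toy `GENOTYPES` matrix, the 0/1/2 genotype codes, and the helper names `build_index` and `variants_where_all` are hypothetical. The sketch stores one uncompressed bitmap per sample and genotype class and omits compression entirely.

```python
from typing import Dict, List

# Hypothetical toy data: rows are variants, columns are samples.
# Assumed genotype codes: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate.
GENOTYPES: List[List[int]] = [
    [0, 1, 1],
    [1, 1, 0],
    [2, 1, 1],
    [0, 0, 0],
]


def build_index(genotypes: List[List[int]]) -> List[Dict[int, int]]:
    """Build index[sample][code], an int used as a bitmap.

    Bit v of index[s][c] is set when sample s has genotype code c
    at variant v.
    """
    n_samples = len(genotypes[0])
    index: List[Dict[int, int]] = [{0: 0, 1: 0, 2: 0} for _ in range(n_samples)]
    for v, row in enumerate(genotypes):
        for s, code in enumerate(row):
            index[s][code] |= 1 << v
    return index


def variants_where_all(
    index: List[Dict[int, int]], samples: List[int], code: int, n_variants: int
) -> List[int]:
    """Return variant positions where every listed sample has `code`."""
    mask = (1 << n_variants) - 1  # start with all variants selected
    for s in samples:
        mask &= index[s][code]  # one bitwise AND per sample
    return [v for v in range(n_variants) if (mask >> v) & 1]


index = build_index(GENOTYPES)
shared_het = variants_where_all(index, samples=[1, 2], code=1, n_variants=4)
```

In this layout a query over a subset of samples touches one bitmap per sample instead of every variant row, which is the general property that makes sample-oriented indexes attractive as cohorts grow.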
Conflict of interest statement
The authors declare no competing financial interests.
