Genome Biol. 2021 Jan 13;22(1):32.
doi: 10.1186/s13059-020-02248-0.

The variant call format provides efficient and robust storage of GWAS summary statistics

Matthew S Lyon et al. Genome Biol. 2021.

Abstract

GWAS summary statistics are fundamental for a variety of research applications, yet no common storage format has been widely adopted. Existing tabular formats store information about genetic variants and associations ambiguously or incompletely, lack essential metadata, and are typically not indexed, which yields poor query performance and increases the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk).

Keywords: GWAS; Storage format; Summary statistics; VCF.
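
To make the idea of storing summary statistics in a VCF-style record more concrete, the following minimal sketch parses one tab-separated data line into variant fields and per-study statistics. It is an illustration only: the example line, its values, the variant identifier, and the field keys (ES for effect size, SE for standard error, LP for the negative log10 P value, AF for allele frequency) are assumptions chosen for the example and are not quoted from the abstract. The helper name `parse_record` is hypothetical.

```python
# Hypothetical data line in a VCF-style layout: eight fixed columns, a FORMAT
# column naming the per-study keys, then one column of values per study.
# All values and the field keys (ES, SE, LP, AF) are made up for illustration.
LINE = "1\t49298\trs123\tT\tC\t.\tPASS\t.\tES:SE:LP:AF\t0.05:0.01:2.0:0.62"


def parse_record(line):
    """Split one VCF-style line into variant fields and summary statistics."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")      # FORMAT column, e.g. ES:SE:LP:AF
    values = fields[9].split(":")    # values for the first (only) study
    stats = {key: float(value) for key, value in zip(keys, values)}
    record = {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "id": fields[2],
        "ref": fields[3],
        "alt": fields[4],
        "stats": stats,
    }
    # Assumption: LP holds -log10(P), so the P value is recovered as 10^-LP.
    if "LP" in stats:
        record["p_value"] = 10 ** -stats["LP"]
    return record


record = parse_record(LINE)
print(record["chrom"], record["pos"], record["id"], record["stats"])
```

Keeping the reference and alternative alleles in dedicated columns, rather than in free-form table headers, is one way such a layout can avoid ambiguity about which allele an effect estimate refers to.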

Conflict of interest statement

TRG receives funding from GlaxoSmithKline and Biogen for unrelated research.

Figures

Fig. 1
Performance comparison for querying summary statistics in plain text and GWAS-VCF. Mean query time (seconds, lower is faster; repetitions n = 100) to extract either a single variant using the chromosome position or dbSNP [31] identifier, or multiple variants using a 1-Mb interval or association P value. AWK, grep, bcftools [23] and rsidx [32] were evaluated using uncompressed or GZIP-compressed TSV and BGZIP-compressed [23] VCF. The summary statistics files contained one (single) or five (multiple) GWAS studies, which were prepared by subsampling variants (n = 0.5 M, 2.5 M, 10 M) obtained from Neale et al. [35]. Error bars represent the 95% confidence interval.
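
The difference between scanning a text file and querying an indexed file can be illustrated with a simplified analogy. The sketch below compares a linear scan over position-sorted records with a binary search over the same positions for a 1-Mb interval query. It is not how bcftools or rsidx are implemented; the toy data and the function names `interval_scan` and `interval_indexed` are assumptions made for this example.

```python
import bisect

# Toy records sorted by position on one chromosome: (position, identifier).
# Both the positions and the identifiers are synthetic.
records = [(pos, f"rs{pos}") for pos in range(1000, 5_000_000, 1000)]
positions = [pos for pos, _ in records]  # sorted list acting as a simple index


def interval_scan(records, start, end):
    """Linear scan: inspect every record, as a line-by-line text filter would."""
    return [rec for rec in records if start <= rec[0] <= end]


def interval_indexed(positions, records, start, end):
    """Indexed lookup: binary search for the interval bounds, then slice."""
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return records[lo:hi]


hits_scan = interval_scan(records, 1_000_000, 2_000_000)
hits_indexed = interval_indexed(positions, records, 1_000_000, 2_000_000)
print(len(hits_scan), hits_scan == hits_indexed)
```

The scan touches every record regardless of the interval, whereas the indexed lookup needs only a logarithmic number of comparisons to locate the bounds. A query on an unindexed attribute, such as a P value threshold, would still require a full pass in this toy setup.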

References

    1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22.
    2. Hou L, Zhao H. A review of post-GWAS prioritization approaches. Front Genet. 2013;4:280.
    3. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh PR, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47:1228–1235.
    4. Smith GD, Ebrahim S. “Mendelian randomization”: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22.
    5. Bulik-Sullivan B, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5.
