Bioinformatics. 2014 Dec 1;30(23):3342-8.
doi: 10.1093/bioinformatics/btu571. Epub 2014 Aug 27.

VSEAMS: a pipeline for variant set enrichment analysis using summary GWAS data identifies IKZF3, BATF and ESRRA as key transcription factors in type 1 diabetes

Oliver S Burren et al. Bioinformatics.

Abstract

Motivation: Genome-wide association studies (GWAS) have identified many loci implicated in disease susceptibility. Integration of GWAS summary statistics (P-values) and functional genomic datasets should help to elucidate mechanisms.

Results: We extended a non-parametric SNP set enrichment method to test for enrichment of GWAS signals in functionally defined loci to a situation where only GWAS P-values are available. The approach is implemented in VSEAMS, a freely available software pipeline. We use VSEAMS to identify enrichment of type 1 diabetes (T1D) GWAS associations near genes that are targets for the transcription factors IKZF3, BATF and ESRRA. IKZF3 lies in a known T1D susceptibility region, while BATF and ESRRA overlap other immune disease susceptibility regions, validating our approach and suggesting novel avenues of research for T1D.

Availability and implementation: VSEAMS is available for download (http://github.com/ollyburren/vseams).

Figures

Fig. 1.
The VSEAMS pipeline; mandatory inputs are shaded grey; a dashed border indicates that one or the other input is required. VSEAMS takes as input either two lists of genes or two lists of regions for comparison. Given genes, regions are defined by taking gene coordinates ± 200 kb around the TSS. GWAS summary statistics (P-values) for SNPs in those regions are extracted. The observed Wilcoxon rank sum test statistic is compared with its null distribution determined by its theoretical mean and a variance derived by simulating null P-values with a correlation structure matching the underlying genotype structure. Caching of pregenerated LD matrices reduces computation time. A full description of each step is available in the Supplementary Information
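
The comparison step described in the caption above can be illustrated with a minimal Python sketch. It is not the VSEAMS implementation: the function names (`rank_sum`, `enrichment_z`), the use of NumPy, the toy inputs and the sign convention (positive Z when the test set holds the smaller P-values) are all assumptions made for illustration. It computes the Wilcoxon rank sum statistic, takes its theoretical mean, and estimates its variance by simulating null association statistics from a multivariate normal whose correlation matrix stands in for the LD structure.

```python
import numpy as np

def rank_sum(values, in_test_set):
    """Wilcoxon rank sum W: sum of the ranks (1 = smallest) of the test-set values.
    Ties are not averaged here; this is a simplification for the sketch."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks[in_test_set].sum()

def enrichment_z(p_values, in_test_set, ld_corr, n_sim=2000, seed=0):
    """Z-score for enrichment of small P-values in the test set.

    p_values    : GWAS P-values for all SNPs (test and control regions)
    in_test_set : boolean mask, True for SNPs in the test regions
    ld_corr     : SNP-by-SNP correlation matrix (hypothetical stand-in for LD)
    """
    rng = np.random.default_rng(seed)
    n_total = len(p_values)
    n_test = int(in_test_set.sum())

    w_obs = rank_sum(p_values, in_test_set)
    w_mean = n_test * (n_total + 1) / 2.0  # theoretical mean of W under the null

    # Simulate correlated null statistics. Ranking -|z| ascending gives the same
    # ranks as ranking the corresponding two-sided P-values ascending.
    z_null = rng.multivariate_normal(np.zeros(n_total), ld_corr, size=n_sim)
    w_null = np.array([rank_sum(-np.abs(z), in_test_set) for z in z_null])

    # Assumed sign convention: a small W (small P-values in the test set) gives Z > 0.
    return (w_mean - w_obs) / w_null.std(ddof=1)

# Toy usage: six independent SNPs, the first three form the test set.
p = np.array([0.001, 0.002, 0.003, 0.5, 0.6, 0.7])
mask = np.array([True, True, True, False, False, False])
print(enrichment_z(p, mask, np.eye(6)))
```

Simulating only the variance, while keeping the closed-form mean, is the design the caption describes; with a real LD matrix, correlated SNPs inflate the variance of W relative to the independent case, which is why the simulation is needed.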
Fig. 2.
A comparison of Z-scores generated using the permuted phenotype method (10 000 permutations) versus using summary P-values and VSEAMS (10 000 simulations) for the T1DGC study, over 100 randomly generated gene sets
Fig. 3.
A runtime comparison of simulation using multivariate normal (black) versus permutation (grey) over 1000 randomly selected LD blocks. In both plots the y-axis is the median execution time over 10 iterations, and lines indicate the fit of a linear model. Specifically, (a) details the effect of sample size on median execution time over 14 753 SNPs summed over all randomly selected LD blocks, and (b) shows the effect of SNP count on execution time for 4000 cases and controls for all 1000 randomly selected LD blocks
Fig. 4.
T1D susceptibility SNP enrichment (excluding the major histocompatibility complex (MHC)) within transcription factor perturbed gene sets from Cusanovich et al. (2014). SNPs are pruned on the basis of an r2 threshold of 0.95. A positive Z-score indicates enrichment; labels denote associated P-values. Black bars indicate that the knocked-down transcription factor overlaps a known autoimmune susceptibility locus curated in ImmunoBase
Fig. 5.
Comparison of VSEAMS and permuted phenotype methods with differing sample size, for example gene sets where enrichment is present (IKZF3) and absent (YY1). (a) Shows the difference in Z-scores between the two methods with 10 000 simulations and a variable sample size, with an equal number of cases and controls. (b) Shows how the correlation between Z-scores over a variable number of permutations varies with respect to sample size. The coloured lines represent a locally estimated scatterplot smoothing (LOESS) fitted model for each sample size

References

    1. Barrett JC, et al. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat. Genet. 2009;41:703–707.
    2. Burren OS, et al. T1DBase: update 2011, organization and presentation of large-scale data sets for type 1 diabetes research. Nucleic Acids Res. 2011;39:D997–D1001.
    3. Cusanovich DA, et al. The functional consequences of variation in transcription factor binding. PLoS Genet. 2014;10:e1004226.
    4. Geweke J. Bayesian Statistics 4: Evaluating the Accuracy of Sampling-based Approaches to the Calculation of Posterior Moments. Oxford, UK: Oxford University Press; 1992.
    5. Heinig M, et al. A trans-acting locus regulates an anti-viral expression network and type 1 diabetes risk. Nature. 2010;467:460–464.
