Review
PLoS One. 2015 Oct 16;10(10):e0140758.
doi: 10.1371/journal.pone.0140758. eCollection 2015.

Expression Quantitative Trait Loci Information Improves Predictive Modeling of Disease Relevance of Non-Coding Genetic Variation

Damien C Croteau-Chonka et al. PLoS One. 2015.

Abstract

Disease-associated loci identified through genome-wide association studies (GWAS) frequently localize to non-coding sequence. We and others have demonstrated strong enrichment of such single nucleotide polymorphisms (SNPs) for expression quantitative trait loci (eQTLs), supporting an important role for regulatory genetic variation in complex disease pathogenesis. Herein we describe our initial efforts to develop a predictive model of disease-associated variants leveraging eQTL information. We first catalogued cis-acting eQTLs (SNPs within 100 kb of target gene transcripts) by meta-analyzing four studies of three blood-derived tissues (n = 586). At a false discovery rate < 5%, we mapped eQTLs for 6,535 genes; these were enriched for disease-associated genes (P < 10^-4), particularly those related to immune diseases and metabolic traits. Based on eQTL information and other variant annotations (distance from target gene transcript, minor allele frequency, and chromatin state), we created multivariate logistic regression models to predict SNP membership in reported GWAS. The complete model revealed independent contributions of specific annotations as strong predictors, including evidence for an eQTL (odds ratio (OR) = 1.2-2.0, P < 10^-11) and the chromatin states of active promoters, different classes of strong or weak enhancers, or transcriptionally active regions (OR = 1.5-2.3, P < 10^-11). This complete prediction model including eQTL association information ultimately allowed for better discrimination of SNPs with higher probabilities of GWAS membership (6.3-10.0%, compared to 3.5% for a random SNP) than the other two models excluding eQTL information. This eQTL-based prediction model of disease relevance can help systematically prioritize non-coding GWAS SNPs for further functional characterization.
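
To make the reported odds ratios concrete, the sketch below shows how an odds ratio from a logistic model rescales a baseline probability. It is an illustration only, not the authors' code. Treating the 3.5% random-SNP probability as the baseline is an assumption made here for arithmetic convenience (in a fitted model the baseline is set by the intercept and the reference levels of every annotation), and the function name `apply_odds_ratio` is hypothetical.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    # Convert probability to odds, scale by the odds ratio, convert back.
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # ASSUMPTION: 3.5% (the random-SNP probability from the abstract) is used as the baseline.
    for odds_ratio in (1.2, 2.0):  # range reported for eQTL evidence
        print(odds_ratio, round(apply_odds_ratio(0.035, odds_ratio), 4))
```

Under this assumption, a single annotation with OR = 2.0 roughly doubles a small baseline probability. In a multivariate logistic model, the odds ratios of all annotations that apply to a SNP multiply together on the odds scale.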

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Venn diagram of overlaps of eQTL genes identified in specific individual cohorts and through meta-analysis.
Numbers represent counts of genes with at least one significant eQTL SNP (FDR < 5%) in each of the four study cohorts (“CAMP WB”, “CAMP CD4”, “CEU LCL”, and “CARE CD4” in white ellipses) and in their combined meta-analysis (“META” in gray ellipse). Total counts for each group are also summarized in Table 1.
Fig 2. Relationships of eQTL meta-analysis gene yields with representation in individual cohorts and a previous study.
Counts of all significant eQTL genes (meta-analysis FDR < 5%, Table 1) identified per source category are shown with white bars. The first four categories (“1C” through “4C”) represent the number of individual cohorts in which a gene was identified. The fifth category (“UNION”) is the union of the genes from the preceding four categories. The sixth category (“META”) is the set of genes identified in the meta-analysis. Top panel: For comparison, the counts of genes in each category also found by the meta-analysis are shown with overlapping gray bars. Among genes found in the meta-analysis, the count of genes not identified in any of the individual cohorts is shown with a black bar. Bottom panel: The counts of genes found in an eQTL study in WB by Westra et al. [13] are shown with black bars.
Fig 3. Genes associated with inflammatory and other categories of disease traits enriched for meta-analysis eQTL genes.
In each histogram, the observed number of genes in the given category harboring at least one significant eQTL SNP (meta-analysis FDR < 5%) is marked with a dashed vertical line. The null distributions derived from 10,000 permutations are shown with gray bars.
Fig 4. Forest plot of component effects of complete GWAS predictive model based on training set of SNPs.
Odds ratios (black squares) from the complete multivariate model (“chromstate+eqtl [M3]”) for features predicting the membership of a SNP in the NHGRI GWAS Catalog are shown here with standard errors (gray lines). Smaller models are shown for comparison in S2 Fig. Four classes of SNP annotation are represented in the model, each with multiple levels: distance from gene, MAF, chromatin state in GM12878 LCLs [12], and evidence of eQTL association based on meta-analysis FDR. The base levels for each annotation are “0 kb (within gene)” [Distance from Gene], “>10%” [MAF], “Heterochromatin (13)” [ChromHMM], and “>50%” [FDR].
Fig 5. Multivariate logistic models predicting SNP membership in GWAS are well-calibrated.
Top panel: Three models were developed for predicting the membership of a given SNP in the NHGRI GWAS Catalog, all incorporating at minimum the distance of the SNP from the transcript boundaries of its target gene and the minor allele frequency of the SNP. The "structure [M1]" model (white) also incorporates the NCBI gene structure classification of the gene (intron, coding, untranslated region, etc.) (S2 Fig); "chromstate [M2]" (gray) instead incorporates chromatin state (S2 Fig); "chromstate+eqtl [M3]" (black) incorporates both chromatin state and eQTL FDR class (Fig 4). The x-axis shows equal-sized bins of predicted probabilities of being a GWAS SNP. This particular choice of bins based on the widest range of probabilities (from M3) aids visual comparison of calibration among the three models by smoothing the proportions of observed GWAS SNPs. The y-axis shows the actual proportion of GWAS SNPs in that bin. The dashed green line at 3.5% represents the mean probability of a random SNP in the genome for being a GWAS hit or a close proxy (r^2 > 0.8) for one. Bottom panel: a table of absolute counts of SNPs in each predicted probability bin for each of the predictive models. For the M1 and M2 models, no SNPs had predicted probabilities > 6.3%.
Fig 6. ROC curves for multivariate logistic models predicting SNP membership in GWAS.
Components of the three predictive models are described in Fig 5.
Fig 7. Evidence for an eQTL signal at the IRF8 locus associated with systemic sclerosis.
Top panel: -log10 FDR for meta-analysis associations of nearby SNPs with expression of the longer isoform of IRF8. Middle panel: Chromatin states (CS) in LCLs (GM12878) [12] and the target IRF8 transcript. The two most strongly associated SNPs (including the systemic sclerosis GWAS [38] index SNP rs11642873) overlap a predicted weak enhancer region (yellow). Nearby upstream is a predicted active promoter region (red) that is likely spurious given that it overlaps no gene, predicted or otherwise. Bottom panel: boxplots showing probe expression residuals by genotype of index SNP rs11642873 in the four individual cohorts, where the “A” allele is A and the “B” allele is C. None of the cohort-specific associations are individually significant at FDR < 5%, though the meta-analysis is significant at this level.
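
The permutation-based null distributions described for Fig 3 can be illustrated with a minimal sketch. The page does not say how the permutations were constructed, so everything here is an assumption: the null is built by drawing random gene sets of the same size as the trait gene set from a gene universe, and the function name `permutation_enrichment`, its parameters, and the toy gene labels are all hypothetical.

```python
import random


def permutation_enrichment(universe, eqtl_genes, trait_genes, n_perm=10_000, seed=0):
    """Empirical one-sided P value for the overlap of trait_genes with eqtl_genes.

    ASSUMPTION: the null distribution is formed by sampling random gene sets,
    equal in size to trait_genes, from the universe (not confirmed by the source).
    """
    rng = random.Random(seed)
    universe = list(universe)
    eqtl_genes = set(eqtl_genes)
    trait_genes = set(trait_genes)

    # Observed count of trait genes that harbor at least one eQTL.
    observed = len(trait_genes & eqtl_genes)

    # Count permutations whose overlap is at least as large as the observed one.
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(trait_genes))
        null_overlap = sum(1 for gene in sample if gene in eqtl_genes)
        if null_overlap >= observed:
            hits += 1

    # Add-one correction keeps the estimate away from an impossible P of zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data only: 1,000 hypothetical genes, 100 with eQTLs, 20 trait genes.
    genes = [f"g{i}" for i in range(1000)]
    observed, p_value = permutation_enrichment(genes, genes[:100], genes[:20], n_perm=1000)
    print(observed, p_value)
```

With the add-one correction used in this sketch, 10,000 permutations give a smallest attainable value of 1/10,001, which is how a permutation test of that size can yield a bound of the form P < 10^-4.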

References

    1. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106:9362–7. doi: 10.1073/pnas.0903103106
    2. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLOS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888
    3. Murphy A, Chu JH, Xu M, Carey VJ, Lazarus R, Liu A, et al. Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes. Hum Mol Genet. 2010;19:4745–57. doi: 10.1093/hmg/ddq392
    4. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247
    5. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–5. doi: 10.1038/ng.2653