PLoS One. 2013;8(1):e54359.
doi: 10.1371/journal.pone.0054359. Epub 2013 Jan 30.

Predicting cell types and genetic variations contributing to disease by combining GWAS and epigenetic data

Anna Gerasimova et al. PLoS One. 2013.

Abstract

Genome-wide association studies (GWASs) identify single nucleotide polymorphisms (SNPs) that are enriched in individuals suffering from a given disease. Most disease-associated SNPs fall into non-coding regions, so it is not straightforward to infer phenotype or function; moreover, many SNPs are in tight genetic linkage, so a SNP identified as associated with a particular disease may not itself be causal, but may rather signify the presence of a linked SNP that is functionally relevant to disease pathogenesis. Here, we present an analysis method that takes advantage of the recent rapid accumulation of epigenomics data to address these problems for some SNPs. Using asthma as a prototypic example, we show that non-coding disease-associated SNPs are enriched in genomic regions that function as regulators of transcription, such as enhancers and promoters. Identifying enhancers based on the presence of histone modification marks such as H3K4me1 in different cell types, we show that the location of enhancers is highly cell-type specific. We use these findings to predict which SNPs are likely to be directly contributing to disease based on their presence in regulatory regions, and in which cell types their effect is expected to be detectable. Moreover, we can also predict which cell types contribute to a disease based on the overlap of the disease-associated SNPs with the locations of enhancers present in a given cell type. Finally, we suggest that it will be possible to re-analyze GWASs with much higher power by limiting the SNPs considered to those in coding or regulatory regions of cell types relevant to a given disease.
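
The overlap-based enrichment described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the half-open interval convention, and the simple ratio of fractions used as the enrichment measure are all assumptions made for illustration.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Sort and merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlaps(pos, starts, ends):
    """True if pos lies inside one of the merged intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def fraction_in_enhancers(snps, enhancers):
    """snps: list of (chrom, pos); enhancers: dict chrom -> [(start, end)].

    Returns the fraction of SNPs that fall inside any enhancer interval.
    """
    index = {}
    for chrom, intervals in enhancers.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    hits = sum(
        1 for chrom, pos in snps
        if chrom in index and overlaps(pos, *index[chrom])
    )
    return hits / len(snps)

def enrichment(disease_snps, background_snps, enhancers):
    """Assumed enrichment measure: ratio of in-enhancer fractions."""
    background = fraction_in_enhancers(background_snps, enhancers)
    return fraction_in_enhancers(disease_snps, enhancers) / background
```

Running the same calculation once per cell type, each with its own enhancer set, would give a per-cell-type enrichment that can be ranked, in the spirit of the cell-type prediction described above. A real analysis would also need a significance test and a background SNP set matched to the disease-associated set.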

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of the SNPs in coding, 5′-UTR, 3′-UTR, introns and intergenic regions.
Three sets of SNPs are shown. SNPs identified as being significantly associated with asthma according to the GWAS Integrator database are shown on the left. The middle shows the same set of SNPs extended by those that are in tight genetic linkage (r² = 0.8) according to HaploReg. On the right, the distribution of common SNPs that are not associated with asthma is shown. SNPs were assigned to coding, 5′-UTR, 3′-UTR, intron, and intergenic regions using RefSeq datasets from the UCSC Genome Browser.
Figure 2. Asthma-associated SNPs and H3K4me1-enriched regions (enhancers) in the human Th2 cytokine locus of different cells and tissue types.
From top to bottom, the UCSC Genome Browser display shows: the conserved DNase hypersensitivity regions identified in mouse T cells (HS regions), the gene track (genes), all the SNPs not associated with asthma, the SNPs associated with asthma, the species conservation track, and the H3K4me1 ChIP-seq track (green) for the different cell and tissue types (named on the left), underlined by the corresponding peak-calling track (black boxes). For the blood CD4+ T cells, peak-calling tracks from seven samples/cell types are displayed. The red boxes show H3K4me1 peaks that are present only in CD4+ T cells (LCRO and HSV).
Figure 3. The location of enhancers is cell-type specific.
The plot depicts pairwise comparisons of the location of enhancers in different datasets using Matthews correlation coefficients (MCC). Black indicates a high correlation between enhancers in two cell types. The 37 studied datasets form distinct clusters that correspond to different cell or tissue types.
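
As a rough illustration of the pairwise comparison in Figure 3, the sketch below computes the Matthews correlation coefficient between binary enhancer-presence vectors. It assumes the genome has been split into fixed bins and each dataset reduced to one 0/1 value per bin; that encoding, and the function names, are assumptions and not taken from the paper.

```python
from math import sqrt

def mcc(a, b):
    """Matthews correlation coefficient between two equal-length 0/1 vectors."""
    tp = sum(1 for x, y in zip(a, b) if x and y)          # enhancer in both
    tn = sum(1 for x, y in zip(a, b) if not x and not y)  # enhancer in neither
    fp = sum(1 for x, y in zip(a, b) if not x and y)      # only in b
    fn = sum(1 for x, y in zip(a, b) if x and not y)      # only in a
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def pairwise_mcc(datasets):
    """datasets: dict name -> 0/1 vector. Returns dict (name_i, name_j) -> MCC."""
    names = sorted(datasets)
    return {(i, j): mcc(datasets[i], datasets[j]) for i in names for j in names}
```

The resulting matrix could then be passed to any hierarchical clustering routine to look for the kind of cell-type clusters the caption describes.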
Figure 4. Distribution of enhancers in asthma-associated SNPs for different cell types.
For each SNP and cell type, a black bar indicates that an enhancer overlaps the SNP in that cell type. Cell types are ordered by their enrichment for asthma-associated SNPs in enhancers, from breast tissue with low enrichment at the top to CD4+ T cells with high enrichment at the bottom. SNPs are ordered by how commonly they overlap with enhancers in different cell types, from those with enhancers present in all 8 cell types on the left to those with enhancers in just 1 cell type on the right. Asthma-associated SNPs with no enhancer in any cell type are left out of the graph.
Figure 5. Distribution of enhancers in asthma-associated SNPs for different cell types.
Plotted is the enrichment of asthma-associated SNPs compared to background SNPs in genomic regions in which there are CD4+ T cell enhancers and anywhere from 0 to 7 additional cell types that also have a peak in that region.
Figure 6. Distribution of the asthma-associated SNPs in TFBSs.
Two sets of asthma-associated SNPs are shown. Asthma-associated SNPs that belong to any of the H3K4me1 peaks called in this study are shown on the left. On the right is the distribution of asthma-associated SNPs that are located in any of the H3K4me1 peaks. SNPs were classified as overlapping or not overlapping TFBSs using the TFBS ChIP-seq dataset from ENCODE (Release 2). The TFBS ChIP-seq data were obtained from the UCSC Genome Browser.
Figure 7. Asthma-associated SNPs and H3K4me1 (enhancer) enriched regions in the human IL-33R locus of different cell/tissue types.
From top to bottom, the UCSC Genome Browser display shows: the gene track (genes), all the SNPs not associated with asthma, the SNPs associated with asthma (red are GWAS-identified SNPs, blue are SNPs in linkage disequilibrium), and the H3K4me1 ChIP-seq track (green) for different cell/tissue types (named on the left), underlined by the corresponding peak-calling track (black boxes). For the blood CD4+ T cells, peak-calling tracks from seven samples/cell types are displayed. The red box shows an H3K4me1 peak that is present only in CD4+ T cells.

