Cell Rep. 2016 Nov 15;17(8):2137-2150.
doi: 10.1016/j.celrep.2016.10.059.

eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data

Charles E Breeze et al. Cell Rep. 2016.

Abstract

Epigenome-wide association studies (EWAS) provide an alternative approach for studying human disease through consideration of non-genetic variants such as altered DNA methylation. To advance the complex interpretation of EWAS, we developed eFORGE (http://eforge.cs.ucl.ac.uk/), a new standalone and web-based tool for the analysis and interpretation of EWAS data. eFORGE determines the cell type-specific regulatory component of a set of EWAS-identified differentially methylated positions. This is achieved by detecting enrichment of overlap with DNase I hypersensitive sites across 454 samples (tissues, primary cell types, and cell lines) from the ENCODE, Roadmap Epigenomics, and BLUEPRINT projects. Application of eFORGE to 20 publicly available EWAS datasets identified disease-relevant cell types for several common diseases, a stem cell-like signature in cancer, and demonstrated the ability to detect cell-composition effects for EWAS performed on heterogeneous tissues. Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology.
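
As a rough illustration of the enrichment idea described in the abstract, the following Python sketch counts how many positions in a probe set fall inside DNase I hypersensitive sites for one sample, then scores that count with a binomial upper-tail probability. It is not the eFORGE implementation: the interval coordinates, the single-chromosome simplification, the function names, and the background overlap rate of 0.25 are all assumptions made for the example.

```python
from bisect import bisect_right
from math import comb


def overlaps_any(pos, starts, ends):
    """Return True if pos lies in any half-open interval [start, end).

    Assumes starts is sorted and the intervals do not overlap each other.
    """
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def count_overlaps(positions, starts, ends):
    """Count how many positions fall inside at least one interval."""
    return sum(overlaps_any(p, starts, ends) for p in positions)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical DHS intervals for one sample on one chromosome.
dhs_starts = [100, 500, 900]
dhs_ends = [200, 600, 1000]

# Hypothetical positions of EWAS-identified DMPs.
dmp_positions = [150, 550, 950, 300]

# Assumed background overlap rate; in practice this would have to be
# estimated, for example from matched background probe sets.
background_rate = 0.25

k = count_overlaps(dmp_positions, dhs_starts, dhs_ends)
p_value = binomial_upper_tail(k, len(dmp_positions), background_rate)
print(k, p_value)
```

With these toy inputs, three of the four positions overlap a DHS, and the upper-tail probability under the assumed background rate is about 0.051. Repeating the calculation per sample would give one enrichment score per tissue or cell type.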

Keywords: DNase I hypersensitive sites; bioinformatics; epigenetics; epigenome-wide association study; histone marks.

Figures

Graphical abstract
Figure 1
eFORGE Overview and Performance (A) Concept, components, and flowchart of eFORGE: upper-left panel depicts typical EWAS results with top hits marked as large black dots that serve as input for eFORGE. The main components of eFORGE are controlled by Perl software that uses data from the Roadmap Epigenomics, ENCODE, and BLUEPRINT projects to compute enrichment and significance profiles (illustrated by middle and bottom left panels). R code is used to generate output graphs (illustrated by bottom right panel) with predicted target cell types marked in red. (B) Reproducibility: using the CD14+ tDMP dataset (Jaffe and Irizarry, 2014), 1,000 different runs were performed showing that the variability due to random background sampling is well below the two eFORGE thresholds (green and red lines) that affect target prediction (shown in log scale). These data indicate high reproducibility between eFORGE runs. (C) Runtime: comparison of Perl BigFloat and BigInt (original code, in black) versus logarithm-based code (in blue) for the management of decimal p value numbers shows up to a 15-fold increase in speed for logarithm-based code. Original code was unable to process 1,000 probes, so data are only shown for probe sets under 1,000 probes. (D) GeEC correlation data matrix for DHS/Histone mark data from the Roadmap, ENCODE, and BLUEPRINT projects. Red regions show high positive correlation (as measured by Pearson correlation coefficient), white regions show no correlation and blue regions show high negative correlation. Grouping of data by hierarchical clustering agrees with original DHS/Histone mark label (y axis), suggesting a similarity in measurements between different consortia. See also Tables S1–S5, S6 and Data S1.
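
Panel (C) contrasts arbitrary-precision arithmetic with logarithm-based handling of very small p values. The Python sketch below shows one common way to do this: it computes the log10 of a binomial upper-tail probability entirely in log space using the log-gamma function and a log-sum-exp step. The function names are hypothetical, and this is an illustration of the general technique under those assumptions, not a port of the eFORGE Perl code.

```python
import math


def log_binom_pmf(i, n, p):
    """Natural log of the Binomial(n, p) probability mass at i, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
    return log_choose + i * math.log(p) + (n - i) * math.log1p(-p)


def log10_binom_upper_tail(k, n, p):
    """log10 of P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    if k <= 0:
        return 0.0  # P(X >= 0) = 1
    terms = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    m = max(terms)
    # log-sum-exp: factor out the largest term so the exponentials stay in range
    log_total = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_total / math.log(10)


# Small case that can be checked by hand: P(X >= 3) with n = 4, p = 0.25.
small = log10_binom_upper_tail(3, 4, 0.25)

# Extreme case whose probability is far below the smallest positive float.
extreme = log10_binom_upper_tail(900, 1000, 0.01)
print(small, extreme)
```

The extreme case returns a finite, very negative exponent, whereas multiplying the raw probabilities in double precision would underflow to zero. Working with the exponent directly also yields the –log10 scale used in the enrichment plots by a simple sign change.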
Figure 2
eFORGE Analysis of tDMPs and cDMPs Results show the ability to predict target tissues from known tissue-specific differentially methylated positions (tDMPs) and cell type-specific DMPs (cDMPs): the heatmap is a composite of results for the top 1,000 tDMPs for blood, kidney, and lung (Lowe et al., 2015), and top cDMPs for CD14+, T cells, and NK cells (Jaffe and Irizarry, 2014). With tDMPs and cDMPs, we have the advantage of a known prior tissue- or cell type-specific association. We can thus test whether the eFORGE tool identifies the correct tissue. The color-coded enrichment results show that eFORGE identified the correct tissue or cell type each time, with no false positives. This confirms that the tool can signal when regions are associated by DNAm with a specific cell type. See also Figure S1 and Data S2, S3, S4, and S5.
Figure 3
Aggregated Enrichment Statistics for Studies with eFORGE Signal from a Recent Review Studies were obtained from the review by Michels et al. (2013). This heatmap shows the enrichment statistics (presented as –log10(binomial p value)) for an unbiased selection of EWAS (n = 20 studies, each with at least 100 samples). Many of these studies show an enrichment pattern specific to particular tissues, such as blood (blue box, seven studies) and stem cells (red box, five studies). In addition, one ccRCC study shows a kidney-specific enrichment and one CLL study presents a lung-specific enrichment (lung tissue and IMR90). Other patterns are more mixed (yellow box, six studies). Of the seven blood-enriched studies, six were performed in blood and one was performed in breast cancer tissue, which may contain immune cells. All five studies that show a stem cell-specific enrichment are exclusively cancer or aging EWAS. Of the six studies that show a mixed enrichment, there is evidence of different components underlying variation. For example, the EWAS on child maltreatment performed on salivary DNA, despite showing enrichment for many tissues, has blood cell types as the highest categories. Work remains to be done to refine these mixed signals and define the components that are driving enrichment for several different tissue types. See also Table S7.
Figure 4
Karyotype View of EWAS Hits and Bar Chart of EWAS Tissues (A) This karyotype view was obtained by taking the top ten hits from each of the 20 EWAS with eFORGE signal (taken from Michels et al., 2013) and was generated using Ensembl KaryoView (http://www.ensembl.org/Homo_sapiens/Location/Genome). Many EWAS exclude probes from sex chromosomes as part of study analysis, and therefore there is an absence of top hits in these chromosomes on the graph. Apart from this, there seems to be no strong bias in the distribution of EWAS hits along the genome. (B) Bar chart indicating the analyzed tissue for the 20 EWAS with eFORGE signal from Michels et al. (2013). As is to be expected for an easily accessible tissue, blood is the most analyzed category, with ten studies. See also Table S7.
Figure 5
eFORGE Analysis of Autoimmune EWAS Top panel shows a blood (predominantly T cell), intestine, and thymus-specific signal for 86 probes from an EWAS on SLE. Middle panel shows a more general pattern of enrichment, with a strong blood signal, with CD14+ cells as the highest category, for a set of 100 RA EWAS probes. Bottom panel shows a blood (predominantly T cell) and thymus-specific enrichment for a set of 753 probes for an EWAS on Sjögren’s syndrome. Probe lists were obtained from the supplementary files of the studies (Coit et al., 2013, Liu et al., 2013, Altorok et al., 2014).
Figure 6
eFORGE Analysis of Surrogate Tissue and Multiple Sclerosis EWAS (A) DHS analysis of multiple sclerosis EWAS. Upper panel shows eFORGE blood, spleen, and thymus enrichment in Roadmap Epigenomics data for top 1,000 hypomethylated DMPs (ranked in the study by likelihood ratio test and Fisher’s method FDR q value). Lower panel shows enrichment for macrophages and monocytes in an analysis of the same regions with BLUEPRINT data. (B) Histone mark analysis of multiple sclerosis EWAS. Panel shows enrichment for top 1,000 study hypomethylated DMRs. Cell type-specific scores are colored by FDR q value. Cell types with q values below 0.01 for histone modifications representative of enhancers (H3K4me1) are shown in red, promoters (H3K4me3) are shown in purple, and polycomb-repressed regions (H3K27me3) are shown in green. Cell types with q values between 0.01 and 0.05 for the histone modification representative of promoters (H3K4me3) are shown in light purple. Cell types with q values above 0.01 for all other histone modifications are shown in blue. H3K36me3 (transcribed regions) and H3K9me3 (a marker for heterochromatin) did not present any significant cell type-specific enrichment patterns. Analyzed regions show enrichment for H3K4me1 (and, at a lower level, H3K4me3) in blood cells. (C) Analysis of surrogate tissue EWAS: the three panels (ENCODE, BLUEPRINT, and consolidated Roadmap, from top to bottom) show enrichment for monocyte, macrophage, and AML for an ovarian cancer prediction EWAS measured on whole blood. There is no enrichment for any other tissue (including lymphoid cells, ovarian tissue, and, interestingly, megakaryocytes). This supports a myeloid-lineage-specific DHS enrichment for top regions from this EWAS. 
By discarding enrichment in megakaryocyte regions, and showing enrichment for acute promyelocytic leukemia cell lines (NB4 and HL-60), the lineage-specific component of this tissue-specific signal points to a divergence that occurs after differentiation from the common myeloid progenitor and is suggestive of an event during myeloblastic differentiation. This DHS enrichment pattern extends to the myeloblast branch of the myeloid lineage, pointing to these regions being active in the myeloblast, which would be the cell of origin of this tissue-specific signal. This enrichment pattern shows cell types that drive the proposed myeloid/lymphoid imbalance causing the methylation signal observed (Teschendorff et al., 2009, Houseman et al., 2012, Li et al., 2014).
Figure 7
Analysis of Cancer EWAS This heatmap shows a stem cell-like signature for regions from five cancer EWAS, through color-coded enrichment –log10(q value). The left column depicts results for 330 top probes from a breast cancer metastatic behavior EWAS (Fang et al., 2011), the second column from the left shows results for 450 probes from a colorectal carcinoma EWAS (Kibriya et al., 2011), and the central column shows results for 240 probes from a sporadic colorectal cancer EWAS (Laczmanska et al., 2013). The next column on the right shows results for 801 probes from an adrenocortical carcinoma EWAS (Barreau et al., 2013), and the last column on the right shows results for 362 probes from a clear cell renal cell carcinoma EWAS (Arai et al., 2012). All five studies showed intermediate enrichment (q value <0.05) of at least one eFORGE “ES cell” or “iPSC” category. Aside from this stem cell-like signature, no other shared tissue category is enriched across the five probe lists.
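
The figure legends report enrichment as FDR q values with cutoffs at 0.01 and 0.05. The Python sketch below shows how a list of per-sample p values could be converted to q values and binned against those two cutoffs. The correction procedure is not stated in this text, so Benjamini-Hochberg is assumed here purely for illustration; the tier labels and function names are likewise hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed procedure, see lead-in)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def enrichment_tier(q):
    """Bin a q value using the 0.01 and 0.05 cutoffs from the legends."""
    if q < 0.01:
        return "strong"
    if q < 0.05:
        return "intermediate"
    return "not significant"


# Hypothetical p values for four samples.
pvals = [0.001, 0.02, 0.03, 0.5]
qvals = bh_qvalues(pvals)
tiers = [enrichment_tier(q) for q in qvals]
print(qvals, tiers)
```

For these toy inputs the first sample falls below 0.01, the next two fall between 0.01 and 0.05, and the last is not significant.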

References

    1. Adams D., Altucci L., Antonarakis S.E., Ballesteros J., Beck S., Bird A., Bock C., Boehm B., Campo E., Caricasole A. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 2012;30:224–226.
    2. Altorok N., Coit P., Hughes T., Koelsch K.A., Stone D.U., Rasmussen A., Radfar L., Scofield R.H., Sivils K.L., Farris A.D., Sawalha A.H. Genome-wide DNA methylation patterns in naive CD4+ T cells from patients with primary Sjögren’s syndrome. Arthritis Rheumatol. 2014;66:731–739.
    3. Arai E., Chiku S., Mori T., Gotoh M., Nakagawa T., Fujimoto H., Kanai Y. Single-CpG-resolution methylome analysis identifies clinicopathologically aggressive CpG island methylator phenotype clear cell renal cell carcinomas. Carcinogenesis. 2012;33:1487–1493.
    4. Aryee M.J., Jaffe A.E., Corrada-Bravo H., Ladd-Acosta C., Feinberg A.P., Hansen K.D., Irizarry R.A. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30:1363–1369.
    5. Barreau O., Assié G., Wilmot-Roussel H., Ragazzon B., Baudry C., Perlemoine K., René-Corail F., Bertagna X., Dousset B., Hamzaoui N. Identification of a CpG island methylator phenotype in adrenocortical carcinomas. J. Clin. Endocrinol. Metab. 2013;98:E174–E184.