Nature. 2021 May;593(7858):238-243.
doi: 10.1038/s41586-021-03446-x. Epub 2021 Apr 7.

Genome-wide enhancer maps link risk variants to disease genes

Joseph Nasser et al. Nature. 2021 May.

Abstract

Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease [1]. Many of the underlying causal variants may affect enhancers [2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types [4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.

Conflict of interest statement

Competing interests: J.M.E., C.P.F., and E.S.L. are inventors on a patent application on CRISPR methods filed by the Broad Institute related to this work (16/337,846). Until recently, E.S.L. served on the Board of Directors for Codiak BioSciences and Neon Therapeutics; served on the Scientific Advisory Board of F-Prime Capital Partners and Third Rock Ventures; was affiliated with several non-profit organizations including serving on the Board of Directors of the Innocence Project, Count Me In, and Biden Cancer Initiative, and the Board of Trustees for the Parker Institute for Cancer Immunotherapy; and served on various federal advisory committees.

C.P.F. is now an employee of Bristol Myers Squibb. T.A.P. is now an employee of Boston Consulting Group. R.J.X. is a cofounder of Jnana Therapeutics and Celsius Therapeutics. M.J.D. is a founder of Maze Therapeutics. N.H. holds equity in BioNTech and consults for Related Therapeutics. All other authors declare no competing interests.

Figures

Extended Data Fig. 1. ABC maps connect fine-mapped variants to enhancers, genes, and cell types.
(a) Overview of approach. (b) ABC predictions connect two IBD GWAS signals to IL10. Signal tracks show DNase- or ATAC-seq (based on availability of data). Red arrows represent ABC predictions connecting variants to IL10. Dashed line shows transcription start site (TSS). Gray bars highlight fine-mapped variants that overlap ABC enhancers in at least one cell type. Credible set 1 contains two variants, both of which overlap enhancers predicted to regulate IL10 in various cell types. Credible set 2 contains four variants, one of which overlaps an enhancer predicted to regulate IL10 in monocytes stimulated with LPS.
Extended Data Fig. 2. Properties of ABC Predictions.
(a) Cumulative fraction of the number of ABC enhancers within each biosample (median = 17,605). (b) Cumulative fraction of the number of enhancer-gene connections within each biosample (median = 48,441). (c) Cumulative fractions of the number of enhancers predicted to regulate each gene across all biosamples (black line, median = 2, mean = 2.8) and the mean number of enhancers predicted to regulate each gene within each biosample (red line, median = 2.8). (d) Cumulative fractions of the number of genes regulated by each ABC enhancer across all genes and all biosamples (black line, median = 1, mean = 2.7) and the mean number of genes regulated by each ABC enhancer within each biosample (red line, median = 2.7). (e) Cumulative fractions of the genomic distances between the enhancer and the gene for each predicted enhancer-gene connection across all genes and all biosamples (black line, median = 62,929bp) and the median genomic distance between each enhancer-gene connection within each biosample (red line, median = 62,782 bp). (f) Number of ABC enhancers predicted in 131 biosamples stratified by whether the epigenomic data for the biosample is derived from one or multiple donors. We do not observe significant differences between these distributions (two-sided Wilcoxon p-value = 0.10). Boxplot displays median, 25th and 75th percentiles. (g) Summary of ABC predictions in K562. Plot includes 122,410 non-promoter DHS elements in K562. Each element is classified as an ‘ABC Enhancer’ if the element is predicted to regulate at least one gene, or ‘Other Accessible Region’ otherwise. Horizontal axis represents distance from the element to the closest transcription start site (TSS) of an expressed gene. Vertical axis represents the percentile bin of the Activity of the element (in terms of DHS and H3K27ac signals) among these 122,410 elements. The coloring of the heatmap represents the fraction of elements in the corresponding distance and Activity bins that are ABC Enhancers.
Extended Data Fig. 3. Distinctness and Reproducibility of ABC predictions.
(a) Distinctness of predictions across biosamples. Biosample vs. biosample (131 × 131) heatmap. The color of the (i,j) pixel in the heatmap represents the fraction of enhancer-gene connections (‘EG connections’ – which are defined to be an element-gene pair whose ABC Score is greater than 0.015) in biosample i that have a corresponding overlapping prediction in biosample j. Two connections are considered overlapping if the predicted genes are the same and the enhancer elements overlap. Rows and columns are ordered by hierarchical clustering. A median of 19% (median of row medians) of enhancer-gene connections are shared across distinct biosamples. (b) Distribution of shared connections by relatedness of samples. Distribution of the fraction of shared connections in (a) stratified by the relatedness of the samples. Each pair of biosamples is classified as: ‘Same Cell Line’ which indicates the same cell line under different perturbation conditions or from different compendia, ‘Same Primary Tissue Type’ which indicates the same tissue type from different compendia, ‘Same Lineage’ which indicates samples from the same lineage classification as in (a), Other refers to all other pairs of samples. (c) Quantitative reproducibility of ABC Predictions. ABC Scores computed using independent biological replicates of epigenomic data (ATAC-Seq and H3K27ac ChIP-Seq) from the BJAB cell line. Each data point is an element-gene pair. (d) Fraction of shared enhancer-gene connections between replicates increases as ABC Score cutoff increases. X-axis: Cutoff on the ABC Score. Y-axis: For a given cutoff of the ABC Score, the fraction of element-gene pairs with an ABC score greater than the cutoff in sample 1 that have an ABC score > 0.015 in sample 2. 
Each biosample is classified as: ‘Multiple Donors’, which indicates that the epigenetic data for this biosample is derived from different donors, or ‘Single Donor’, which indicates that the epigenetic data for this biosample is derived from the same donor or cell line. For ‘Single Donor’ biosamples, replicates represent independent epigenomic experiments from the same donor or cell line; for ‘Multiple Donors’ biosamples, replicates represent epigenomic experiments from different donors. Separate curves are computed for each biosample and then the average across biosamples is plotted. (e) Fraction of shared enhancer-gene connections increases as reproducibility of underlying epigenetic data increases. Each data point represents a biosample. X-axis: geometric mean of correlation of ATAC-Seq (or DNase-Seq) and H3K27ac ChIP-Seq signal in candidate regions computed using replicate epigenetic experiments. Y-axis: Fraction of EG connections with ABC Score > 0.015 in replicate 1 which also have ABC Score > 0.015 in replicate 2. Colors as in (d).
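The sharing metric used in panels (a), (d) and (e) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the `Connection` record, the half-open coordinate convention and the function names are hypothetical; only the 0.015 cutoff and the overlap rule (same gene, overlapping enhancer elements) come from the legend.

```python
from typing import NamedTuple

ABC_THRESHOLD = 0.015  # ABC Score cutoff stated in the legend


class Connection(NamedTuple):
    """Hypothetical record for one element-gene pair in one biosample."""
    chrom: str
    start: int  # element start (half-open interval; an assumption)
    end: int    # element end
    gene: str
    abc_score: float


def elements_overlap(a: Connection, b: Connection) -> bool:
    """True if the two elements share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_shared(sample_i, sample_j, threshold=ABC_THRESHOLD) -> float:
    """Fraction of EG connections in sample_i that have an overlapping
    connection (same gene, overlapping element) in sample_j."""
    called_i = [c for c in sample_i if c.abc_score > threshold]
    called_j = [c for c in sample_j if c.abc_score > threshold]
    if not called_i:
        return float("nan")
    shared = sum(
        any(c.gene == d.gene and elements_overlap(c, d) for d in called_j)
        for c in called_i
    )
    return shared / len(called_i)
```

The measure is asymmetric by construction (the denominator is the number of connections in biosample i), which is why the heatmap in (a) is indexed by an ordered (i,j) pair.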
Extended Data Fig. 4. ABC performs well at identifying regulatory enhancer-gene connections in CRISPR datasets.
(a) Comparison of enhancer-gene predictors to experimental CRISPR data in K562 cells. Each of these predictors makes K562-specific predictions. Curves represent continuous predictors. Dots represent binary predictors as follows: (E) Each gene is predicted to be regulated only by the element closest to its transcription start site, (G) each element is predicted to regulate only the nearest (to TSS) expressed gene, (T) TargetFinder method, (L) elements and genes at opposite ends of HiCCUPS loops derived from Hi-C data are predicted as a connection, (D) an element-gene pair is a predicted positive if and only if the element and the gene are contained within the same contact domain. Red dot on ABC score curve: precision and recall achieved using a threshold on the ABC score of 0.015. Dashed black line: rate of experimental positives. (b) Comparison of ABC predictions using a binary distance threshold to experimental CRISPR data in K562 cells. “Activity (< X kb)” represents a model in which the score for an element-gene pair is the Activity of the element (in terms of DHS and H3K27ac signals) multiplied by a binary indicator (1 if the distance is < X Kb, and 0 otherwise). The ABC model using quantitative Hi-C outperforms the models based on binary thresholds indicating that Hi-C data is a critical component of the ABC model. (c) Comparison of ABC and other enhancer-gene predictors in full CRISPR dataset. Comparison of enhancer-gene predictors to experimental CRISPR data in K562, GM12878, NCCIT, BJAB (+/− stimulation), Jurkat (+/− stimulation), THP1 (+/− stimulation) cells and primary hepatocytes. For ABC, we used the predictions in the cell type corresponding to the CRISPR experiments. Because ABC is the only method that makes predictions in all of these cell types, we used this plot to compare ABC to other methods that make predictions without cell-type information. We consider each enhancer-gene pair predicted by these methods to be a prediction in all cell types. 
(d) Comparison of ABC and Ernst-Roadmap predictions. Comparison of enhancer-gene predictors to experimental CRISPR data in K562, GM12878, and unstimulated Jurkat, BJAB, THP1 cells. Red line represents comparison of ABC scores computed using epigenetic data from the same cell type in which the CRISPR experiment was performed. To compare Roadmap predictions to CRISPR data, we made cell type substitutions because the Roadmap predictions did not include BJAB, Jurkat, and THP1 cells: for BJAB CRISPR data we compared to predictions in the Roadmap B cell sample (E032); for THP1 data we used the Roadmap monocyte sample (E124); and for Jurkat data we used the Roadmap T cell sample (E034). To directly compare the performance of ABC and Ernst-Roadmap methods in matched cell types, we also calculated ABC performance using the same cell type substitutions (green line) – for example CRISPR data in BJAB cells was compared to ABC Scores computed using epigenetic data from the Roadmap B cell sample (E032). (e) Comparison of ABC to Promoter-Capture Hi-C. Comparison of enhancer-gene predictors to experimental CRISPR data in K562 and unstimulated BJAB, THP1 and Jurkat cells. Red line represents comparison of ABC Scores computed using epigenetic data from the same cell type in which the CRISPR experiment was performed. To compare promoter-capture Hi-C CHiCAGO predictions (purple line) to CRISPR data, we made cell type substitutions because PC-HiC data is not available in K562, BJAB, Jurkat, and THP1 cells: for K562 CRISPR data we compared to CHiCAGO scores in erythroblasts; for BJAB CRISPR data we compared to total B cells; for THP1 data we compared to monocytes; and for Jurkat data we compared to total CD4+ T cells. To directly compare the performance of ABC and PC-HiC methods in matched cell types, we also calculated ABC performance using the same cell type substitutions (green lines). 
The solid green line represents ABC scores where the contact component is derived from the average Hi-C dataset used throughout this study. The dashed green line represents ABC scores where the contact component is derived from the raw counts in PC-HiC experiments (see Methods). (f-h) Comparison of ABC to Promoter-Capture Hi-C Stratified by distance. These panels represent the comparison of the same predictors as in (e) while stratifying the experimental dataset in (e) based on the distance between the tested element and gene transcription start site. Of the 4078 element-gene connections in the experimental dataset, 398 are at a distance of <50kb (of which 94 are experimental positives, 24% positive rate), 1102 are between 50kb and 200kb (20 positives, 2% positive rate), and 2578 are at a distance of >200kb (10 positives, 0.4% positive rate). Given the differences in positive rates between the stratifications (indicated by dashed black lines), it is appropriate to compare PR curves within each stratification, but it is not appropriate to compare the PR curve of the same predictor across stratifications.
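The positive rates quoted for panels (f-h) follow directly from the counts in the legend. The short sketch below recomputes them; the dictionary layout and function names are illustrative assumptions, while the counts are copied from the text. Because a predictor that ranks pairs at random has an expected precision equal to the positive rate, each stratum has its own baseline, which is the reason PR curves should only be compared within a stratum.

```python
# Counts copied from the legend: (experimental positives, tested pairs).
STRATA = {
    "<50 kb": (94, 398),
    "50-200 kb": (20, 1102),
    ">200 kb": (10, 2578),
}


def positive_rate(positives: int, tested: int) -> float:
    """Fraction of tested element-gene pairs that are experimental positives."""
    return positives / tested


def summarize(strata):
    """Positive rate per distance stratum, plus the pooled rate under 'all'."""
    rates = {name: positive_rate(p, n) for name, (p, n) in strata.items()}
    total_pos = sum(p for p, _ in strata.values())
    total_n = sum(n for _, n in strata.values())
    rates["all"] = positive_rate(total_pos, total_n)
    return rates


if __name__ == "__main__":
    for name, rate in summarize(STRATA).items():
        print(f"{name:>10}: {rate:.1%}")
```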
Extended Data Fig. 5. Fine-mapped GWAS variants are highly enriched in ABC enhancers.
(a) Number of credible sets analyzed for 72 diseases and complex traits. Light gray shows total number of fine-mapped credible sets. Dark gray shows number of such credible sets with no coding or splice site variants, and at least one variant with PIP >= 10%. Red shows number of credible sets for which ABC-Max makes a prediction (i.e., a variant with PIP >= 10% overlaps an ABC enhancer in a biosample that shows global enrichment for that trait). See Supplementary Table 7 for trait descriptions and additional statistics. (b) Enrichment of fine-mapped variants (PIP >= 10%) associated with 4 blood cell traits in ABC enhancers in the corresponding blood cell types or progenitors. Enrichment = (fraction of fine-mapped variants / fraction of all common variants) overlapping regions in each cell type. Numbers of biosamples in each category are shown in parentheses. (c) Enrichment of fine-mapped IBD variants (PIP >= 10%) in ABC enhancers and other sets of previously defined enhancers. Cumulative density function shows distribution across cell types. (d) Enrichment of fine-mapped variants (PIP >= 10%) in ABC enhancers resized in different ways. Regions of at least 500-bp (blue line) are used to count reads, as defined previously. Regions were then shrunk by 150-bp on each side (minimum size of element = 200 bp) for overlapping with variants. Gray lines show alternative sizes, which do not appear to notably affect enrichments of fine-mapped variants. (e) % of noncoding variants across all traits that overlap an ABC enhancer in an enriched biosample, as a function of the number of cell types analyzed. Biosamples (131) were grouped into 74 cell types/tissues; and analyzed in random order. Black line: mean across 20 random orderings. Dashed gray lines: 95% confidence intervals. 
(f) Fraction of variants or heritability for all 72 traits contained in different categories of genomic regions: coding sequences (CDS), untranslated regions (UTR), splice sites (within 10 bp of an intron-exon junction of a protein-coding gene), promoters (±250 bp from the gene TSS), ABC enhancers in 131 biosamples, other accessible regions not called as ABC enhancers, and other intronic or intergenic regions. In cases where a variant overlaps more than one category, the variant was assigned to the first category that it overlapped (i.e., variants in CDS were not also counted in the ABC category, Methods). Left: All common variants or heritability (h², as estimated by S-LDSC in inverse-variance weighted meta-analysis across 74 traits). Right: Fraction of variants above a threshold on the fine-mapping PIP.
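The enrichment statistic defined in panel (b) is a ratio of two overlap fractions. A minimal sketch follows, assuming variants are given as (chromosome, position) pairs and enhancers as half-open (chromosome, start, end) intervals; these representations and the function names are assumptions made for illustration, and the linear scan is chosen for clarity rather than speed.

```python
def overlaps_any(variant, regions) -> bool:
    """True if the variant position falls inside any region (half-open intervals)."""
    chrom, pos = variant
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def overlap_fraction(variants, regions) -> float:
    """Fraction of variants that overlap at least one region."""
    variants = list(variants)
    if not variants:
        raise ValueError("no variants supplied")
    return sum(overlaps_any(v, regions) for v in variants) / len(variants)


def enrichment(finemapped, common, regions) -> float:
    """Enrichment = fraction of fine-mapped variants overlapping the regions
    divided by the fraction of all common variants overlapping the regions."""
    background = overlap_fraction(common, regions)
    if background == 0:
        raise ValueError("no common variant overlaps the regions")
    return overlap_fraction(finemapped, regions) / background
```

In this scheme the fine-mapped set would be restricted beforehand to variants with PIP >= 10%, as in the legend.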
Extended Data Fig. 6. ABC enhancer maps connect GWAS variants to known genes.
(a) ABC predictions for IBD credible sets linked to IL10. Heatmap shows ABC scores for each gene within 1 Mb in selected primary immune cell types. Credible Set 1 is linked by ABC to multiple genes, but IL10 (red) has the strongest ABC score in any cell type. (b) Cumulative density plot showing enrichment for gene sets in MSigDB among the genes prioritized by each method. Methods are colored and categorized as in Fig. 1c. For each method, we first identified the top 5 most enriched significant gene sets in the predictions of that method (82 gene sets total). Then, we calculated the levels of enrichment of all 82 gene sets in the predictions of each method. (c) Comparison of predictions for the 37 IBD credible sets near known genes. Fraction predictions shared = (# credible sets where both methods predict the same gene) / (# credible sets where both methods make a prediction). For example, 16 credible sets have predictions from both ABC-Max and ChromHMM-RNA correlation, and the two methods predict the same gene in 14 of 16 credible sets. (d) Enrichment of likely causal genes for 10 blood traits (defined by common coding variants) for various prediction methods. Enrichment reflects the number of correctly predicted genes identified divided by the baseline of choosing random genes in each of the loci with a prediction. (e) Precision-recall plot for identifying known IBD genes, comparing additional variations on the prediction methods (related to Fig. 1c). For ABC, we compared ABC-Max (assigning each credible set to the gene with the maximum ABC score, red circle), ABC-Max excluding all immune and gut tissue biosamples (orange circle), and ABC-All (assigning each credible set to all genes linked to enhancers, red triangle). For other methods that provided quantitative scores, we similarly compared choosing the gene with the best score per locus (circles) with choosing all genes above the global thresholds previously reported in each study (triangles). 
In most cases, the best gene per locus outperformed using a global threshold.
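Two definitions from this legend, the ABC-Max versus ABC-All assignment in (e) and the "fraction predictions shared" statistic in (c), can be sketched as follows. This is an illustration under assumptions: each credible set is represented by a hypothetical dictionary mapping gene names to the best ABC score of any enhancer that overlaps a variant in the set, and the function names are invented here.

```python
def abc_max(gene_scores):
    """Assign a credible set to the single gene with the maximum ABC score.
    Returns None when no gene is linked to the credible set."""
    if not gene_scores:
        return None
    return max(gene_scores, key=gene_scores.get)


def abc_all(gene_scores):
    """Assign a credible set to every gene linked to an overlapping enhancer."""
    return set(gene_scores)


def fraction_predictions_shared(pred_a, pred_b) -> float:
    """(# credible sets where both methods predict the same gene) /
    (# credible sets where both methods make a prediction).
    pred_a and pred_b map credible-set IDs to a predicted gene or None."""
    both = [
        cs for cs in pred_a
        if pred_a[cs] is not None and pred_b.get(cs) is not None
    ]
    if not both:
        return float("nan")
    same = sum(pred_a[cs] == pred_b[cs] for cs in both)
    return same / len(both)
```

With the numbers quoted in (c), 14 agreeing predictions out of 16 credible sets where both methods make a call gives a shared fraction of 14/16 = 0.875.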
Extended Data Fig. 7. ABC-Max predictions at LRRC32 and RASL11A loci.
ABC-Max predictions and chromatin state in primary immune cells and fetal colon tissue at 2 IBD loci: (a) LRRC32 and (b) RASL11A. Red marks variants, enhancer-gene connections, and target genes identified by ABC-Max. Gray bars highlight the variants overlapping ABC enhancers. Vertical dotted lines represent TSSs. “DCs +LPS”: dendritic cells stimulated with bacterial lipopolysaccharide for 4 hours.
Extended Data Fig. 8. Cell-type specificity of ABC predictions.
(a) A comparison of the number of biosample groups (cell type lineages) in which the gene promoter is active versus the number of categories in which a variant is predicted to regulate the gene by ABC-Max. (b) Heatmap of ABC scores for predicted IBD genes in resting and stimulated mononuclear phagocytes (from epigenomic data in monocytes and dendritic cells). IRF4 and PDGFB (bold) are two examples where ABC predictions are specific to a particular stimulated state (+LPS) and are not observed in unstimulated states. (c) Enrichment for top gene sets identified when performing enrichment analysis among the 23 IBD genes predicted by ABC-Max in mononuclear phagocytes (MNPs, dark gray), versus when performing the same analysis among the 43 IBD genes predicted in any biosample (light gray). The enrichment for a given gene set is calculated as the ratio of the frequency at which ABC-predicted genes belong to the gene set, compared to the frequency at which all genes within 1 Mb of these loci belong to the gene set (Methods). (d) A variant in an intron of ANKRD55 is predicted by the ABC Model to regulate IL6ST in thymus. Gray bar highlights the variant overlapping the predicted ABC enhancer. Vertical dotted lines represent TSSs. Red arc at top denotes ABC-Max prediction. Red arc at bottom denotes that CRISPRi of the highlighted enhancer significantly affects the expression of IL6ST only in Jurkat cells.
Extended Data Fig. 9. Genes linked by ABC to different traits via different variants.
(a) ABC links IKZF1 to 2 traits via variants in 18 credible sets. Red boxes mark enhancers predicted to regulate IKZF1. Thick black line marks the IKZF1 TSS. Black dots mark fine-mapped noncoding variants (PIP >= 10%) associated with one or more traits linked to IKZF1 by ABC-Max. (b) Genes linked to different traits via different variants have more complex enhancer landscapes. Cumulative distribution plots show the (left) number of ABC enhancer-gene connections in all 131 biosamples, and (right) the distance between the TSSs of the two closest neighboring genes on either side of a gene, for each gene linked by ABC-Max to zero traits, one trait, or two or more traits through different variants. (c) The complexity of a gene’s enhancer landscape is correlated with the odds of the gene being linked to multiple GWAS traits. X-axis shows the Wald odds ratio that a gene is connected to multiple GWAS traits, comparing genes in the top decile versus all other deciles of the corresponding enhancer complexity metric. The 3 enhancer complexity metrics are defined for each gene: the total number of enhancers linked to the gene by ABC in any biosample, the number of enhancers linked to a gene per biosample in which the gene’s promoter is active, and the genomic distance to the closest neighboring TSS on either side of the gene. Dot: mean of top decile genes (n = 1,838) versus all others (n = 16,550). Whiskers: 95% CI.
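For readers unfamiliar with the statistic in panel (c), a Wald odds ratio with a 95% confidence interval can be computed from a 2 × 2 table as sketched below. This shows the standard textbook calculation, not necessarily the authors' exact procedure. The group sizes (1,838 top-decile genes, 16,550 others) come from the legend; the split of each group into multi-trait and other genes in the usage note is invented purely for illustration.

```python
import math


def wald_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: top-decile genes linked to multiple traits
    b: top-decile genes not linked to multiple traits
    c: other genes linked to multiple traits
    d: other genes not linked to multiple traits
    z: normal quantile (1.96 for a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

A hypothetical call such as `wald_odds_ratio(120, 1718, 480, 16070)` respects the group sizes in the legend (120 + 1718 = 1,838 and 480 + 16070 = 16,550), but the cell counts themselves are made up.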
Extended Data Fig. 10. Enhancers and variants connected to PPIF.
(a) ABC predictions for variants near PPIF. Black dots represent either (i) fine-mapped variants (PIP >= 10%) for IBD and UK Biobank traits, or (ii) lead variants for any phenotype from the GWAS Catalog (the latter to show the approximate locations of signals for traits for which fine-mapping is not yet available). “IBD” label points to rs1250566. “MS” (multiple sclerosis) label points to rs1250568 (fine-mapped in). Red boxes mark enhancers predicted to regulate PPIF. Thick black lines mark TSSs. Thin black lines mark selected variants. (b) CRISPRi-FlowFISH data for PPIF in 7 immune cell lines and stimulated states. Red boxes mark distal enhancers (CRISPR gRNAs lead to a significant decrease in the expression of PPIF). Dark gray box marks the gene body of PPIF, where CRISPRi cannot accurately assess the effects of putative regulatory elements. (c) Chromatin accessibility in 5-kb regions around the PPIF enhancer (e-PPIF). Signal tracks show ATAC-seq (for THP1 and BJAB) or DNase-seq (for GM12878 and Jurkat) data in reads per million. Arrows show locations of variants associated with MS and lymphocyte count (Lym, rs1250568) and with IBD (rs1250566), which overlap with enhancers that regulate PPIF in distinct sets of cell types. (d) Effect of each tested gRNA on PPIF expression, as measured by CRISPRi-FlowFISH (Methods). Dots: gRNAs whose effect estimate is >0% (black) or <0% (red). Red bars show regions where gRNAs have a significant effect on gene expression (FDR < 0.05), as compared by a two-sided t-test to negative control gRNAs. (e) Effects of 8 individual gRNAs on PPIF expression in THP1 cells, as measured by CRISPRi and qPCR (Methods). PPIF expression is normalized to expression of GAPDH and to cells expressing negative control, non-targeting gRNAs (Ctrl). Error bars: 95% confidence intervals of the mean (n = 6 replicates per gRNA). (f) Schema of pooled CRISPRi screen to examine the effects of PPIF and e-PPIF on mitochondrial membrane potential (Δψm). 
Cells expressing a pool of gRNAs were stained with MitoTracker Red and MitoTracker Green and sorted into 3 bins of increasing Red:Green ratios. gRNAs from cells in each bin were PCR-amplified, sequenced, and counted. (g) Effects of CRISPRi gRNAs (targeting e-PPIF, PPIF promoter, or negative controls (Ctrl)) on Δψm, quantified as the frequency of THP1 cells carrying those gRNAs with low or medium versus high MitoTracker Red signal (corresponding to Bins 1, 2, and 3, respectively; superset of data in Fig. 5d). We tested THP1 cells in unstimulated conditions, stimulated with LPS, and differentiated with PMA and stimulated with LPS (Methods). Error bars: 95% confidence intervals for the mean of 40, 9, and 5 gRNAs for Ctrl, PPIF, and e-PPIF, respectively. Two-sided rank-sum P = 0.0163 (*), 0.00426 (**), or 0.000356 (***) versus Ctrl. (h) Ratios of MitoTracker Red (mitochondrial membrane potential) to MitoTracker Green (mitochondrial mass) signal in THP1 cells at baseline, stimulated with LPS, and differentiated into macrophages with PMA and stimulated with LPS in biological duplicate (from left to right, n = 8044, 99683, 99982, 99968, 99886, and 99878; replicates were cultured, stimulated, stained, and flow sorted independently). Box represents median and interquartile range; whiskers show minimum and maximum. Stimulation with either LPS alone or both PMA and LPS leads to a reduction in red:green signal, indicating a reduction in mitochondrial membrane potential normalized to mitochondrial mass.
Figure 1. ABC maps connect fine-mapped variants to enhancers, genes, and cell types.
(a) Enrichment of fine-mapped IBD variants (PIP >= 10%) in ABC enhancers (left) and all other accessible regions (right) in each of 131 biosamples. MNPs: mononuclear phagocytes. Box: median and interquartile range. Whiskers: observation less than or equal to quartile +/− 1.5 * IQR. (b) Fraction of noncoding variants above a given PIP threshold that overlap an ABC enhancer in any biosample. Black line: weighted average across 72 traits. Traces are shown for PIP thresholds above which there are at least 5 variants. Dashed line: fraction of all common noncoding variants that overlap ABC enhancers. (c) Precision-recall for connecting noncoding IBD credible sets to known IBD genes, considering 37 credible sets with 1 known gene within 1 Mb (Methods). Precision: fraction of identified genes corresponding to known genes. Recall: fraction of the 37 known genes identified. Where quantitative scores were available (e.g., colocalization probability), plot presents the performance of choosing the gene with the best score per locus (see also Extended Data Fig. 6b).
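The precision and recall definitions in panel (c) translate directly into code. The sketch below assumes one known gene per credible set, as in the 37-locus benchmark, and represents both the benchmark and a method's predictions as hypothetical dictionaries keyed by credible-set ID; a method that makes no call at a locus simply has no entry (or `None`).

```python
def precision_recall(predicted, known):
    """Precision and recall for linking credible sets to known genes.

    predicted: credible-set ID -> predicted gene (missing or None = no call)
    known:     credible-set ID -> the single known gene at that locus
    Precision: fraction of identified genes corresponding to known genes.
    Recall:    fraction of the known genes identified.
    """
    if not known:
        raise ValueError("the benchmark must contain at least one credible set")
    calls = {
        cs: gene for cs, gene in predicted.items()
        if cs in known and gene is not None
    }
    correct = sum(known[cs] == gene for cs, gene in calls.items())
    precision = correct / len(calls) if calls else float("nan")
    recall = correct / len(known)
    return precision, recall
```

A method that makes few but accurate calls scores high precision and low recall under these definitions, which is why the two axes are reported together.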
Figure 2. Connecting variants to target genes.
(a) Histograms of the (left) distances from the predicted variant to the TSS of the ABC-Max target gene and (right) distance rank of the gene in the locus. Data includes predictions for all 72 traits. (b) ABC-Max predictions for 47 noncoding IBD credible sets linking to 43 unique genes (4 genes are linked to 2 sets each). Heatmap: ABC scores in 6 biosample categories (maximum value within each category). Red scale: ABC score. Blue scale: log10 genomic distance from variant to gene TSS. Black boxes indicate that the gene is the closest to the lead SNP, was implicated in IBD risk based on coding variation or experimental evidence about gene function, was identified by prior eQTL colocalization or TWAS analyses, or is in an enriched gene set (Methods). (c) ABC-Max predictions and chromatin state at the PDGFB locus. Red color denotes variants, enhancer-gene connections, and target genes identified by ABC-Max. Gray bars: variants in two credible sets overlap ABC enhancers. Vertical dotted lines: TSSs.
Figure 3. Cell-type specificity of ABC predictions.
(a) Histogram of the number of biosamples in which (red) a variant-gene connection is predicted by ABC-Max (i.e., an ABC enhancer regulates the target gene in a given biosample) and (gray) the promoter of the targeted gene is active (Methods). (b) Histogram of the number of GWAS signals per gene (unique credible sets with no overlapping variants with PIP >= 10%, Methods). Model at top depicts a gene linked to different traits via different variants. Circles: enhancers. Black arrows: gene. Colored arrows: ABC predictions. Triangles: variants. (c) Number of predicted enhancer-gene connections (per biosample in which the promoter of a gene is active), for genes linked by ABC-Max to zero traits, one trait by one or more variants, or two or more traits via different variants. Labels: two genes described in text.
Figure 4. An enhancer regulates PPIF expression and mitochondrial function.
(a) An IBD risk variant (rs1250566) overlaps an enhancer predicted to regulate PPIF. Signal tracks: ATAC-seq or DNase-seq. Gray bar: enhancer containing rs1250566. Dashed lines: TSSs. Red arcs at top: ABC-Max predictions. Red arcs at bottom: CRISPRi leads to a significant decrease in PPIF expression. (b) 1224-bp region at the PPIF enhancer (e-PPIF). Accessibility: DNase- or ATAC-seq from primary immune cells (DCs = dendritic cells, Mo = monocytes). Conservation: phastCons 100-mammal alignment. Red bar: region targeted with CRISPRi gRNAs. (c) Effects of CRISPRi at e-PPIF on the expression of PPIF in immune cell lines in resting and stimulated (stim) conditions. Error bars show 95% confidence intervals of the mean. *: two-sided t-test, Benjamini-Hochberg-adjusted P < 0.05 for 164 CRISPRi gRNAs targeting e-PPIF compared to 814 negative control (Ctrl) gRNAs (adjusted P values from left to right: 4.68 × 10^−101, 4.86 × 10^−112, 0.019, 0.044, 1.48 × 10^−71). (d) Effects of CRISPRi gRNAs (targeting e-PPIF, PPIF promoter, or negative controls (Ctrl)) on Δψm, quantified as the frequency of THP1 cells carrying those gRNAs with low versus high MitoTracker Red signal (see Extended Data Fig. 10f–h). We tested THP1 cells in unstimulated conditions, stimulated with LPS, and differentiated with phorbol 12-myristate 13-acetate (PMA) and stimulated with LPS (Methods). Error bars: 95% confidence intervals for the mean of 40, 9, and 5 gRNAs for Ctrl, PPIF, and e-PPIF, respectively. Two-sided rank-sum P = 0.0163 (*), 0.00426 (**), or 0.000356 (***) versus Ctrl.

References

    1. Claussnitzer M et al. A brief history of human disease genetics. Nature 577, 179–189, doi:10.1038/s41586-019-1879-7 (2020).
    2. Farh KK-H et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343, doi:10.1038/nature13835 (2015).
    3. Maurano MT et al. Systematic localization of common disease-associated variation in regulatory DNA. Science (New York, N.Y.) 337, 1190–1195, doi:10.1126/science.1222794 (2012).
    4. Fulco CP et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664–1669, doi:10.1038/s41588-019-0538-0 (2019).
    5. Westra H-J & Franke L. From genome to function by studying eQTLs. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease 1842, 1896–1902, doi:10.1016/j.bbadis.2014.04.024 (2014).

Extended Data References

    1. Buenrostro JD, Wu B, Chang HY & Greenleaf WJ. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol 109, 21.29.1–21.29.9, doi:10.1002/0471142727.mb2129s109 (2015).
    1. Zhu J et al. Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell 152, 642–654, doi:10.1016/j.cell.2012.12.033 (2013).
    1. Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74, doi:10.1038/nature11247 (2012).
    1. Roadmap Epigenomics C et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330, doi:10.1038/nature14248 (2015).
    1. Li H & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760, doi:10.1093/bioinformatics/btp324 (2009).
