Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Oct;55(10):1665-1676.
doi: 10.1038/s41588-023-01509-5. Epub 2023 Sep 28.

Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease

Affiliations

Multitissue H3K27ac profiling of GTEx samples links epigenomic variation to disease

Lei Hou et al. Nat Genet. 2023 Oct.

Abstract

Genetic variants associated with complex traits are primarily noncoding, and their effects on gene-regulatory activity remain largely uncharacterized. To address this, we profile epigenomic variation of histone mark H3K27ac across 387 brain, heart, muscle and lung samples from Genotype-Tissue Expression (GTEx). We annotate 282 k active regulatory elements (AREs) with tissue-specific activity patterns. We identify 2,436 sex-biased AREs and 5,397 genetically influenced AREs associated with 130 k genetic variants (haQTLs) across tissues. We integrate genetic and epigenomic variation to provide mechanistic insights for disease-associated loci from 55 genome-wide association studies (GWAS), by revealing candidate tissues of action, driver SNPs and impacted AREs. Lastly, we build ARE-gene linking scores based on genetics (gLink scores) and demonstrate their unique ability to prioritize SNP-ARE-gene circuits. Overall, our epigenomic datasets, computational integration and mechanistic predictions provide valuable resources and important insights for understanding the molecular basis of human diseases/traits such as schizophrenia.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Study overview and lower-dimensional projection of H3K27ac profiles.
a, Study overview. Left, H3K27ac samples in this study; middle, regulation of AREs and target genes; right, functional implications in human disease/traits genetics. Organ sketch plots were created with BioRender.com. b, Lower-dimensional projection of 387 samples from this study and 240 samples from the reference epigenomes. Different colors represent different tissues or cell types, and different shapes represent different studies; boxed texts label the samples of brain, heart, muscle and lung from reference epigenomes; the top left inset shows the sample sizes of the H3K27ac experiments from various sample sources.
Fig. 2
Fig. 2. ARE activity across tissues and cell types.
a, ARE detection from brain, heart, muscle and lung in this study. Left, schematic diagram showing the AREs detected from each tissue and how they are merged as a reference set, with the number of AREs shown on the right; right, ARE-sharing and specificity across the four tissues, with promoter and enhancer AREs in different colors. b, 282 k AREs identified in this study forming 14 groups of 127 modules based on coactivity across 240 reference epigenomes. Top, 127 ARE modules and module size; middle, proportion of AREs annotated as promoter/enhancer-related chromatin states and different genomic features for each module; bottom, average ARE activity of each module (by column) across 240 reference epigenomes (by row). Representative sample names labeled for each sample cluster on the right; 14 ARE groups labeled at the bottom; red, green, orange and blue boxes showing blood/immune (G3), skeletal/cardiac muscle (G9), brain/neuron (G12) modules and lung-specific modules.
Fig. 3
Fig. 3. Genetic drivers of ARE activity and their tissue specificity.
a, Overlaid Manhattan plot of lead haQTLs from brain, heart, muscle and lung; x axis shows the genomic position and y axis shows the -log10(empirical P value); empirical P value is haQTL P value adjusted for multiple SNPs for each loci as described in Methods; examples 1–5 in b are marked with numbers in colored circles. b, Examples for type I (1–2), type II (3) and type III (4–5), along with the information of ARE group, ARE, lead haQTL, gARE activity and SNP alleles for each sample across tissues and number of samples (n); boxes = 25th to 75th percentile (that is, IQR); line = median; whiskers = 1.5× IQR; haQTLs P values shown in b and c are all nominal P values based on linear regression (two-sided) implemented in FastQTL as described in Methods. c, Directionality consistency-based haQTL tissue-sharing analysis. Lower, between-tissue directionality consistency (y axis on the right) of the lead haQTL effect size (y axis on the left) increases as the P value significance of the haQTLs in the replication tissue (x axis) increases, separated by positive effect (right half-plane) and negative effect (left half-plane) of the matched haQTL–ARE pair in the discovery tissue; the examples 1–3 (shared ARE between tissues) from a are labeled. Upper, density of lead haQTLs in each of the nominal P value bins; the proportion and the corresponding consistency rates of each group are shown in parentheses in the figure legend. d, The estimated proportion of haQTL tissue sharing for each tissue pair. e, The number of types I, II and III gAREs in each tissue. Colors for different types of gAREs are shown in the legend below the panel. f, Enrichment of different types of gAREs in 14 ARE groups (defined in Fig. 2b). x axis, each ARE group; y axis, percentage of different types of AREs in each ARE group; asterisk shows the significance level of enrichment (two-sided Fisher’s exact test, only for odds ratio > 1) compared to all AREs in each tissue (exact P value shown in Supplementary Table 4); colors for different types of gAREs are shown in the legend above the panel. IQR, interquartile range.
Fig. 4
Fig. 4. ARE regulation reveals mechanisms for disease/trait genetics.
a, Enrichment of GWAS signals for diseases and traits in 14 ARE groups, partitioned by the tissue presence for each ARE. Rows represent GWAS studies (full names in Supplementary Table 5) from four categories (right label); columns represent different ARE groups; different triangles in each square denote whether the ARE in a group from a tissue is substantially enriched for the corresponding trait. b, Overview of 54 GWAS–haQTL-colocalized gARE loci for schizophrenia. Top annotations include the ARE group that each gARE belongs to, whether it is tissue-specific colocalization (an example in d), whether the GWAS signal is missed by eQTL from top GWAS–haQTL colocalization tissue (an example in e), and whether there is a different causal SNP from the lead SNP suggested by haQTL (an example in b). c, An example of different GWAS causal SNPs suggested by colocalization. x axis represents genomic location; y axis for each panel from top to bottom, schizophrenia GWAS signal (−log10(P)) and haQTL signals (−log10(P)) for SNPs in brain; P values in Fig. 4 are all nominal P (two-sided) for either GWAS (chi-squared) or QTL (linear regression); the colors of data points represent the correlation (r2) with lead GWAS SNP and symbols denote different types of SNPs as shown in the legend underneath e; the inset indicates the alternative causal SNPs. d, An example of tissue-specific regulation based on colocalization. Panels similar as c; panels for both haQTL and eQTL signals in brain for tissue-specific colocalization are boxed in orange. e, An example of bulk-eQTL-missing GWAS–haQTL-colocalized gARE loci. Bottom, the strongest GWAS–eQTL colocalization detected for RALGAPB; other panels similar as c. f, The counts of bulk-eQTL-missing and -capturing GWAS–haQTL-colocalized gAREs for each tissue.
Fig. 5
Fig. 5. gLink scores as a novel framework to prioritize gARE–eGene linking.
a, gLink approach 1, quantifying gARE–eGene linking based on eQTL proximity to ARE. Top, a schematic view of the approach. Bottom, enrichment of the QTL-proximal regions of gAREs over other regions; x axis, percentage of QTL-proximal regions among all target regions; y axis, fold enrichment of QTL-proximal regions in target regions over those in shuffled genomic regions (the fourth background genomic region in the legend); colors denote types of QTL and symbols denote different target regions as shown in the legend. b, gLink approach 2, quantifying ARE–gene linking based on shared genetic regulation. Top, a schematic view of the approach. Middle, solid dots denote the gAREs linked to MADCAM1 supported by different scores (by row) in different colors; gLink scores are grouped by two approaches with label on the left; the gARE labeled by black dashed vertical line (right one) is supported by scores from both approaches; the ARE labeled by green dashed vertical line (left one) is supported by scores only from approach 2. Bottom, correlation of polygenic gene expression of MADCAM1 (y axis) and activity of the ARE only captured by scores from approach 2 (x axis) across individuals, with r2 and P value (two-sided based on linear regression) labeled, and regression line (solid red) and its 95% confidence interval (dashed blue) shown. c, Comparison of gLink scores and other scores to the ABC score as the benchmark dataset. Top, a schematic view of candidate gAREs and different linking scores considered; Bottom, truncated PRC (x axis: 0–0.2) for gLink scores and EpiMap across four tissues; the inset shows the AUPRC (x axis) for different scores and background; the black dashed line shows the background proportion of the positive links based on the ABC score.
Fig. 6
Fig. 6. gLink scores prioritize gARE–gene circuits for diseases and traits.
a, Enrichment of GWAS–haQTL colocalized gAREs in gAREs with predicted links for brain. X axes, proportion of GWAS–haQTL-colocalized gAREs in gAREs with predicted ARE–gene links based on different linking scores (per mile); different rows in each panel denote different linking scores (gLink scores are shown in red text). Asterisk shows the significance level (one-sided proportion test); exact P value is shown in Supplementary Table 7. b, Comparison of the number of target genes from different linking scores with GWAS–eQTL colocalization. The heatmap shows the proportion of target genes prioritized by each linking score (by row) that are supported by GWAS–eQTL colocalization in brain tissue for each of ten diseases/traits in a (by column); the overlapped gene number is shown in each cell; the total gene number for each row is shown on the right. c, Schizophrenia GWAS–haQTL-colocalized gARE–gene circuits in the brain. The heatmap shows the genetic evidence of association between target gene and schizophrenia for each gARE–gene circuit (by row) in each tissue (by column); genomic position of ARE and ARE group is shown on the left; for each cell, the upper triangle shows the evidence supported by GWAS–eQTL colocalization (PP4), and the lower triangle shows the number of gLink scores that connect GWAS–haQTL-colocalized gARE to the same gene. d, Schizophrenia GWAS and brain QTL signals at CORO7NMRAL1 locus prioritized in c. From top to bottom, gene and gARE-genes linking annotation, schizophrenia GWAS signals, haQTL signals for gARE (chr16:4409428–4410046) and eQTL signals for CORO7 and NMRAL1 in brain with colocalization PP4 shown in parenthesis; P values are all nominal P (two-sided) for either GWAS (Chi-squared) or QTL (linear regression). AD, Alzheimer’s disease; NEUROT_UKB, neuroticism; ADHD, attention-deficit/hyperactivity disorder; SLEEP, sleep duration; CHRONO, chronotype; FIS, fluid intelligence score, EDU, education years; UKB, GWAS from UK biobank (detailed information in Supplementary Table 5);.
Extended Data Fig. 1
Extended Data Fig. 1. Correlation of H3K27ac profiles between samples in this study and those from the reference epigenomes.
Each column represents a sample in our study with tissue name on the top, and each row represents a sample from the reference epigenomes; for each sample in our study, the top five highly correlated reference samples are labeled with ‘*’; orange, red, green and blue boxes indicate tissue-matched pairs between our data and the reference data.
Extended Data Fig. 2
Extended Data Fig. 2. Tissue specificity of AREs and functional annotations of ARE modules.
a, ARE tissue-specificity and sharing across brain, heart, muscle, and lung. The Venn diagram shows the numbers and proportions of AREs for different combinations of tissue-sharing across four tissues. b, 282k AREs identified in this study form 1413 submodules from 127 modules based on coactivity across 240 reference epigenomes. Upper panel: ARE activity of 1413 submodules (by column) across samples (by row) in our study; orange, red, green, and blue boxes showing tissue-specific modules for brain, heart, muscle, and lung, respectively; sex and tissue information are on the right. Lower panel: ARE activity of 1413 submodules in the reference epigenomes; sample clusters annotated on the right. c, GO biological processes enrichment for 127 ARE modules. Each row represents a GO term and each column represents an ARE module with ARE group labeled at the bottom; red, green, orange, and blue boxes indicate the enrichment for G3, G9, G12, and lung-specific modules.
Extended Data Fig. 3
Extended Data Fig. 3. TF motif enrichment of ARE modules and ARE detection power.
a, Motif enrichment for enhancer modules. Each row denotes a TF family, represented by the TF labeled on the right having the strongest odds ratio across modules; each column represents an ARE module with ARE group labeled at the bottom; red, green, orange, blue, and purple boxes indicate enrichment for G3, G9, G12, and lung-specific modules. b, Comparison of ARE detection rates between Newly-detected ARE (G14) and the other groups. X-axis shows the number of brain samples randomly selected for each experiment; y-axis shows the proportion of AREs detected from each experiment; colors denote which groups AREs are from; n = 10 independent times of sampling for each box.
Extended Data Fig. 4
Extended Data Fig. 4. Tissue-archetype fraction estimation.
a, Deconvolution step 1. The heatmap shows the correlation between the profiles of tissue-archetype (by column) and the profiles for the reference samples (by row) with strong tissue-archetype specific patterns. Typical sample names are shown on the right, four samples that are not clustered with other tissue-matched samples in Fig. 1b and mentioned in the section of ‘Comparison of H3K27ac profiles across studies’ are labeled on the left. b, Deconvolution step 2. The heatmap shows the fraction of each tissue-archetype (by column) estimated for samples (by row) in each of our tissues, with the primary tissue-archetypes indicated by gray boxes.
Extended Data Fig. 5
Extended Data Fig. 5. Sex-biased ARE identification.
a, Comparison between principal components (PCs) and covariates including estimated tissue-archetype fractions and known factors for brain samples. Top left panel: heatmap shows the correlation between PCs (by column) and known factors (by row); top right panel: percentage of variation (x-axis) explained by the covariates (by row), with red highlight for the primary tissue-archetype identified in Extended Data Fig. 4a; bottom panel: the percentage of variation (y-axis) explained by each PC for brain samples (by column). b, Sex-biased AREs, activity pattern and annotations. Left panel: enrichment of sex-biased genes from matched GTEx tissue (by column) in genes closest to sex-biased AREs identified from this study; * denotes strong enrichment (adjusted P < 0.1, two-sided Fisher’s exact test, BH correction across multiple tissues tested, shown in Supplementary Table 2); middle panel: sex-biased ARE activity of each sample (by column) in each tissue, with top 5 sex-biased genes closest to any sex-biased ARE labeled on the right; right panel: GO biological processes enriched for genes near sex-biased AREs; purple and blue colors represent female-biased and male-biased genes and terms, respectively, for middle and right panels. c, Coordinated regulation of ARE activity and gene expression by sex. Left panel: ARE activity for the sex-biased AREs in heart samples; right panel: sex-differential signal for the genes closest to the sex-biased AREs in heart; boxes = 25th-75th percentile (that is inter-quartile range; IQR); line = median; whiskers = 1.5 × IQR.
Extended Data Fig. 6
Extended Data Fig. 6. Identification of haQTLs.
a, Comparison between Peer factors (by column) and covariates (by row) including estimated tissue-archetype fraction and known factors for brain. b, Power analysis for haQTL mapping. Colors indicate different minor allele frequencies, and vertical dashed lines denote current sample sizes for each tissue. c, Saturation analysis for haQTL detection in the brain based on down-sampling. For each sample size, 10 randomly down-sampling were performed. The x-axis denotes sample sizes after downsampling, while the y-axis denotes the detection rate of the haQTLs from the downsampled data relative to the haQTLs detected from the full data. The boxes show the 25th–75th percentile; the lines show the median; the whiskers show 1.5 × IQR. d, Distribution of the genomic distance between a gARE and its lead haQTL for brain.
Extended Data Fig. 7
Extended Data Fig. 7. haQTL tissue-specificity.
a, Quantification of haQTL pairwise tissue-sharing based on directionality consistency. The x-axes show the -log10(P-value) of haQTLs in the replication tissue, separated by positive effect (right half-plane) and negative effect (left half-plane) in the discovery tissue; the y-axes show the haQTL effect sizes in the replication tissue; haQTLs P-values shown in panels a-d are all nominal P-values based on linear regression (two-sided test). b, Quantification of haQTL pairwise tissue-sharing based on similarity of effect size. The x-axes show the haQTL effect sizes in the discovery tissue, and the y-axes show the haQTL effect sizes in the replication tissue. c, The effect size similarity, defined as the coefficient of effect size between the replication tissue and discovery tissue, increases as the P-value significance increases in the replication tissue; the centers represent the estimated coefficient, and the error bars denote standard errors of the estimation. d, Identification of Type-I gAREs (haQTL-Shared) based on the nominal P-values in the replication tissue. The black curve shows directionality consistency (the y-axis on the left) of gAREs passing the P-value threshold on the x-axis; the green curve shows the count of gAREs (the y-axis on the right) passing the nominal P-value threshold on the x-axis; the nominal P-value threshold was set to 0.02 in the replication tissue to define a Type-I gARE (haQTL-Shared) between the discovery and replication tissues, which makes the directionality consistency between the two tissues be over 95%. e, gARE type explains eQTL tissue specificity. Different panels represent results for each tissue; x-axis represents the different type of gARE with increasing tissue specificity; y-axis represents eQTL tissue-specificity, the number of eQTL-sharing tissues; P-values testing the dependence of eQTL tissue-specificity on gARE tissue-specificity (linear regression, two-sided) are shown on top.
Extended Data Fig. 8
Extended Data Fig. 8. GWAS-haQTL and -eQTL colocalization.
a, GWAS-haQTL vs. GWAS-eQTL colocalization over 1694 brain gARE loci with significant brain eQTL. The x-axis shows different GWAS traits, and the y-axis denotes the counts of GWAS-haQTL or GWAS-eQTL colocalization events from three types: shared colocalization (in red, both GWAS-eQTL and GWAS-haQTL coloc PP4 ≥ 0.1, at least one of them ≥0.5), haQTL-specific colocalization (in green, GWAS-haQTL coloc PP4 ≥ 0.5 and GWAS-eQTL coloc PP4 < 0.1), and eQTL-specific colocalization (in blue, GWAS-eQTL coloc PP4 ≥ 0.5 and GWAS-haQTL coloc PP4 < 0.1). b, A comparison between schizophrenia GWAS-eQTL colocalization analyses with/without SuSiE. The heatmap shows the coloc PP4 for each gARE locus (by row, labeled on the right, hg38 coordinates) with each method (by column, labeled on top). * marks blocks missing colocalization signal (coloc PP4 < 0.1), while arrow points out the only loci missed by coloc and captured by coloc with SuSiE. c, A comparison between schizophrenia GWAS-eQTL colocalization analyses using brain bulk and cell-type-level eQTL. The heatmap shows the coloc PP4 for each gARE locus (by row, labeled on the right, hg38 coordinates) based on bulk haQTL, bulk eQTL, and eQTL from eight cell types in the brain (by column). * marks blocks with weak colocalization signal (coloc PP4 ≥ 0.1), while ** marks blocks with strong colocalization signal (coloc PP4 ≥ 0.5). Orange dot on the right marks the loci only captured by cell-type-level eQTL (coloc PP4 ≥ 0.1), while green dot marks the loci missed by both types of eQTLs with stringent cutoff (coloc PP4 < 0.5). Three gARE loci with strong GWAS-haQTL colocalization signals (coloc PP4 ≥ 0.5) and missed by both types of eQTLs even at the permissive cutoff (coloc PP4 < 0.1) are shown in red box.
Extended Data Fig. 9
Extended Data Fig. 9. Properties of gLink scores.
a, eQTL/FM-eQTL proximal enrichment of genomic regions at different bins of SNP-region distance. The plots show the fold enrichment of QTL-proximal regions in target regions (in different color) over shuffled genomic regions for eQTL (left) and FM-eQTL (right) for each tissue. The y-axis shows the enrichment, while the x-axis shows different SNP-region bins (0–250 bp, 250bp-500bp, 0.5–1, 1–1.5, 1.5–2, 2–2.5, 2.5–3, 3–3.5, 3.5–4, 4–4.5, 4.5–5 kb). The blue dashed line marks 2 kb which we chose as the cutoff to define the proximity. b, Proximal gAREs enrich FMeQTLs interrupting TF binding sites. The y-axis shows the proportion of FMeQTLs that interrupt TF binding sites from Extended Data Fig. 2c; orange bars represent the FMeQTLs located in gAREs, and gray bars represent FMeQTLs in all AREs. c, FMeQTL-proximal gAREs do not guarantee shared genetic regulation between gene and ARE. Each point: gARE-gene pair based on gARE-dist-to-FMeQTL score (distance cutoff of 2 kb); x-axes denote the shared genetic regulation at the SNP-level (nominal P for haQTL, linear regression, two-sided); y-axes denote the shared genetic regulation at the locus-level; Red circles indicate pairs without evidence of shared genetic regulation at both the locus and SNP levels. Percentage of FmeQTL-proximal gAREs without shared genetic regulation in each tissue shown at the top of each graph. d, Performances of gLink scores with one of them as the benchmark dataset (AUPRC). The heatmaps show AUPRC for each gLink score (by row) with one of scores as the benchmark dataset (by column) for each tissue; red dashed boxes indicate results for gLink scores from approach 2, showing higher consistency between these scores. e, Performance of gLink scores with EpiMap score as the benchmark dataset. We showed PRC of gLink scores for each tissue, which is barely higher than that of background.
Extended Data Fig. 10
Extended Data Fig. 10. gLink scores prioritize gARE-gene circuits for diseases and traits.
a, Enrichment of GWAS–haQTL-colocalized gAREs in the gAREs with predicted links for heart- and lung-related traits. Figure format as in Fig. 6a; * shows the significance levels (one-sided proportion test); HPT, hypertension; CH2, MAGNETIC_CH2.DB.ratio; HDLC, MAGNETIC_HDL.C; IDL, MAGNETIC_IDL.TG; ATH, asthma; FEV1, volume that has been exhaled at the end of the first second of forced expiration; FVC, Forced Vital Capacity; PEF, Peak expiratory flow; UKB, UK biobank; UKBS, self reported traits from UK biobank. b, Target genes inferred compared to the predictions from Weeks et al. The x-axes denote the percentage of genes inferred from each approach (by row) overlapping with the disease genes from a previous report for each disease (by panel); * shows the significance levels (two-sided Fisher’s exact test). INSOMN, insomnia. c, Comparison between target genes from different linking scores. The heatmap shows the mean of the similarity of target genes (Jaccard index) across 10 brain-related traits in Fig. 6a for each pair of linking scores. d, Distribution of the distance between a GWAS–haQTL-colocalized gARE and its predicted target gene by each linking score for the 10 brain-related traits in Fig. 6a. Boxes = 25th–75th percentile (that is inter-quartile range; IQR); line = median; whiskers = 1.5 IQR; number of gARE–gene pairs shown. e, Schizophrenia GWAS–haQTL-colocalized gARE-gene circuits only in muscle or heart. Left panel: The heatmap shows the genetic evidence of association between target gene and schizophrenia for each gARE–gene circuit (by row) in each tissue (by column); genomic position of ARE and ARE group shown on the left; for each cell, upper triangle shows evidence based on GWAS–eQTL colocalization (PP4), and lower triangle shows number of gLink scores that connect GWAS–haQTL-colocalized gARE to the same gene; genes in red text on right side of heatmap identified as fibroblast subtype marker genes from a brain vasculature sc-RNA-seq study. Right panel: upper, UMAP result of sc-RNA-seq profiles with cell subtype labeled from a brain vasculature study, and lower, WBP1L expression level marked in the UMAP. f, Schizophrenia GWAS–eQTL colocalization for gAREs loci from panel e. The heatmap shows GWAS–eQTL PP4 across 13 brain-related tissues and 7 muscle/heart-related tissues (by column) for each gARE (by row); ARE group and GWAS–haQTL colocalization are annotated on left.

References

    1. Buniello A, et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. - PMC - PubMed
    1. Ward LD, Kellis M. Interpreting noncoding genetic variation in complex traits and human disease. Nat. Biotechnol. 2012;30:1095–1106. - PMC - PubMed
    1. Maurano MT, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. - PMC - PubMed
    1. Tak YG, Farnham PJ. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin. 2015;8:1–18. - PMC - PubMed
    1. Claussnitzer M, et al. A brief history of human disease genetics. Nature. 2020;577:179–189. - PMC - PubMed

Publication types