Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Oct;54(10):1493-1503.
doi: 10.1038/s41588-022-01170-4. Epub 2022 Sep 26.

Population-level variation in enhancer expression identifies disease mechanisms in the human brain

Collaborators, Affiliations

Population-level variation in enhancer expression identifies disease mechanisms in the human brain

Pengfei Dong et al. Nat Genet. 2022 Oct.

Abstract

Identification of risk variants for neuropsychiatric diseases within enhancers underscores the importance of understanding population-level variation in enhancer function in the human brain. Besides regulating tissue-specific and cell-type-specific transcription of target genes, enhancers themselves can be transcribed. By jointly analyzing large-scale cell-type-specific transcriptome and regulome data, we cataloged 30,795 neuronal and 23,265 non-neuronal candidate transcribed enhancers. Examination of the transcriptome in 1,382 brain samples identified robust expression of transcribed enhancers. We explored gene-enhancer coordination and found that enhancer-linked genes are strongly implicated in neuropsychiatric disease. We identified expression quantitative trait loci (eQTLs) for both genes and enhancers and found that enhancer eQTLs mediate a substantial fraction of neuropsychiatric trait heritability. Inclusion of enhancer eQTLs in transcriptome-wide association studies enhanced functional interpretation of disease loci. Overall, our study characterizes the gene-enhancer regulome and genetic mechanisms in the human cortex in both healthy and diseased states.

PubMed Disclaimer

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Extended Data Fig. 1
Extended Data Fig. 1. TEns identification
a, The distribution of deconvolved RNA-seq cell type distribution for each sample (Nneuron = 47, Nnon-neuron = 46). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR. b, Jaccard index between the Multi-omics peaks and previous reports. cell-type-specific ATAC-seq/H3K4me3/H3K27ac peaks were compared to our previous reports of the corresponding assays,. H3K27me3 was compared to Roadmap H3K27me3 peaks. PFC indicates the prefrontal cortex. c, Empirical density distribution of FANTOM5 enhancer size; based on the curve, we chose 500bp as the TEn size. d, Feature importance heatmap from random forest models. e, Distribution of TEn numbers per super-enhancer in the two cell types (Nneuron = 2,049, Nnon-neuron = 1,946). Box plots as in a. f, Distance to the nearest TEns for every TEn within or outside of super-enhancer regions (Nneuron = 29,555, Nnon-neuron = 23,260, P < 10−16 for both cell types, two-sided Wilcoxon test). Box plots as in a. g, Epigenomic profiles of expressed TEns, and promoters of protein-coding genes and lincRNAs. The line represents the mean, and the shadow indicates a 95% confidence interval.
Extended Data Fig. 2
Extended Data Fig. 2. TEns identification methods compare
a, Overlap between our model(RF) and logistic regression (logit), ChromHMM +OCR (ChromHMM), and the OCR only (OCR) method in neuronal and non-neural cells (Methods). b, Strand-specific CAGE tags average profile for the shared and methods-specific enhancers. In all three comparisons, our model-specific enhancers are enriched for CAGE tags, while the logit-, ChromHMM-, and OCR-specific are depleted of CAGE tags. The radar plots show typical enhancer-related signals including H3K4me3, H3K27ac, H3K27me3, super-enhancer, and loop anchor occupancy of OCR only specific, our model-specific, and shared enhancers of the two methods in neuronal c, and non-neuronal d, cells. The value within parentheses indicates the odds ratio (OR) and P value between OCR only specific and the rest (one-sided Fisher exact test).
Extended Data Fig. 3
Extended Data Fig. 3. Differential expression/activity between neuronal and non-neuronal cells
a, ChromHMM identified 6 chromatin states including EnhA (active enhancer), TssA (active promoters), TssBiv (bivalent promoters), TssFlnk (promoter flanking region), ReprPC (polycomb repression region), and Quies (other regions). b, Comparing the cell-type-specific Chromatin states with the roadmap DLPFC result. c, Jaccard index between neuronal and non-neuronal chromatin states. Compared to active promoters (TssA), and polycomb repressed regions (ReprPC), active enhancers (EnhA) are remarkably more different between cell types. d, Overlap of expressed TEns between neuronal and non-neuronal cells. e, π1 statistics of differential activity between the two cell types of different molecular markers. f, Violin plot shows the variance explained by different factors for the five markers (NATACSeq = 98, NTEns = Ngene = 93, NH3K27ac = 96, NH3K4me3 = 96). Box plot indicates median, interquartile range (IQR), and 1.5 × IQR.
Extended Data Fig. 4
Extended Data Fig. 4. GWAS association with different classes of enhancers
a. Stratified LD score regression of different classes of enhancers across different classes of traits. A positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error). ”·”: Nominally significant (P < 0.05); “+”: significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). b, coverage (Mb) and numbers (k) of different classes of enhancers. c, Stratified LD score regression of cell-type-specific enhancers defined by TEns, the intersection of differential H3K27ac and noncoding OCR (Diff H3K27ac ∩ OCR), and the intersection of differential noncoding OCR and H3K27ac (Diff OCR ∩ H3K27ac) across different classes of traits as in a. d, coverage (Mb), and numbers (k) of the three types of enhancers.
Extended Data Fig. 5
Extended Data Fig. 5. enhancer-gene link
a, the expression level of different classes of genes and enhancers (Nintergeric_enhancer = 4,626, Nintronic_enhancer = 37,358, NlincRNA = 2,915, Nprotein_coding_gene = 16,468). Box plot indicates median, interquartile range (IQR), and 1.5 × IQR. b, Pairwise Spearman correlation coefficient (SCC) between samples of different classes of genes/enhancers of discovery vs discovery (DD), discovery vs replicate (DR), and replicate vs replicate (RR) (Nintergeric_enhancer = 4,080, Nintronic_enhancer = 36,493, NlincRNA = 2,665, Nprotein_coding_gene = 16,175). Box plots as in a. c, Distribution of gene FPKM (fragments per kilobase of transcript per million mapped reads) and counts for linked enhancers. Linked genes have a significantly higher gene expression (FPKM, two-sided Wilcoxon test, P < 10−16). Box plots as in a. d, Distribution of enhancer CPM and for counts linked genes. Linked enhancers do not have a higher expression (CPM, two-sided Wilcoxon test, P = 0.96). Box plots as in a. e, Distance between enhancers to genes of linked group and background (Nlink = 35,964, Nbackground = 241,040). f, Distance between enhancers to genes of linked group and background, physically-overlapped gene-enhancer pairs were excluded (Nlink = 22,920, Nbackground = 220,062). Box plots as in a. g, Neuronal vs non-neuronal t statistics for linked and non-linked enhancers. Linked enhancer has a significantly higher value (two-sided KS test, P < 10−16).
Extended Data Fig. 6
Extended Data Fig. 6. enhancer-gene link
a, Scatter plot of pairwise Spearman correlation (SCC) between linked enhancers and genes for the Discovery set (x-axis) and Replicate set (y-axis), the two sets of correlations are highly consistent (SCC = 0.79, P < 10−16, two-sided Spearman correlation test). b, Neuropsychiatric diseases and behavioral traits, neurological diseases, and non-brain related trait enrichment with different classes of genes (determined by MAGMA). For BD and SCZ, we included different GWAS versions including PGC2 SCZ (Ncase = 36,989, Ncontrol = 113,075), PGC3 SCZ (Ncase = 69,369, Ncontrol = 236,642), PGC2 BD (Ncase = 29,764, Ncontrol = 169,118), PGC3 BD (Ncase = 41,917, Ncontrol = 371,549). c, Pairwise comparison of the coefficients of significant traits for enhancer-linked neuronal genes (one-sided t-test). The color of the heatmap exhibits the one-sided t-test significance (row vs column). ”·”: Nominally significant (P < 0.05); “+”: significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05).
Extended Data Fig. 7
Extended Data Fig. 7. gene and enhancer eQTL
a, Percentage of genes/enhancers that are cis-heritable (GCTA nominal P < 0.05 and cis-heritability > 0). b, Overlap of cis-heritable genes between ACC and DLPFC. c, Overlap of cis-heritable enhancers between ACC and DLPFC. d, Dot plots illustrate the first two genetic ancestry principal components (PCs) for individuals reported (report) to have European ancestry and the individuals selected (select) to have European ancestry based on sd to the center. e, The replication of eQTL in the discovery set in the replicate set across different P value cutoffs for the discovery set. π1 values (proportion of true positive p values) for the discovery set significant eQTL in our replicate eQTLs. SCC values are the Spearman correlation coefficients of significant eQTL effect sizes between the two sets. f, The percentage of genes/enhancers that have significant eQTL are cis-heritable across different P value cutoffs. Overlap of the cis heritable transcript with the transcript with eQTL for genes g, and h, enhancer.
Extended Data Fig. 8
Extended Data Fig. 8. consistent genetic effects between linked genes and enhancers
a, The distribution of genomic distances from eSNPs to the TSSs for different classes of transcripts. b, The replication of reported eQTL in our analysis. π1 values (proportion of true positive P values) for reported significant GTEx eQTL in our gene eQTLs. SCC values are the Spearman correlation coefficients of significant eQTL effect sizes between GTEx eQTL and corresponding pairs in our data. The bar color represents the number of unique genes used. c, for both physically overlapping and non-overlapping pairs, we subsampled from the non-linked pairs to generate backgrounds that have similar levels of enhancer-TSS distance distributions compared to linked pairs (numbers in the parenthesis indicate the number of pairs). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR. d, The allelic genetic effect between gene and target enhancers considering the gene-enhancer physically-overlapping and enhancer-TSS effects. The background is controlled for enhancer-TSS distance.
Extended Data Fig. 9
Extended Data Fig. 9. quantile-quantile plot of GWAS P values
Quantile–quantile plot of GWAS p values across different GWAS traits. EeQTL specific, eQTL specific, and shared SNPs are shown in comparison with genome-wide SNPs. GWAS SNPs were binarily annotated using SNPs within r2 > 0.8 of the eSNP.
Extended Data Fig. 10
Extended Data Fig. 10. GWAS association
a, Stratified LD score regression of neuronal and non-neuronal enhancer/gene eQTL across different classes of traits. The gene/enhancer eQTLs are assigned to Neuronal and non-Neuronal groups based on the differential expression status between the two cell types. A positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error). ”·”: Nominally significant (P < 0.05); “+”: significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). b, Gene TWAS Z scores compared to published reports,. SCC represents the Spearman correlation coefficient (ρ) (all P < 10−16, two-sided Spearman correlation test). c, For different types of transcripts, the TWAS Z scores between the two brain regions. SCC represents the Spearman correlation coefficient (ρ)(all P < 10−16, two-sided Spearman correlation test). d, Aligned Manhattan plots of SCZ GWAS and EeQTLs at the enh41216 locus generated by LocusCompare. SNPs are colored by LD (r2) with the lead EeQTL (rs3247233).
Fig. 1 |
Fig. 1 |. Catalog cell-type-specific TEns in the human cortex.
a, Dissections from five brain regions (Brodmann area 10, 17, 22, 36, and 44) of 10 control subjects were obtained from frozen human postmortem tissue. b, Combined with FANS, we performed functional assays including total RNA-seq, ATAC-seq, H3K4me3/H3K27ac/H3K27me3 ChIP-seq, and Hi-C for neuronal (NeuN+) and non-neuronal (NeuN−) nuclei. Created with BioRender.com. c, Transcriptomic and Epigenomic profiles around transcribed enhancers, including FANTOM enhancers (FEs) and OCRs with bi-directional CAGE tags (Extra transcribed, ETs), and non-transcribed elements (NTs). The line represents the mean, and the shadow shows the 95% confidence intervals. d, Flow charts demonstrate TEns identification pipelines. Briefly, transcriptomic and epigenomic signals around the enhancer regions were aggregated for both neuronal and non-neuronal cells. Transcribed enhancers (FEs and ETs) were used as positive sets, and non-transcribed elements (NTs) were used as negative sets. The parameter was tuned with 10-fold cross-validation to select a random forest model to classify transcribed and non-transcribed elements. Lastly, the enhancer expression matrix and gene expression matrix were combined for downstream analysis. e, ROC (receiver operating characteristic curve) and f, PRC (precision-recall curve) of the resulting random forest models exhibit high AUROC (area under the ROC) and AUPRC (area under the PRC). g, The radar plots show typical enhancer-related signals including H3K4me3, H3K27ac, H3K27me3, super-enhancer, and loop anchor occupancy between identified TEns and background (NTs). The value within parentheses indicates the odds ratio between identified TEns and background. h, Strand-specific CAGE tag average profile (top) and distribution (bottom) for intergenic (interg) and intronic TEns.
Fig. 2 |
Fig. 2 |. TEns expression captures cell-type-specific enhancer function.
a, cell-type-specific effect size (log2 fold change) between TEns and overlapping H3K27ac peaks are highly consistent for both intergenic (N = 7,624) and intronic (N = 42,148) TEns. SCC represents the Spearman correlation coefficient (ρ) (P < 10−16 for both, two-sided Spearman correlation test). b, LD score regression enrichment for DE TEns/peaks of different classes of traits. A positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error). ”·”: Nominally significant (P < 0.05); “+”: significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). c, DE TEns/peaks coverage (Mb) and numbers (k).
Fig. 3 |
Fig. 3 |. Gene-enhancer expression coordination.
a, Two independent Cohorts were used for this study, both cohorts consisting of total RNA-seq from DLPFC and ACC postmortem brains, as well as matched genotyping data. The discovery set includes 1,014 RNA-seq samples from NSCZ = 254 and Ncontrol = 291 unique individuals. The replicate set includes 368 RNA-seq samples from NSCZ = 78 and Ncontrol = 151 unique individuals. Brain regions were visualized with cerebroViz. b, The Venn diagram shows the overlap of expressed enhancers between the discovery and the replicate analysis. c, Demonstration of the lasso link model. For every gene, we considered all the enhancers within ±500kb of the TSS and fit a lasso model to select the linked enhancers. d, Distribution of the three different classes of gene-enhancer pairs: non-overlapping intergenic-enhancer gene pairs, overlapping intronic-enhancer gene pairs, and non-overlapping intronic-enhancer gene pairs. e, The differential expression status in the two cell types between gene and linked enhancers. f, Percentage of gene promoter and enhancer within the same TAD for different genomic distances at 10kb intervals. g, Compared to the background, the enhancer-linked genes have markedly higher absolute t statistics between neuronal and non-neuronal cells (two-sided KS test, P < 10−16). h, Cell types, neuropsychiatric common variants, and biological pathways enriched at different classes of linked and non-linked genes. Cell type and biological pathways enrichment were determined by One-tailed Fisher exact tests, and common variants association was determined by MAGMA (Methods). DE, differentially expressed. Opc, oligodendrocytes progenitor cells
Fig. 4 |
Fig. 4 |. Genetic effects of enhancer expression.
a, The percentage (y-axis) and number (label) of different classes of autosomal gene/enhancers that have significant eQTLs. b, Percentage of EeSNP resides within any GeSNP LD blocks (r2 > 0.5). c, Percentage of GeSNP resides within any EeSNP LD blocks (r2 > 0.5). d, The replication of reported hQTL and caQTL in our analysis. Storey’s π1 values for significant hQTL and caQTL in the EeQTLs. SCC values are the Spearman correlations (ρ) of effect sizes between caQTL and corresponding EeQTL. The size of the point corresponds to the number of unique enhancers used. The number above the bar corresponds to the count of unique enhancers used. e, The allelic genetic effects between gene and target enhancers are highly consistent. The background is controlled for TSS-enhancer distance (Extended Data Fig. 8c) f, Enrichment of neuronal and non-neuronal genes, and enhancers corresponding eQTLs in four chromatin states: TssA (active promoter), EnhA (active enhancer), ReprPC (polycomb repression), and Quies (other). The filled color represents enrichment fold change, and the size corresponds to enrichment P values (determined by GREGOR). g, Top panel, LD score regression enrichment for gene and enhancer eSNPs of different traits. Positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error).”·”: Nominally significant (P < 0.05); “+”: significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). Bottom panel, (Estimated proportion ± standard error) of heritability mediated by the cis genetic component of assayed enhancer, gene, and combined expression for different traits.
Fig. 5 |
Fig. 5 |. SCZ TWAS.
a, Manhattan plot of SCZ TWAS enrichment in DLPFC, ACC, and the independent genome-wide significant SCZ associations (excluding chrX and MHC). GWAS node height corresponds to the index SNP significance, and the color indicates if the GWAS loci are associated with enhancer, gene, both gene, and enhancer, or none of them. Significant TWAS enhancers (purple) and genes (green) are highlighted in different colors. b, Ternary plots showing coloc posterior probabilities for significant TWAS genes and enhancers respectively. PP0+PP1+PP2: three scenarios for lack of test power; PP3: independent causal variants; PP4: colocalized causal variants. c, Venn diagrams show the overlap between the fine mapped TWAS genes, TWAS-enhancers linked genes, PsychENCODE (PEC) TWAS genes, PsychENCODE (PEC) high confidence risk genes, and SCZ high-confidence risk genes (HRG). d, Illustration of genomic loci at chr15, harboring multiple TWAS loci. Top panel, transcript position (only cis-heritable transcripts are shown), the color indicates transcription direction of genes. Middle, gene/enhancer TWAS Z score absolute value, point size indicates the FOCUS posterior inclusion probability (PIP), color indicates the genetic correlation with the highest PIP. Bottom, Manhattan plot of SCZ GWAS signal, before and after conditioning on enh41216. e, KRAB-dCas9-mediated repression of enh41216 leads to the reduction of ABHD2 and KIF7 expression as measured by qPCR in neural progenitor cells. (two-sided t-test, N = 6 independent experiments performed on 2 individuals, mean ± standard error are shown)

References

    1. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014). - PMC - PubMed
    1. Visscher PM et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet 101, 5–22 (2017). - PMC - PubMed
    1. Sullivan PF & Geschwind DH Defining the genetic, genomic, cellular, and diagnostic architectures of psychiatric disorders. Cell 177, 162–183 (2019). - PMC - PubMed
    1. Grove J et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet 51, 431–444 (2019). - PMC - PubMed
    1. Mullins N et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet 53, 817–829 (2021). - PMC - PubMed

Publication types