Nat Genet. 2022 Sep;54(9):1364-1375.
doi: 10.1038/s41588-022-01168-y. Epub 2022 Sep 7.

Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation

Sylvan C Baca et al. Nat Genet. 2022 Sep.

Abstract

Many genetic variants affect disease risk by altering context-dependent gene regulation. Such variants are difficult to study mechanistically using current methods that link genetic variation to steady-state gene expression levels, such as expression quantitative trait loci (eQTLs). To address this challenge, we developed the cistrome-wide association study (CWAS), a framework for identifying genotypic and allele-specific effects on chromatin that are also associated with disease. In prostate cancer, CWAS identified regulatory elements and androgen receptor-binding sites that explained the association at 52 of 98 known prostate cancer risk loci and discovered 17 additional risk loci. CWAS implicated key developmental transcription factors in prostate cancer risk that are overlooked by eQTL-based approaches due to context-dependent gene regulation. We experimentally validated associations and demonstrated the extensibility of CWAS to additional epigenomic datasets and phenotypes, including response to prostate cancer treatment. CWAS is a powerful and biologically interpretable paradigm for studying variants that influence traits by affecting transcriptional regulation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Extended Data Figure 1. Accurate genotyping of SNPs from epigenomic data.
(A) Overview of 575 epigenomic datasets merged across 163 individuals for genotyping. Datasets are colored by cohort (see Table S1). (B) Genomic distribution of reads in ChIP-seq, RNA-seq and input control (whole-genome) data. The genome was divided into non-overlapping 500-bp bins, and read counts within each bin were summed. For each data type, five samples were randomly selected and downsampled to 8.4 million reads for uniformity. The mean percentage of bins with the indicated number of read counts is shown for each data type. (C) Number of covered SNPs (≥ 5 reads) versus total aggregated reads for each individual. (D) Number of covered SNPs (≥ 5 reads) for each individual (n=165) as the indicated number of datasets are merged. Datasets were added in random order for a given individual. For boxplots, lower and upper hinges indicate 25th and 75th percentiles; whiskers extend up to 1.5 × the interquartile range (IQR) from the hinges. (E) Correlation of imputed versus array-based genotype dosages across 24 individuals. (F) Receiver operating characteristic (ROC) curve for detection of heterozygous SNPs using sequencing and imputation, with array-based genotypes as ground truth. The dotted red line indicates a mean sensitivity of 0.92 at a specificity of 0.9 in individuals of European ancestry.
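Panel F summarizes heterozygous-SNP detection against array-based genotypes. The sketch below shows how points on such an ROC curve could be computed by thresholding a per-SNP call score. The function names and the idea of a single scalar call score are illustrative assumptions, not the study's genotyping pipeline.

```python
def sensitivity_specificity(scores, is_het_array, threshold):
    """Compare heterozygous calls (score >= threshold) with array-based truth.

    scores: per-SNP heterozygosity scores from sequencing/imputation (hypothetical input)
    is_het_array: per-SNP booleans, True if the array genotype is heterozygous
    """
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, is_het_array):
        called = score >= threshold
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and truth:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def roc_points(scores, is_het_array):
    """Use every observed score as a threshold; return (1 - specificity, sensitivity) pairs."""
    points = []
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, is_het_array, t)
        points.append((1 - spec, sens))
    return points
```

For example, with toy scores `[0.9, 0.8, 0.4, 0.3, 0.2]`, truth `[True, True, False, True, False]` and a threshold of 0.5, two of three true heterozygotes are called and no homozygote is miscalled, giving a sensitivity of 2/3 and a specificity of 1.0.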
Extended Data Figure 2. Inferred ancestry of individuals in the study.
Projection of imputed genotypes onto the first two principal components of continental ancestry from ref.. Individual identifiers for outlier samples (with values > 2 x standard deviation) are labeled. Self-reported ancestry is coded by color.
Extended Data Figure 3. Overlap of cQTLs with prostate tissue eQTLs.
(A) Enrichment of genetically determined AR peaks (left) and H3K27ac peaks (right) for overlap with GWAS risk SNPs eQTLs across various tissues. Empirical p-values are derived from 10,000 permutations. (B) Number of AR and H3K27ac cQTLs that are also the top eQTL for a gene in prostate tissue. (C) Correlation of cQTL and eQTL effect sizes (β) for cQTL SNPs; the p-value for a Pearson correlation test is indicated. (D) Examples of SNPs (labeled with rs identifiers) that are both AR cQTLs and eQTLs, where the corresponding cPeak and eGene are connected by an H3K27ac HiChIP loop in LNCaP. cPeak coordinates are shown, and eGene transcription start sites (TSS) are indicated. (E) Contingency table showing enrichment of H3K27ac HiChIP looping between the corresponding cPeak and eGene for cQTLs that are also eQTLs. Chi-square test p-values are indicated.
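The empirical p-values in panel A come from 10,000 permutations. A minimal sketch of a permutation enrichment test follows, assuming the null draws random peak sets of the same size from all tested peaks. The actual null model (for example, any matching on peak properties) is not described in this legend, and `permutation_enrichment` is a hypothetical helper.

```python
import random


def permutation_enrichment(peaks, annotation, universe, n_perm=10_000, seed=0):
    """Empirical p-value for overlap between a peak set and an annotation.

    peaks: set of genetically determined peak IDs (hypothetical input)
    annotation: set of peak IDs overlapping the annotation (e.g., tissue eQTLs)
    universe: list of all tested peak IDs from which null sets are drawn
    Assumption: the null is a uniform random draw of len(peaks) peaks from universe.
    """
    rng = random.Random(seed)
    observed = len(peaks & annotation)
    k = len(peaks)
    hits = 0
    for _ in range(n_perm):
        null_set = rng.sample(universe, k)
        if sum(1 for f in null_set if f in annotation) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

With 10,000 permutations, the smallest attainable p-value under this convention is 1/10,001, which is why permutation p-values are usually reported as bounds when no null draw reaches the observed overlap.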
Extended Data Figure 4. Distribution of cQTLs around cPeaks.
cQTL SNP significance versus distance to the center of the corresponding cPeak for significant cQTLs (permutation-based q-value < 0.05). Dashed blue lines indicate ±25 kb from the peak center.
Extended Data Figure 5. Conditioning of GWAS SNP significance on genetically predicted CWAS AR binding.
Genomic context of AR CWAS ARBS (depicted in green) that are significantly associated with prostate cancer risk. Manhattan plots indicate significance of SNP associations with prostate cancer before and after conditioning on genetically predicted CWAS ARBS activity. (A) and (B) show representative examples where ARBS explain most of the nearby cis-SNP GWAS significance. (C) CWAS ARBS at the promoter of GGCX, where residual GWAS significance remains after conditioning on ARBS, suggesting additional mechanisms underlying risk conferred by SNPs in this region.
Extended Data Figure 6. Comparison of CWAS and GWAS significance for tested ARBS and H3K27ac peaks.
The absolute value of the association Z-score is plotted for CWAS peak-trait associations (y-axis) and for GWAS SNP-trait associations of the most significant nearby SNP (x-axis). (A) shows ARBS and (B) shows H3K27ac peaks. Dashed horizontal lines indicate genome-wide significance thresholds for CWAS. Vertical dotted lines indicate the GWAS genome-wide significance threshold of |z| = 5.45 (P = 5 × 10^-8).
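The GWAS threshold of |z| = 5.45 is the two-sided standard normal quantile for P = 5 × 10^-8. The snippet below reproduces that conversion with the Python standard library. Applying the same function to a Bonferroni-corrected alpha for CWAS (for example, 0.05 divided by the number of tested peaks) is shown only as an assumption, since the legend does not say how the CWAS thresholds were set.

```python
from statistics import NormalDist


def two_sided_z_threshold(alpha):
    """Return the |z| cutoff at which a two-sided normal test has P = alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Conventional GWAS genome-wide significance, P = 5e-8
print(round(two_sided_z_threshold(5e-8), 2))  # 5.45

# Hypothetical Bonferroni threshold for 10,000 tested peaks (assumed count, not from the study)
cwas_z = two_sided_z_threshold(0.05 / 10_000)
```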
Extended Data Figure 7.
Enrichment of prostate cancer GWAS risk SNPs in genetically determined AR peaks (A) and H3K27ac peaks (B). Enrichment and p-values are derived from linkage disequilibrium score regression.
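In stratified LD score regression (Finucane et al., 2015), the enrichment of an annotation C is conventionally defined as the proportion of SNP heritability it explains divided by the proportion of SNPs it contains:

```latex
\mathrm{Enrichment}(C) = \frac{h^2_g(C) / h^2_g}{M_C / M}
```

Here h^2_g(C) is the heritability attributed to SNPs in C, h^2_g is the total SNP heritability, M_C is the number of SNPs in C, and M is the total number of SNPs. As a purely hypothetical example, an annotation containing 2% of SNPs that explains 10% of h^2_g would have an enrichment of 0.10 / 0.02 = 5.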
Extended Data Figure 8. cQTL vs. eQTL activity at TMPRSS2 and NKX3-1 loci.
(A) Normalized AR ChIP-seq reads at the TMPRSS2 enhancer and TMPRSS2 expression, stratified by genotype of the indicated SNP. (B) Normalized H3K27ac ChIP-seq reads at the NKX3-1 enhancer and NKX3-1 expression, stratified by genotype of the indicated SNP. In (A) and (B), ρ indicates the Pearson correlation coefficient, shown with its p-value. (C) Estimated cis-SNP heritability for the indicated epigenomic features and corresponding genes. For boxplots, lower and upper hinges indicate 25th and 75th percentiles; whiskers extend up to 1.5 × the interquartile range (IQR) from the hinges.
Extended Data Figure 9. CWAS identifies associations not marked by a steady-state eQTL (related to Figure 5).
(A) Number of ENCODE samples (n=733, representing 438 cell types/states) with DNase hypersensitivity at cQTL and eQTL SNPs. The data shown are from Fig. 5E; the scale is adjusted and the mean ± standard error is shown to better visualize differences between the groups. (B) Total AR or H3K27ac peaks within 100 kb of a gene as a function of prostate-specific gene expression, as quantified in Fig. 5F. (C) Proportion of peaks in (B) with a CWAS model.
Figure 1. Overview of the method.
(A) Cistrome-wide association studies identify epigenomic features that are genetically associated with a trait. (B) Epigenomic sequencing reads (ChIP-seq and ATAC-seq) are merged on a per-individual basis and used to impute SNP genotypes. Haplotypes are then phased based on reference panels. Normalized read abundance and allele-specific reads at heterozygous SNPs are modeled as a function of cis-SNP genotypes. The resulting models capture the genetic determinants of peak intensity.
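Panel B models peak intensity as a function of cis-SNP genotypes, and Figure 3A names a "top SNP" variant of these models. The sketch below illustrates only that simplest case for total read abundance: ordinary least squares of one peak's normalized signal on each cis-SNP dosage, keeping the SNP with the largest |t|. It deliberately ignores covariates (such as ancestry principal components), the allele-specific read component, and multi-SNP models; `top_snp_model` is a hypothetical name.

```python
import math


def simple_regression(x, y):
    """OLS of y on x; returns (slope, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP or constant signal
        return 0.0, 0.0
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def top_snp_model(peak_signal, cis_dosages):
    """Pick the cis-SNP whose dosage best predicts one peak's intensity.

    peak_signal: normalized read counts, one value per individual
    cis_dosages: dict of SNP ID -> genotype dosages (0-2), same individual order
    Returns (snp_id, slope, r, t_statistic).
    """
    n = len(peak_signal)
    best = None
    for snp_id, dosage in cis_dosages.items():
        slope, r = simple_regression(dosage, peak_signal)
        # t statistic for H0: slope == 0, with n - 2 degrees of freedom
        t = math.inf if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
        if best is None or abs(t) > abs(best[3]):
            best = (snp_id, slope, r, t)
    return best
```

In a toy example where the signal rises roughly in step with the dosage of one SNP, that SNP is returned with a slope close to the per-allele increase in signal.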
Figure 2. Genetic variation creates abundant chromatin QTLs and allelically imbalanced regulatory elements.
(A) Proportion of all AR and H3K27ac peaks with evidence of genetic determination, defined as a significant combined test for allelic imbalance and cQTL with Q < 0.05 (Methods). (B) cQTL effect size (β) versus allele fraction (μ) for peaks with allelic imbalance. μ for one SNP per peak is shown. ρ indicates the Pearson correlation coefficient. (C) Overlap of allelically imbalanced (AI) and chromatin QTL (cQTL) peaks. (D) Overlap of genetically determined AR and H3K27ac peaks in (A). (E) Distance from the center of significant AR cQTL peaks (permutation-based q-value < 0.05) to the corresponding SNP. Blue dashed lines mark ±200 bp from the peak center. (F) For all heterozygous SNPs overlapping the indicated motif, the difference in motif position weight matrix (PWM) score between alternate and reference alleles is plotted against the allele fraction observed in AR ChIP-seq reads. The top five motifs inferred de novo from 10,000 randomly selected AR binding peaks are shown. The NANOG motif (red) is included as a negative control. p-values for Pearson correlation are indicated.
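Panel F relates allelic imbalance to the change in motif score caused by each allele. A minimal sketch of that score difference follows, assuming a log2-odds PWM built from base counts with a uniform background and a pseudocount of 0.5. These are illustrative choices, and the three-position motif used in the example is made up.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (list of dicts) into a log2-odds PWM."""
    pwm = []
    for pos in counts:
        total = sum(pos[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((pos[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds for a motif-length sequence."""
    return sum(col[base] for col, base in zip(pwm, seq))


def allele_delta(pwm, site, offset, ref, alt):
    """PWM score(alt) - score(ref) for a SNP at `offset` within a motif-length site."""
    assert site[offset] == ref, "site must carry the reference allele"
    alt_site = site[:offset] + alt + site[offset + 1:]
    return pwm_score(pwm, alt_site) - pwm_score(pwm, site)


# Toy motif "AGT" (hypothetical counts)
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
pwm = log_odds_pwm(toy_counts)
delta = allele_delta(pwm, "AGT", 1, "G", "C")  # negative: the C allele disrupts the motif
```

Because only one position changes, the delta reduces to the log ratio of the two alleles' frequencies at that position, so it does not depend on the background model.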
Figure 3. Integrative cistrome models identify genetic determinants of gene regulation.
(A) Total peak intensity, allele-specific activity, or both are modeled based on cis-SNP genotypes. Models include either linear combinations of SNPs (“multi-SNP”) or the single most significantly predictive SNP (“top SNP”; Methods). (B) In vitro validation of allelically imbalanced regulatory element SNPs. Regulatory elements containing SNPs were assessed for enhancer activity in vitro using SNP STARR-seq (Methods). Bar plots indicate reads from reference or alternate haplotypes in H3K27ac ChIP-seq data (orange) and normalized transcript counts for each SNP genotype from SNP STARR-seq (gray). p-values for allelic imbalance under the beta-binomial model are indicated (Methods). (C) Prostate cancer-associated ARBS (black triangle) upstream of TMPRSS2. (D) Effect of CRISPRi suppression of the ARBS shown in (C) on TMPRSS2 transcript expression (n=3 independent experiments). gNT and gCTRL indicate two non-targeting control guide RNAs. (E) Prostate cancer-associated ARBS (black triangle) within BMPR1B. (F) Effect of CRISPRi suppression of the ARBS shown in (E) on BMPR1B and PDLIM5 expression (n=3 independent experiments). In (D) and (F), error bars indicate median and range; p-values are calculated with the Wilcoxon rank-sum test.
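Panel B reports allelic imbalance p-values under a beta-binomial model. The sketch below implements one common form of such a test: alternate-allele read counts at a heterozygous SNP are compared with an expected fraction of 0.5, and extra-binomial variation is captured by an overdispersion parameter `rho`. The default `rho` is a placeholder assumption (in practice it would be estimated from the data), and the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one.

```python
import math


def betabinom_logpmf(k, n, a, b):
    """log P(K = k) for a beta-binomial with n trials and shape parameters a, b."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    )


def allelic_imbalance_pvalue(alt_reads, total_reads, rho=0.05, mu=0.5):
    """Two-sided beta-binomial test of the alternate-allele fraction against mu.

    Mean/overdispersion parameterization: a = mu(1 - rho)/rho, b = (1 - mu)(1 - rho)/rho,
    so rho = 1 / (a + b + 1). rho=0.05 is an assumed placeholder value.
    """
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    logp = [betabinom_logpmf(k, total_reads, a, b) for k in range(total_reads + 1)]
    observed = logp[alt_reads]
    # Sum probabilities of all outcomes no more likely than the observed one.
    p = sum(math.exp(lp) for lp in logp if lp <= observed + 1e-12)
    return min(1.0, p)
```

Compared with a plain binomial test, the overdispersion term widens the null distribution, so the same read imbalance yields a larger (more conservative) p-value.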
Figure 4. CWAS identifies prostate cancer risk mediated by genetic variation in AR binding and regulatory element activity.
(A) Manhattan plot showing significant genetic associations with prostate cancer for AR CWAS, H3K27ac CWAS, and TWAS. Red lines indicate genome-wide significance thresholds. (B) Normalized read counts at the indicated peaks stratified by genotype of the indicated SNP. Lower and upper hinges indicate 25th and 75th percentiles; whiskers extend up to 1.5 × the interquartile range (IQR) from the hinges. p-values for Pearson correlation are indicated. (C) GWAS SNP significance in the vicinity of the peaks shown in (B), with and without conditioning on genetically predicted activity. The CWAS peaks are marked by a black triangle.
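The legend does not give the CWAS test statistic. Assuming it follows the summary-statistic approach used by TWAS tools such as FUSION (an assumption not confirmed here), a peak's association z-score combines its cis-SNP model weights w, the GWAS z-scores z, and a reference LD matrix Σ as z_CWAS = wᵀz / sqrt(wᵀΣw). `weighted_zscore` is a hypothetical helper.

```python
import math


def weighted_zscore(weights, gwas_z, ld):
    """Association z-score for genetically predicted peak activity (TWAS-style, assumed).

    weights: cis-SNP weights from the peak's predictive model
    gwas_z: GWAS z-scores for the same SNPs, aligned to the same effect allele
    ld: SNP-by-SNP LD (correlation) matrix as a list of lists
    """
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    m = len(weights)
    # Variance of w^T z under the null, given LD among the SNPs
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)
```

The LD term matters: two SNPs in strong LD carry largely redundant GWAS signal, so the same z-scores give a smaller combined statistic than they would for independent SNPs.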
Figure 5. CWAS identifies associations not marked by a steady-state eQTL.
(A) Prostate cancer risk loci were defined as genome-wide significant SNPs ±1 Mb and assessed for overlap with a high-confidence CWAS or TWAS peak. TWAS results using reference panels with only prostate tissue or all tissues are shown separately. (B) Estimated cis-SNP heritability for assessable genes (n=16,634), AR peaks (n=32,434), or H3K27ac peaks (n=54,262). (C) Distribution of heritability estimates for genes or AR peaks with significant heritability (q < 0.05). (D) Steady-state chromatin measurements revealing context-dependent genetic effects on gene regulation. H3K27ac ChIP-seq, ATAC-seq, and RNA-seq data from LNCaP were generated at baseline and after 16 hours of stimulation with dihydrotestosterone (DHT) and assessed for allelic imbalance. Contingency tables show all transcripts that do not exhibit allelically imbalanced expression at baseline, stratified by (1) whether they demonstrate imbalanced expression with DHT treatment and (2) whether they are within 100 kb of an ATAC-seq or H3K27ac peak with allelic imbalance at baseline. The odds ratio (OR) compares the odds that a transcript with stimulation-induced imbalance falls within 100 kb of a peak that is imbalanced at baseline with the corresponding odds for transcripts without stimulation-induced imbalance. p-values from chi-square tests are indicated. (E) Number of ENCODE samples (n=733, representing 438 cell types/states) with DNase hypersensitivity at cQTL SNPs (n=379 and 2,061 for AR and H3K27ac, respectively) and eQTL SNPs (n=2,884). (F) Number of genes with a TWAS model or AR/H3K27ac CWAS model (within 100 kb) as a function of prostate-specific expression. Expression in prostate was compared to the mean across all GTEx tissues to obtain z-scores, which were binned by percentile. (G) Percent of genes with TWAS models or CWAS models (within 100 kb) for all genes (left) and for the top percentile of prostate-specific expression (right). (H) Data from (F) grouped by enhancer domain score (EDS) percentile.
(I) Percent of genes with TWAS models or nearby CWAS models for genes in the top EDS percentile. (J) Boxplots of EDS for genes (n=224) within the central 100 kb of the indicated category of GWAS risk regions. (K) Number of genes in the indicated category of GWAS risk regions that encode TFs. The p-value from a chi-square test is indicated. (L) Model demonstrating how latent eQTLs are observable as steady-state cQTLs. p-values indicate Wilcoxon rank-sum tests for (B), (E), and (J). For boxplots, lower and upper hinges indicate 25th and 75th percentiles; whiskers extend up to 1.5 × the interquartile range (IQR) from the hinges.
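Panel D reports odds ratios with chi-square p-values from 2 × 2 contingency tables. A minimal sketch of both quantities follows, assuming a Pearson chi-square test without continuity correction (the legend does not say whether a correction was applied). `odds_ratio_chi2` is a hypothetical helper.

```python
import math


def odds_ratio_chi2(a, b, c, d):
    """2x2 table [[a, b], [c, d]] -> (odds ratio, Pearson chi-square, p-value with 1 df).

    Rows: stimulation-induced imbalance (yes / no)
    Columns: within 100 kb of a peak imbalanced at baseline (yes / no)
    Assumes every row and column total is non-zero.
    """
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value
```

SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its p-value would be somewhat larger than this uncorrected sketch for the same counts.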
Figure 6. CWAS associations linked to selected prostate developmental genes and proto-oncogenes.
(A-F) Panels show the genomic context for CWAS ARBS or H3K27ac peaks near select genes with biological relevance to prostate cancer: NKX3-1 (A), GATA2 (B), HOXB13 (C), CCND1 (D), KLF5 (E), and MYC (F). For each panel, tracks from top to bottom show H3K27ac HiChIP loops in LNCaP (gray), normalized read counts for H3K27ac (orange) or AR (purple) ChIP-seq in LNCaP, gene annotations, and significant CWAS H3K27ac peaks or CWAS ARBS (indicated by black triangles). The bottom track shows prostate cancer GWAS SNP significance in the vicinity of the CWAS peaks in gray, and the residual significance after conditioning upon the CWAS H3K27ac peak or ARBS in red. (G) cis-SNP heritability of the indicated genes and CWAS peaks within the regions shown in A-F. Only CWAS peaks with significant cis-SNP heritability (p < 0.05) are shown.
Figure 7. CWAS identifies ARBS underlying heritability of multiple androgen-regulated phenotypes.
(A) AR CWAS was performed on GWAS for the indicated phenotypes. The absolute value of the effect size Z was calculated for ARBS associations, and the top 100 are displayed for each phenotype. (B) Manhattan plot showing significance of ARBS associations with testosterone levels among individuals in the UK Biobank. (C) Epigenomic context of a significant CWAS ARBS for testosterone near YAP1. Tracks from top to bottom show H3K27ac HiChIP loops in LNCaP (gray), normalized AR ChIP-seq read counts in LNCaP (purple), gene annotations, and the location of the significant CWAS ARBS (black triangle). The bottom track shows testosterone GWAS SNP significance in the vicinity of the CWAS peaks in gray and the residual significance after conditioning upon predicted activity of the ARBS in red. (D) Manhattan plot showing significance of ARBS associations with benign prostatic hyperplasia (BPH) among individuals in the UK Biobank. (E) Epigenomic context of a significant CWAS ARBS for BPH near FGFR2. Tracks are as described for (C). (F) Epigenomic context of a CWAS ARBS within NAALADL2 associated with response to androgen deprivation therapy among men with prostate cancer from a clinical trial. Met-ARBS (purple) signify AR binding sites that are enriched in metastatic castration-resistant prostate cancer compared to prostate-localized tumors. (G) Kaplan-Meier curve showing progression-free survival on androgen deprivation therapy stratified by patient genotype at rs936477, the SNP that determines intensity of the ARBS within NAALADL2 shown in (F).
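Panel G stratifies progression-free survival by genotype. The product-limit (Kaplan-Meier) estimator behind such curves can be sketched as follows. The function name and inputs are illustrative; ties are handled by removing events and censorings at the same time together after the survival update (a common convention); and the between-group test used in the study is not stated in the legend and is not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient
    events: 1 if progression was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = censored_at_t = 0
        # Group all patients whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events_at_t += 1
            else:
                censored_at_t += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / n_at_risk
            curve.append((t, survival))
        # Everyone with follow-up ending at t leaves the risk set afterwards.
        n_at_risk -= events_at_t + censored_at_t
    return curve
```

Running the function separately on the patients in each genotype group at a SNP would give one curve per group, as in a genotype-stratified plot.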

References

    1. Maurano MT, Humbert R, Rynes E, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012;337(6099):1190–1195. doi: 10.1126/science.1222794
    2. Trynka G, Sandor C, Han B, et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet 2013;45(2):124–130. doi: 10.1038/ng.2504
    3. Pickrell JK. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am J Hum Genet 2014;94(4):559–573. doi: 10.1016/j.ajhg.2014.03.004
    4. Welter D, MacArthur J, Morales J, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014;42(Database issue):D1001–D1006. doi: 10.1093/nar/gkt1229
    5. Finucane HK, Bulik-Sullivan B, Gusev A, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 2015;47(11):1228–1235. doi: 10.1038/ng.3404
Additional References
    1. Baca SC, Takeda DY, Seo JH, et al. Reprogramming of the FOXA1 cistrome in treatment-emergent neuroendocrine prostate cancer. Nat Commun 2021;12(1):1979. doi: 10.1038/s41467-021-22139-7
    2. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25
    3. Zhang Y, Liu T, Meyer CA, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 2008;9(9):R137. doi: 10.1186/gb-2008-9-9-r137
    4. Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative Genomics Viewer. Nat Biotechnol 2011;29(1):24–26. doi: 10.1038/nbt.1754
    5. Auton A, Abecasis GR, Altshuler DM, et al. A global reference for human genetic variation. Nature 2015;526(7571):68–74. doi: 10.1038/nature15393
