Nat Commun. 2023 May 11;14(1):2712.
doi: 10.1038/s41467-023-38272-4.

Alternative promoters in CpG depleted regions are prevalently associated with epigenetic misregulation of liver cancer transcriptomes


Chirag Nepal et al. Nat Commun. 2023.

Abstract

Transcriptional regulation is commonly governed by alternative promoters. However, the regulatory architecture of alternative and reference promoters, and how they differ, remains elusive. Here, in 100 CAGE-seq libraries from hepatocellular carcinoma patients, we annotate 4083 alternative promoters in 2926 multi-promoter genes, which are largely undetected in normal livers. These genes are enriched in oncogenic processes and predominantly show association with overall survival. Alternative promoters are narrow nucleosome-depleted regions, CpG island depleted, and enriched for tissue-specific transcription factors. Globally, tumors lose DNA methylation. We show hierarchical retention of intragenic DNA methylation, with CG-poor regions rapidly losing methylation while CG-rich regions retain it, a process mediated by differential SETD2, H3K36me3, DNMT3B, and TET1 binding. This mechanism is validated in SETD2 knockdown cells and SETD2-mutated patients. Selective DNA methylation loss in CG-poor regions makes the chromatin accessible for alternative transcription. We show that alternative promoters can control tumor transcriptomes and their regulatory architecture.

Conflict of interest statement

J.B.A. declares consultancy roles for Flagship Pioneering, SEALD, and QED therapeutics. J.B.A. has received funding from Incyte. C.N. declares no competing interests.

Figures

Fig. 1. Annotation of alternative promoters in hepatocellular carcinoma (HCC) patients.
a A schematic workflow to describe the mapping of CAGE-seq reads to define consensus transcript clusters (TCs) across the cohort. b Barplot shows the overlap of HCC TCs with the annotated FANTOM5 CAGE peaks and open chromatin peaks from ENCODE and TCGA. c A schematic workflow to annotate intragenic CAGE TCs as high-confidence alternative promoters. The workflow includes multiple filtering steps to exclude TCs that lack promoter features. d Distance between 5′ ends of novel TSSs and 5′ ends of RNA-seq and EST transcripts. e Classification of expressed genes into single promoter (SP) and multi-promoter (MP) genes based on the number of promoters. The promoter with the highest expression level (represented by arrow height) is assigned as the reference promoter. f Venn diagram shows the intersection of novel alternative promoters with known MP genes. g Enrichment of signature genes in MP genes compared to SP genes. P values were computed using a two-tailed Fisher’s exact test. h Distribution of survival-associated genes with SP and MP genes. The MP genes were significantly associated (P = 1.06E−247; Fisher’s exact test) with survival outcome. i The scatter plot shows the association of overall survival for reference and alternative promoters. P values were computed using the chi-squared test.
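The classification rule in panel e (single-promoter versus multi-promoter genes, with the highest-expressed promoter taken as the reference) can be illustrated with a minimal sketch. The function name, the input layout of `(gene, promoter_id, expression)` tuples, and the example values are hypothetical and are not taken from the paper; ties are resolved by input order, which is also an assumption.

```python
from collections import defaultdict


def classify_promoters(promoter_expression):
    """Group promoters by gene and label each gene SP or MP.

    promoter_expression: iterable of (gene, promoter_id, expression) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    by_gene = defaultdict(list)
    for gene, promoter_id, expression in promoter_expression:
        by_gene[gene].append((promoter_id, expression))

    calls = {}
    for gene, promoters in by_gene.items():
        # The promoter with the highest expression is assigned as the reference;
        # on a tie, max() keeps the first one seen (an assumption of this sketch).
        reference = max(promoters, key=lambda p: p[1])[0]
        alternatives = sorted(pid for pid, _ in promoters if pid != reference)
        calls[gene] = {
            "class": "MP" if len(promoters) > 1 else "SP",
            "reference": reference,
            "alternative": alternatives,
        }
    return calls


# Made-up example values, for illustration only.
example = [
    ("GENE_A", "A_p1", 120.0),
    ("GENE_A", "A_p2", 15.5),
    ("GENE_B", "B_p1", 40.0),
]
calls = classify_promoters(example)
```

Here `GENE_A` would be labeled multi-promoter with `A_p1` as its reference, and `GENE_B` single-promoter.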
Fig. 2. The impact of alternative promoters in gene expression regulation.
a A UCSC browser screenshot of the CTNNBL1 gene with CAGE tags across the normal liver, tumor-adjacent liver tissues, and HCCs. The alternative promoter is zoomed in to show the CTSS usage. b Percentage of the single promoter (SP) reference, multi-promoter (MP) reference, and alternative HCC promoters expressed across eight independent normal livers at different expression thresholds. HCC promoters are undetected in normal liver tissues and show tumor-specific activation of alternative promoters. c Enrichment of cancer hallmark terms associated with alternative promoters expressed in normal livers and those unexpressed in normal livers. P-values were computed using Fisher’s exact test and corrected for multiple testing. d Distribution of reference and alternative promoters expressed across the HCC cohort. The x-axis indicates the percentage at which a promoter is expressed across the HCC cohort. e The average expression level of reference (n = 2926) and alternative (n = 4083) promoters of multi-promoter (MP) genes and single promoter (SP) genes (n = 12,493). Boxplots show the 5th, 25th, 50th, 75th, and 95th percentiles, where the central line is the median. P-values were determined by two-tailed unpaired t-tests. f, g Volcano plots show differentially expressed promoters (f) and genes (g) between tumors and tumor-adjacent tissues. P-values were computed using the Wald test. The cut-off P-value of 0.05 was FDR-corrected. h Expression fold change for reference and alternative promoter pairs. i Barplot shows the fraction of differentially expressed promoters (from panel f) that are classified as either upregulated or downregulated in tumors compared to tumor-adjacent tissues. P value was computed using the chi-squared test. j The distribution of fold change of reference promoters based on the number of one or more alternative promoters. 
k The distribution of fold change of reference promoters based on the upstream or downstream position of alternative promoters relative to its reference promoter. The P-value was computed using the Kolmogorov–Smirnov test.
Fig. 3. Promoter architecture of reference and alternative promoters.
a A UCSC browser screenshot of GNAS gene along with CpG island (CGI), CAGE-seq, and H3K4me3 tracks. The reference and alternative promoters have a shared CGI. b Number of promoters overlapping CGIs across single promoter (SP) reference (n = 9997), multi-promoter (MP) reference (n = 2290), and MP alternative (n = 1987) promoters. CGIs shared by reference (n = 1050) and alternative (n = 1170) promoters are highlighted in green. c Proportion of annotated and novel alternative promoters overlapping CGIs. P-value was determined using Fisher’s exact test. d Distribution of CG density across a known reference (n = 776), known alternative (n = 1481), and novel (n = 1387) nonCGI promoters. P-values were determined by two-tailed unpaired t-tests. Boxplots show the 5th, 25th, 50th, 75th, and 95th percentiles, where the central line is the median. e Distribution of promoter width of SP reference (n = 12,493), MP reference (n = 2926), and alternative (n = 4083) promoters. P-values were determined by two-tailed unpaired t-tests. Boxplots show the 5th, 25th, 50th, 75th, and 95th percentiles, where the center line is the median. f Barplot shows alternative promoters have a higher proportion of sharp promoter shape relative to reference promoters. g Sequence motifs around TSSs of sharp and broad promoters for reference and alternative promoters. h Barplot shows the fraction of HCC promoters that overlapped with TCGA pan-cancer ATAC-seq peaks. Promoters were classified into two groups based on their overlap with CGIs. i Heatmap shows transcription factor motifs enriched across reference and alternative promoters as well as their overlap with CGIs. j The average coverage of RNA polymerase II (Pol2), phosphorylation modification at serine 5 (Pol2-Ser5; initiation of Pol2) and serine 2 (Pol2-Ser2; elongation by Pol2) on the large subunit of Pol2 in HepG2 cells across the gene body.
Fig. 4. The landscape of histone modifications around the reference and alternative promoters.
a Line plots show average histone modifications (H3K4me3, H3K27ac, H3K4me1, H3K27me3) levels of four HCC patients around reference (left panel) and alternative (right panel) promoters. Promoters are classified into three groups based on their overlap with CpG islands (CGIs). CGIs shared by reference and alternative promoters are classified as shared CGI. Heatmaps on the bottom show histone modifications for each promoter across three groups. b Enrichment of histone modifications (H3K4me3, H3K27ac, H3K4me1, H3K27me3) data in HepG2 cells, similar to that of the panel (a). c A UCSC browser screenshot of reference and alternative promoters overlapping promoter CGIs along with an intragenic CGI lacking CAGE tags. Line plots in the middle show the average coverage of histone marks (H3K4me3, H3K27ac, H3K4me1, H3K27me3) along promoter CGIs (left) and intragenic CGIs (right). CGIs of varying lengths are scaled between start and end. Heatmaps on the bottom show histone signals for CGIs. d Enrichment of histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K27me3) on HepG2 enhancers overlapping CGIs (n = 791) and non-overlapping CGIs (n = 20891). CGIs and enhancers of varying lengths are scaled between start and end. Heatmaps on the bottom show histone signals for individual CGIs and enhancers.
Fig. 5. CG density influences DNA methylation landscapes.
a Mean methylation levels (β values, top) around transcription start sites (TSSs) of reference and alternative promoters across the TCGA HCC cohort. Mean methylation levels of the TCGA HCC cohort are derived from 379 tumors and 51 tumor-adjacent tissues. Promoters overlapping with CpG islands (CGI) were separated from non-overlapping promoters. The scatter plots (bottom panel) show differentially hypermethylated (brown) and hypomethylated (black) CpGs in 500 nucleotides window around TSSs. P-values were determined by two-tailed unpaired t-tests between HCC tumors and tumor-adjacent tissues. P values were adjusted for multiple testing. b Mean methylation levels around TSSs of unexpressed genes across the TCGA HCC cohort. The scatter plots (bottom panel) show differentially hypermethylated (brown) and hypomethylated (black) CpGs in 500 nucleotides window around TSSs. P-values were determined by two-tailed unpaired t-tests between HCC tumors and tumor-adjacent tissues. P values were adjusted for multiple testing. c Mean methylation levels across TCGA tumors and tumor-adjacent tissues along gene bodies of multi-promoter (left panel) and single-promoter (right panel) genes. d Mean methylation levels along gene bodies of up/downregulated genes in tumor-adjacent tissues (left panel) and tumor tissues (right panel). e A UCSC browser screenshot showing promoter CGI, intragenic CGIs, and CAGE tags for the SKI gene. Zoomed view (top panels) shows the average methylation level of promoter CGIs, intragenic CGIs, and their flanking regions. CGIs of varying lengths are scaled between start and end. Zoomed view (bottom panels) shows the average demethylation (5hmC) levels. Mean methylation levels of the TCGA HCC cohort are derived from 379 tumors and 51 tumor-adjacent tissues. Mean methylation levels of the GSE112221 cohort are derived from 4 tumors and 4 tumor-adjacent tissues. f Coverage of DNMT3B binding on promoter CGIs and intragenic CGIs across human ES cells. 
g Coverage of TET1 binding on promoter and intragenic CGIs across human ES cells.
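As a rough illustration of how mean methylation (β values) around a TSS might be summarized, the sketch below averages β values of CpGs within a window centered on the TSS. The legend does not state whether the 500-nucleotide window is the total width or the distance on each side, so the `half_window=250` default is an assumption, as are the function name, the `(position, beta)` input layout, and the example numbers.

```python
from statistics import mean


def mean_beta_near_tss(cpgs, tss, half_window=250):
    """Average beta values of CpGs within +/- half_window of the TSS.

    cpgs: iterable of (position, beta) pairs on one chromosome
    (hypothetical layout). Returns None when no CpG falls in the window.
    """
    values = [beta for position, beta in cpgs if abs(position - tss) <= half_window]
    return mean(values) if values else None


# Made-up example values, for illustration only.
tumor = [(1000, 0.25), (1100, 0.5), (2000, 0.875)]
adjacent = [(1000, 0.5), (1100, 0.75), (2000, 0.875)]

tumor_mean = mean_beta_near_tss(tumor, tss=1050)
adjacent_mean = mean_beta_near_tss(adjacent, tss=1050)
# A negative difference indicates lower methylation in the tumor sample.
delta_beta = tumor_mean - adjacent_mean
```

The CpG at position 2000 lies outside the window and does not contribute to either mean.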
Fig. 6. Regulation of SETD2.
a, b Average coverage of H3K36me3 along gene body and flanking regions of the single-promoter (SP) and multi-promoter (MP) genes in SETD2-wt (n = 2, replicates merged) and SETD2-kd (n = 2, replicates merged) in HepG2 cells. c The average coverage of H3K36me3 along intragenic CGIs and flanking regions in SETD2-wt and SETD2-kd in HepG2 cells. d Volcano plot shows hypermethylated and hypomethylated CpGs between SETD2-mutant (n = 15) and SETD2 wild-type (n = 362) tumors from TCGA HCC patients. P-values were determined by two-tailed unpaired t-tests between SETD2-mutant and SETD2 wild-type groups. P values were adjusted for multiple testing. Y axis indicates the negative log2 value of adjusted P values. e Boxplots show average DNA methylation levels (beta values) of CpGs around reference promoters, alternative promoters, and intragenic regions. Mean methylation levels are derived from 15 SETD2-mutant and 362 SETD2 wild-type TCGA HCC patients. Boxplots show the 5th, 25th, 50th, 75th, and 95th percentiles, where the center line is the median. P-values were determined by two-tailed unpaired t-tests. f Volcano plot shows fold-change of intronic reads in SETD2-mutant (n = 15) versus SETD2 wild type (n = 362). P-values were determined by a two-tailed unpaired t-test. g Schematic representation to illustrate tumor-specific transcription of alternative promoters from CG-poor regions. The chromatin structure of intragenic CG-rich and CG-poor regions have different distributions of 5mC, 5hmC, H3K36me3, and DNMT3B, leading to the pervasive initiation of alternative promoters from CG-poor regions.
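Panel d plots the negative log2 of P values adjusted for multiple testing. The legend does not name the adjustment method; the sketch below assumes the Benjamini-Hochberg procedure purely for illustration, and the function names and example P values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    The choice of BH is an assumption of this sketch; the source only says
    that P values were adjusted for multiple testing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log2(p):
    """Y-axis value as described in the legend: negative log2 of adjusted P."""
    return -math.log2(p)


# Made-up example values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
y_values = [neg_log2(p) for p in adj]
```

Larger `y_values` correspond to smaller adjusted P values, so the most significant CpGs sit highest on the plot.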
