Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Jan 2;84(1):133-153.
doi: 10.1158/0008-5472.CAN-23-1129.

Leveraging Tissue-Specific Enhancer-Target Gene Regulatory Networks Identifies Enhancer Somatic Mutations That Functionally Impact Lung Cancer

Affiliations

Leveraging Tissue-Specific Enhancer-Target Gene Regulatory Networks Identifies Enhancer Somatic Mutations That Functionally Impact Lung Cancer

Judith Mary Hariprakash et al. Cancer Res. .

Abstract

Enhancers are noncoding regulatory DNA regions that modulate the transcription of target genes, often over large distances along with the genomic sequence. Enhancer alterations have been associated with various pathological conditions, including cancer. However, the identification and characterization of somatic mutations in noncoding regulatory regions with a functional effect on tumorigenesis and prognosis remain a major challenge. Here, we present a strategy for detecting and characterizing enhancer mutations in a genome-wide analysis of patient cohorts, across three lung cancer subtypes. Lung tissue-specific enhancers were defined by integrating experimental data and public epigenomic profiles, and the genome-wide enhancer-target gene regulatory network of lung cells was constructed by integrating chromatin three-dimensional architecture data. Lung cancers possessed a similar mutation burden at tissue-specific enhancers and exons but with differences in their mutation signatures. Functionally relevant alterations were prioritized on the basis of the pathway-level integration of the effect of a mutation and the frequency of mutations on individual enhancers. The genes enriched for mutated enhancers converged on the regulation of key biological processes and pathways relevant to tumor biology. Recurrent mutations in individual enhancers also affected the expression of target genes, with potential relevance for patient prognosis. Together, these findings show that noncoding regulatory mutations have a potential relevance for cancer pathogenesis and can be exploited for patient classification.

Significance: Mapping enhancer-target gene regulatory interactions and analyzing enhancer mutations at the level of their target genes and pathways reveal convergence of recurrent enhancer mutations on biological processes involved in tumorigenesis and prognosis.

PubMed Disclaimer

Figures

Figure 1. Methodological framework overview. Schematic illustration of our workflow for enhancer mutation characterization. A, Lung-specific enhancer definition from eight different lung cell and tissue types. ChIP-seq for open chromatin (H3K27ac) and chromatin accessibility (DNase-seq or ATAC-seq) from each sample are intersected to obtain cell-specific putative enhancers. The union of the regions from each cell creates the master list after removing regions overlapping with promoters and exons. *, in the cell line, indicates the in-house data. B, Somatic mutation calling. Whole-genome sequencing data of tumor and corresponding normal blood of three lung cancer cohorts viz., LUAD, LUSC, and SCLC obtained from public resources are processed using an ensemble mutation calling approach to identify somatic mutations. C, Enhancer–target gene prediction using canonical correlation of functional genomics data to investigate the synchronized activity of enhancer–promoter pairs across multiple cell types. Implementation of the 3D colocalization information encoded in hierarchical contact score to control for FDR in multiple testing hypotheses. D, Reconstructed ETG network with somatic mutations in enhancers. Lung-specific enhancer regulatory network reconstructed with somatic mutations at enhancers obtained through steps A, B, and C. Dark cyan circles represent enhancers, red lightning marks represent mutations, and colored ovals represent genes. E, Aggregation of enhancer mutations at the pathway level. Pathway level enrichment of TGEM is performed using three approaches, that is, over-representation analysis to determine biological pathways with TGEM enrichment, direct heat diffusion to determine significantly affected sub-networks, and a global test to assess the effect of TGEM on gene expression at the pathway level. F, Functional analysis to characterize recurrently mutated enhancers. Recurrently mutated enhancer cores are determined for the functional characterization of a relevant enhancer mutation. The effect of the enhancer mutation on the target gene expression is assessed by stratifying patients based on the presence of the enhancer mutation. Similarly, survival probability is estimated in patients stratified on the basis of the presence of the enhancer mutation. TFBS alteration in the enhancer core upon somatic mutation is assessed to determine the mechanistic effect of enhancer mutation. (Created with BioRender.com.)
Figure 1.
Methodological framework overview. Schematic illustration of our workflow for enhancer mutation characterization. A, Lung-specific enhancer definition from eight different lung cell and tissue types. ChIP-seq for open chromatin (H3K27ac) and chromatin accessibility (DNase-seq or ATAC-seq) from each sample are intersected to obtain cell-specific putative enhancers. The union of the regions from each cell creates the master list after removing regions overlapping with promoters and exons. *, in the cell line, indicates the in-house data. B, Somatic mutation calling. Whole-genome sequencing data of tumor and corresponding normal blood of three lung cancer cohorts viz., LUAD, LUSC, and SCLC obtained from public resources are processed using an ensemble mutation calling approach to identify somatic mutations. C, Enhancer–target gene prediction using canonical correlation of functional genomics data to investigate the synchronized activity of enhancer–promoter pairs across multiple cell types. Implementation of the 3D colocalization information encoded in hierarchical contact score to control for FDR in multiple testing hypotheses. D, Reconstructed ETG network with somatic mutations in enhancers. Lung-specific enhancer regulatory network reconstructed with somatic mutations at enhancers obtained through steps A, B, and C. Dark cyan circles represent enhancers, red lightning marks represent mutations, and colored ovals represent genes. E, Aggregation of enhancer mutations at the pathway level. Pathway level enrichment of TGEM is performed using three approaches, that is, over-representation analysis to determine biological pathways with TGEM enrichment, direct heat diffusion to determine significantly affected sub-networks, and a global test to assess the effect of TGEM on gene expression at the pathway level. F, Functional analysis to characterize recurrently mutated enhancers. Recurrently mutated enhancer cores are determined for the functional characterization of a relevant enhancer mutation. The effect of the enhancer mutation on the target gene expression is assessed by stratifying patients based on the presence of the enhancer mutation. Similarly, survival probability is estimated in patients stratified on the basis of the presence of the enhancer mutation. TFBS alteration in the enhancer core upon somatic mutation is assessed to determine the mechanistic effect of enhancer mutation. (Created with BioRender.com.)
Figure 2. Mutational landscape of lung cancer cohort. A, Circos plot of the global landscape of mutations in patients with lung cancer. Chromosomes are shown on the outermost circle. The following circle is a bar graph of gene density obtained by binning the genome in 1 Mbp windows (dark blue). The next circles from the periphery to the center are the bar graphs representing the number of enhancers (dark cyan), promoters (salmon pink), and exons (powder blue) mutated (log-scale). The scale of each bar graph is represented at the start of chromosome1. Mutations in the noncanonical chromosome (chromosome Y) were removed from the analysis. B, Sample-wise mutation distribution. The bottom shows the line plot representing the mean somatic mutation per Mb in the lung cancer sample. The middle shows the relative proportions in the percentage of the six possible base-pair substitutions, as indicated in the legend on the right. The top shows the stacked bar plot depicting the number of mutated genomic elements. Each bar represents the total number of enhancer mutations (dark cyan), promoter mutations (salmon pink), and exon mutations (powder blue) for a patient. Samples are sorted on the basis of the total number of mutations in exons (x-axis). C, Mutation burden comparisons. Scatter plots showing the mutation burden per Mb in enhancers (x-axis) and exons (y-axis). D, Scatter plots showing the mutation burden per Mb in enhancers (x-axis) and the rest of the noncoding region (RNCR; y-axis). Each blue dot in scatter plots represents a sample, the gray line represents the bisectors, the blue line represents the line of regression, and the slope of the regression is mentioned in the plot.
Figure 2.
Mutational landscape of lung cancer cohort. A, Circos plot of the global landscape of mutations in patients with lung cancer. Chromosomes are shown on the outermost circle. The following circle is a bar graph of gene density obtained by binning the genome in 1 Mbp windows (dark blue). The next circles from the periphery to the center are the bar graphs representing the number of enhancers (dark cyan), promoters (salmon pink), and exons (powder blue) mutated (log-scale). The scale of each bar graph is represented at the start of chromosome1. Mutations in the noncanonical chromosome (chromosome Y) were removed from the analysis. B, Sample-wise mutation distribution. The bottom shows the line plot representing the mean somatic mutation per Mb in the lung cancer sample. The middle shows the relative proportions in the percentage of the six possible base-pair substitutions, as indicated in the legend on the right. The top shows the stacked bar plot depicting the number of mutated genomic elements. Each bar represents the total number of enhancer mutations (dark cyan), promoter mutations (salmon pink), and exon mutations (powder blue) for a patient. Samples are sorted on the basis of the total number of mutations in exons (x-axis). C, Mutation burden comparisons. Scatter plots showing the mutation burden per Mb in enhancers (x-axis) and exons (y-axis). D, Scatter plots showing the mutation burden per Mb in enhancers (x-axis) and the rest of the noncoding region (RNCR; y-axis). Each blue dot in scatter plots represents a sample, the gray line represents the bisectors, the blue line represents the line of regression, and the slope of the regression is mentioned in the plot.
Figure 3. Mutation signatures comparison. A, Mutation signatures in lung cancer cohort. Heat map of the relative contribution of each COSMIC single-base substitutions (SBS) signature for each sample. The samples are grouped on the basis of the lung cancer subtype indicated by the color band (orange, SCLC; purple, LUSC; and green, LUAD). B, Mutation signature difference between coding and the noncoding genome. Comparison of underlying signature distribution between coding and noncoding regions in LUAD, LUSC, and SCLC for a subset of COSMIC SBS signatures. For a given signature, the color of the marker corresponds to the difference between the mean contribution in coding and the noncoding region. The size represents the P value (Wilcoxon rank-sum test). Only the subset of signatures with significant contribution differences in at least one of the cohorts (P < 0.05) are reported. C, Mutation signature associated with the enhancers, promoters, and exons. Box and whiskers plots of the relative contribution of mutation signature in enhancers (dark cyan), promoters (salmon pink), and exons (powder blue). The statistical significance of comparisons (P value < 0.05) is presented as star marks. Each point in the boxplot represents a sample.
Figure 3.
Mutation signatures comparison. A, Mutation signatures in lung cancer cohort. Heat map of the relative contribution of each COSMIC single-base substitutions (SBS) signature for each sample. The samples are grouped on the basis of the lung cancer subtype indicated by the color band (orange, SCLC; purple, LUSC; and green, LUAD). B, Mutation signature difference between coding and the noncoding genome. Comparison of underlying signature distribution between coding and noncoding regions in LUAD, LUSC, and SCLC for a subset of COSMIC SBS signatures. For a given signature, the color of the marker corresponds to the difference between the mean contribution in coding and the noncoding region. The size represents the P value (Wilcoxon rank-sum test). Only the subset of signatures with significant contribution differences in at least one of the cohorts (P < 0.05) are reported. C, Mutation signature associated with the enhancers, promoters, and exons. Box and whiskers plots of the relative contribution of mutation signature in enhancers (dark cyan), promoters (salmon pink), and exons (powder blue). The statistical significance of comparisons (P value < 0.05) is presented as star marks. Each point in the boxplot represents a sample.
Figure 4. Enhancer–target gene pairing. A, Distance between enhancer and predicted target gene. The x-axis denotes the distance (in Kb) between the enhancer and the predicted target gene, and the y-axis denotes the number of ETG pairs in the distance range. Each bar represents the total number of ETG pairs (in thousands) within the distance range per chromosome, as indicated in the legend on the right. B, Hierarchical contact score and the distance between ETG pairs. Bubble plot representing the number of ETG pairs (color) with HC score (y-axis) and the distance between the enhancer and the predicted target gene in Kb (x-axis). C, Number of enhancers versus number of mutated samples. Heat map showing the number of enhancers predicted for a gene (x-axis) compared with the number of enhancers mutated (y-axis). The color of the square indicates the number of genes with x number of enhancers and y number of mutated samples. D, Differential gene expression between genes with enhancer mutations. The volcano plot displays the log2-fold change expression (x-axis) between samples grouped by the specific enhancer mutation status of each gene (with vs. without enhancer mutations). Transcripts with significant difference and log2-fold change >1 are highlighted in pink, or log2-fold change < −1 are highlighted in violet. The y-axis is the −log10(P value) of the coefficient for enhancer mutations in the linear regression model. The horizontal red line marks the P value of <0.05 significance threshold. The size of the circles for significantly altered genes indicates the number of associated enhancers mutated. E, Transcription factor–binding sites at enhancer cores. Scatter plot showing the TF motif score computed by FIMO at enhancer core with the reference sequence (x-axis) and altered sequence (y-axis).
Figure 4.
Enhancer–target gene pairing. A, Distance between enhancer and predicted target gene. The x-axis denotes the distance (in Kb) between the enhancer and the predicted target gene, and the y-axis denotes the number of ETG pairs in the distance range. Each bar represents the total number of ETG pairs (in thousands) within the distance range per chromosome, as indicated in the legend on the right. B, Hierarchical contact score and the distance between ETG pairs. Bubble plot representing the number of ETG pairs (color) with HC score (y-axis) and the distance between the enhancer and the predicted target gene in Kb (x-axis). C, Number of enhancers versus number of mutated samples. Heat map showing the number of enhancers predicted for a gene (x-axis) compared with the number of enhancers mutated (y-axis). The color of the square indicates the number of genes with x number of enhancers and y number of mutated samples. D, Differential gene expression between genes with enhancer mutations. The volcano plot displays the log2-fold change expression (x-axis) between samples grouped by the specific enhancer mutation status of each gene (with vs. without enhancer mutations). Transcripts with significant difference and log2-fold change >1 are highlighted in pink, or log2-fold change < −1 are highlighted in violet. The y-axis is the −log10(P value) of the coefficient for enhancer mutations in the linear regression model. The horizontal red line marks the P value of <0.05 significance threshold. The size of the circles for significantly altered genes indicates the number of associated enhancers mutated. E, Transcription factor–binding sites at enhancer cores. Scatter plot showing the TF motif score computed by FIMO at enhancer core with the reference sequence (x-axis) and altered sequence (y-axis).
Figure 5. Pathway level enrichment of enhancer mutations. A, Scatter plot shows the over-representation of genes with enhancer mutations in the KEGG pathway. The x-axis represents the ratio of the overlapping genes to the total number of genes in the pathway. The size of the circle denotes the number of genes in overlap, and the color shows the negative logarithmic adjusted P value. B, Mutational landscape of the PI3K–AKT pathway. Co-mutation plot showing druggable PI3K–AKT signaling pathway genes (y-axis) affected in lung cancer samples (x-axis) by mutations in enhancers, promoters, and exons as indicated in the legend on the right. The top stacked bar plot shows the number of mutations in each sample, and the gene-wise mutation rate is displayed on the right. C, Network view of protein interactions among two sub-networks of regulation of transcription by RNA Pol II identified by HotNet2. Interactions between proteins in the sub-network from each interaction network are colored on the basis of a P value. D, Significantly altered gene sets (MSigDB-C2 curated). The violin plot shows the gene sets that are affected by TGEM. E, Hierarchical clustering graph of the WAKASUGI_HAVE_ZNF143_BINDING_SITES gene set from MSigDB C2 gene set. The gene plot shows the P value associated with the impact of the enhancer mutation on the gene expression. The part of the clustering graph with a significant P value is plotted in black.
Figure 5.
Pathway level enrichment of enhancer mutations. A, Scatter plot shows the over-representation of genes with enhancer mutations in the KEGG pathway. The x-axis represents the ratio of the overlapping genes to the total number of genes in the pathway. The size of the circle denotes the number of genes in overlap, and the color shows the negative logarithmic adjusted P value. B, Mutational landscape of the PI3K–AKT pathway. Co-mutation plot showing druggable PI3K–AKT signaling pathway genes (y-axis) affected in lung cancer samples (x-axis) by mutations in enhancers, promoters, and exons as indicated in the legend on the right. The top stacked bar plot shows the number of mutations in each sample, and the gene-wise mutation rate is displayed on the right. C, Network view of protein interactions among two sub-networks of regulation of transcription by RNA Pol II identified by HotNet2. Interactions between proteins in the sub-network from each interaction network are colored on the basis of a P value. D, Significantly altered gene sets (MSigDB-C2 curated). The violin plot shows the gene sets that are affected by TGEM. E, Hierarchical clustering graph of the WAKASUGI_HAVE_ZNF143_BINDING_SITES gene set from MSigDB C2 gene set. The gene plot shows the P value associated with the impact of the enhancer mutation on the gene expression. The part of the clustering graph with a significant P value is plotted in black.
Figure 6. CDH13 insertion variation. A, CDH13 enhancer loci at chromosome 16. The top shows the hierarchical TAD boundaries in the lung tissue within the locus 82.6 Mb to 83.3 Mb in chromosome 16. The next is the zoomed-in version between chr16: 82.6 Mb to 82.8 Mb, showing the location of the CDH13 promoter (salmon) and enhancers associated with CDH13 (dark cyan). The next shows the enhancer of interest (chr16:82.6718Mb–82.6728Mb) with enhancer mutations (red lollipop)—the height of the lollipop reflects the number of mutations found across the patient cohort. The last shows the enhancer cores in blue, with the most mutated enhancer core (CIEN-core) in dark blue. B, CIEN-Ins and patient clinical information. Co-mutation plot shows CDH13 expression [log2(TPM+1)], CIEN-Ins, presence of CIEN-Ins in tumor tissue, presence of CIEN-Ins in matched normal, source of normal (blood or tissue), copy-number variation (CNV; as neutral N, loss L, and gain G), exon mutation (number of mutations), promoter methylation (beta values), TNM staging of cancer, sex of the patient, and the lung cancer subtype are represented by indicated colors in the legend on the right. C, CDH13 expression upon CIEN-Ins deletion. CDH13 gene expression relative to β-actin in wild-type and homozygous deletion of CIEN-Ins in NCI-H460 cell line. The dots represent biological replicates (n = 5). D, Progression-free survival interval probability in lung cancer samples (LUAD+LUSC). E, Disease-free survival interval (DFI) probability in lung cancer samples. F, DFI probability in LUSC samples. For the survival analysis, patients were stratified on the basis of the presence of CIEN-Ins in the tumor, and Kaplan–Meier curves were plotted for the two groups. The risk table with the number of patients is described at the bottom. Differences between the two groups were evaluated using a log‐rank test.
Figure 6.
CDH13 insertion variation. A, CDH13 enhancer loci at chromosome 16. The top shows the hierarchical TAD boundaries in the lung tissue within the locus 82.6 Mb to 83.3 Mb in chromosome 16. The next is the zoomed-in version between chr16: 82.6 Mb to 82.8 Mb, showing the location of the CDH13 promoter (salmon) and enhancers associated with CDH13 (dark cyan). The next shows the enhancer of interest (chr16:82.6718Mb–82.6728Mb) with enhancer mutations (red lollipop)—the height of the lollipop reflects the number of mutations found across the patient cohort. The last shows the enhancer cores in blue, with the most mutated enhancer core (CIEN-core) in dark blue. B, CIEN-Ins and patient clinical information. Co-mutation plot shows CDH13 expression [log2(TPM+1)], CIEN-Ins, presence of CIEN-Ins in tumor tissue, presence of CIEN-Ins in matched normal, source of normal (blood or tissue), copy-number variation (CNV; as neutral N, loss L, and gain G), exon mutation (number of mutations), promoter methylation (beta values), TNM staging of cancer, sex of the patient, and the lung cancer subtype are represented by indicated colors in the legend on the right. C,CDH13 expression upon CIEN-Ins deletion. CDH13 gene expression relative to β-actin in wild-type and homozygous deletion of CIEN-Ins in NCI-H460 cell line. The dots represent biological replicates (n = 5). D, Progression-free survival interval probability in lung cancer samples (LUAD+LUSC). E, Disease-free survival interval (DFI) probability in lung cancer samples. F, DFI probability in LUSC samples. For the survival analysis, patients were stratified on the basis of the presence of CIEN-Ins in the tumor, and Kaplan–Meier curves were plotted for the two groups. The risk table with the number of patients is described at the bottom. Differences between the two groups were evaluated using a log‐rank test.

References

    1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49. - PubMed
    1. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21. - PMC - PubMed
    1. Frankell AM, Dietzen M, Al Bakir M, Lim EL, Karasaki T, Ward S, et al. The evolution of lung cancer and impact of subclonal selection in TRACERx. Nature 2023;616:525–33. - PMC - PubMed
    1. Collisson EA, Campbell JD, Brooks AN, Berger AH, Lee W, Chmielecki J, et al. Comprehensive molecular profiling of lung adenocarcinoma. Nature 2014;511:543–50. - PMC - PubMed
    1. Hammerman PS, Lawrence MS, Voet D, Jing R, Cibulskis K, Sivachenko A, et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012;489:519–25. - PMC - PubMed

Publication types