Mol Cancer Res. 2022 Jan;20(1):102-113.
doi: 10.1158/1541-7786.MCR-21-0471. Epub 2021 Sep 23.

Mutations in Noncoding Cis-Regulatory Elements Reveal Cancer Driver Cistromes in Luminal Breast Cancer

Samah El Ghamrasni et al. Mol Cancer Res. 2022 Jan.

Abstract

Whole-genome sequencing of primary breast tumors has enabled the identification of cancer driver genes and noncoding cancer driver plexuses from somatic mutations. However, distinguishing driver from passenger events among noncoding genetic variants remains a challenge. Here, we reveal cancer driver cis-regulatory elements linked to transcription factors previously shown to be involved in the development of luminal breast cancers. To do so, we defined a tumor-enriched catalog of approximately 100,000 unique cis-regulatory elements from 26 primary luminal estrogen receptor-positive (ER+), progesterone receptor-positive (PR+) breast tumors. Integrating this catalog with somatic mutations from 350 publicly available breast tumor whole genomes, we uncovered cancer driver cistromes (a cistrome being the sum of binding sites for a transcription factor) for ten transcription factors in luminal breast cancer, including FOXA1 and ER. Nine of these factors are essential for growth in breast cancer, four of them exclusively in the luminal subtype. Collectively, we present a strategy for finding cancer driver cistromes that quantifies the enrichment of noncoding mutations over cis-regulatory elements concatenated into a functional unit.

IMPLICATIONS: Mapping the accessible chromatin of luminal breast cancer revealed an accumulation of mutations within the cistromes of transcription factors essential to luminal breast cancer. This demonstrates the co-opting of regulatory networks to drive cancer and provides a framework for deriving insight into the noncoding space of cancer.
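The strategy summarized above pools all binding sites of one transcription factor into a single unit before measuring mutation enrichment. The following is a minimal sketch of that bookkeeping, assuming 0-based half-open coordinates and a single hypothetical per-base background mutation rate (the abstract does not state how the background is estimated). Overlapping sites are merged so that no base is counted twice, mutations inside the merged unit are counted, and the count is compared with the expectation. The names `merge_intervals` and `cistrome_enrichment` are illustrative and are not the authors' code.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return [(s, e) for s, e in merged]


def cistrome_enrichment(sites, mutations, background_rate):
    """Treat all binding sites of one TF as a single unit and compare the
    observed mutation count with the count expected at a per-bp rate.

    sites: iterable of (chrom, start, end), 0-based half-open.
    mutations: iterable of (chrom, pos), 0-based.
    background_rate: hypothetical expected mutations per bp.
    Returns (total_bp, observed, expected, fold_enrichment).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}
    total_bp = sum(e - s for iv in merged.values() for s, e in iv)

    observed = 0
    for chrom, pos in mutations:
        if chrom not in merged:
            continue
        # Rightmost merged interval that starts at or before pos.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < merged[chrom][idx][1]:
            observed += 1

    expected = total_bp * background_rate
    fold = observed / expected if expected > 0 else float("inf")
    return total_bp, observed, expected, fold


sites = [("chr1", 100, 200), ("chr1", 150, 250), ("chr2", 0, 50)]
mutations = [("chr1", 120), ("chr1", 249), ("chr1", 250), ("chr2", 10), ("chr3", 5)]
print(cistrome_enrichment(sites, mutations, background_rate=0.001))
```

Merging first matters because binding sites from different ChIP-seq peaks often overlap; without it, a single mutation could be counted more than once and inflate the apparent enrichment.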

Figures

Figure 1. Identifying chromatin accessibility in ER+PR+ breast cancer. A, Primary tumors were minced and dissociated, flow-sorted into immune and epithelial cell populations, and profiled by ATAC-seq. B, Heatmap of similarities between ER+PR+ accessible chromatin profiles, computed as the cosine similarity between the chromatin accessibility profiles of every pair of samples; the bar plot shows the number of called peaks per sample. C, Bar plot showing the number of accessible chromatin regions from the TCGA_Lum datasets that overlap PM_Lum (blue) and those unique to each cohort (red). D, Chromatin accessibility saturation curve. A nonlinear regression model fitted to the number of unique ATAC-seq peaks discovered in each sample was used to estimate the percentage of accessible chromatin mapped in PM_Lum (purple; n = 26 samples), TCGA_Lum (blue; n = 41 samples), and luminal cell lines (MCF7 and T47D, orange; n = 2). E, Percentage distribution across the genome of the accessible chromatin regions mapped by ATAC-seq, determined with the cis-regulatory element annotation system (CEAS). **, P < 0.001, two-sided t test. Boxes span Q1 to Q3 with the median marked; whiskers extend to ±1.5× the IQR. F, Bar plot of P values for the cosine similarities between PM_Lum and TCGA_Lum compared with immune cell accessible chromatin. The red dotted line marks P = 0.01 (two-sided t test). G, Lollipop graph of motif families enriched in ER+PR+ breast tumors (P < 0.01, Fisher exact test), based on the catalog of 26 ATAC-seq profiles. Motif enrichment within ATAC-seq regions was computed against DNase I hypersensitive sites from several cell lines, and motif families were obtained from the JASPAR database. Circle size represents the number of target peaks for each motif.
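Panel B compares samples by the cosine similarity of their accessibility profiles. A minimal sketch follows, assuming each sample's peaks have already been mapped onto a shared catalog of region IDs and encoded as a binary vector (1 = peak called in that region). The encoding, the toy region IDs, and the helper names are assumptions for illustration; the legend does not specify how profiles were vectorized.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def pairwise_cosine(samples, catalog):
    """samples: {sample_name: set of catalog region IDs with a called peak}.
    catalog: ordered list of region IDs (e.g. the union of all peaks).
    Returns {(name_a, name_b): similarity} for every ordered pair."""
    vectors = {
        name: [1 if region in peaks else 0 for region in catalog]
        for name, peaks in samples.items()
    }
    names = sorted(vectors)
    return {(a, b): cosine_similarity(vectors[a], vectors[b])
            for a in names for b in names}


samples = {"T1": {"r1", "r2"}, "T2": {"r2", "r3"}, "T3": {"r1", "r2"}}
sims = pairwise_cosine(samples, ["r1", "r2", "r3"])
print(sims[("T1", "T2")], sims[("T1", "T3")])
```

The resulting dictionary can be reshaped into the square matrix drawn as the heatmap. Using peak signal instead of 0/1 values would weight strongly accessible regions more heavily, which is a design choice the legend leaves open.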
Figure 2. Mutation enrichment at cis-regulatory elements in ER+PR+ breast cancer. A, Box plot showing the percentage of regions in the PM_Lum catalog that overlap WGS mutation calls from multiple cancer types. Boxes span Q1 to Q3 with the median marked; whiskers extend to ±1.5× the IQR. B and C, Manhattan plots of regulatory regions significantly enriched in mutations, identified with our in-house algorithm using the PM_Lum catalog as accessible chromatin targets and ICGC_EU (B) or ICGC_US (C) WGS data as mutation calls. Dotted lines indicate q < 0.01 (exact binomial test).
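The legend names an exact binomial test with q-value thresholds but does not describe the in-house algorithm. As a generic sketch only, one common setup treats each base of a region in each tumor as a trial with a hypothetical background mutation probability, computes the one-sided binomial P(X >= observed), and converts P values to Benjamini-Hochberg q-values. The background rate, the per-tumor trial count, and all function names here are assumptions.

```python
import math


def binom_logpmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_sf(k, n, p):
    """Exact one-sided binomial P(X >= k)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if k <= n * p:
        # At or below the mean, 1 - lower tail is numerically safe.
        lower = sum(math.exp(binom_logpmf(i, n, p)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Above the mean the terms shrink monotonically; sum until negligible.
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_logpmf(i, n, p))
        total += term
        if term <= total * 1e-16:
            break
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def enriched_regions(regions, rate, n_tumors=1, q_cutoff=0.01):
    """regions: list of (name, length_bp, mutation_count) across the cohort.
    rate: hypothetical background mutations per bp per tumor."""
    pvals = [binom_sf(count, length * n_tumors, rate)
             for _, length, count in regions]
    qvals = benjamini_hochberg(pvals)
    return [name for (name, _, _), q in zip(regions, qvals) if q < q_cutoff]


print(enriched_regions([("A", 500, 5), ("B", 500, 0), ("C", 500, 1)], rate=0.001))
```

Working in log space with `math.lgamma` avoids overflowing `math.comb`-sized integers when regions are long or cohorts are large; SciPy's `binomtest` would be a drop-in alternative where SciPy is available.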
Figure 3. Mutation analysis at recognition sites of motifs enriched in ER+PR+ breast cancer. A, Lollipop graph of motif families enriched in PM_Lum catalog regions overlapping SNVs from ICGC-EU (red) and ICGC-US (blue), relative to the total PM_Lum catalog (P < 0.01; gray, P > 0.01; Fisher exact test). B and C, Graph (top) and heatmaps (bottom) showing mutation enrichment at the DNA recognition sites found to be significantly enriched in the PM_Lum catalog, using ICGC-EU (B) and ICGC-US (C) mutation calls. Cohen's d was calculated by resampling, and its value indicates significant enrichment. The red dotted line marks the median Cohen's d.
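Panel A relies on a Fisher exact test for motif enrichment. A minimal sketch of the one-sided test follows, assuming a 2x2 table that splits catalog regions by whether they overlap an SNV and whether they contain the motif. The legend compares mutated regions "against the total PM_Lum catalog"; using mutated versus non-mutated regions, as here, is one common way to build that table and is an assumption. The function name is hypothetical.

```python
import math


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table

                      motif   no motif
        mutated         a        b
        not mutated     c        d

    Returns P(X >= a) under the hypergeometric null with fixed margins."""
    n_mut = a + b          # regions overlapping at least one SNV
    n_motif = a + c        # regions containing the motif
    total = a + b + c + d  # all catalog regions
    denom = math.comb(total, n_mut)
    numer = sum(math.comb(n_motif, x) * math.comb(total - n_motif, n_mut - x)
                for x in range(a, min(n_mut, n_motif) + 1))
    return numer / denom  # exact integer ratio, converted to float once


print(fisher_exact_greater(4, 0, 0, 4))
```

Python integers are exact, so the tail probability carries no rounding error until the final division. `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided test if SciPy is available.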
Figure 4. High enrichment of mutations at cistromes of key transcription factors involved in ER+PR+ breast cancer. Heatmaps showing mutation enrichment at ChIP-seq peak centers and flanking regions (0–1000 bp) using the ICGC-EU WGS dataset (A), and at transcription factor-binding sets using the ICGC-EU (B) and ICGC-US (C) WGS datasets. Cohen's d was calculated by resampling, and its value indicates significant enrichment [enrichment > median (Cohen's d)]. Transcription factors showing consensus mutation enrichment are marked in bold.
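The legends for Figures 3 and 4 report Cohen's d "calculated by resampling" without giving the resampling scheme. One plausible reading, sketched here under stated assumptions: bootstrap the transcription factor's binding sites to get a distribution of mutation densities, draw equally sized random sets from a background catalog to get a null distribution, and take Cohen's d between the two. The toy data, the background catalog, the iteration count, and every function name are hypothetical.

```python
import random
import statistics


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if pooled_var == 0:
        raise ValueError("both distributions are constant")
    return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5


def mutations_per_kb(regions):
    """regions: list of (length_bp, mutation_count)."""
    total_bp = sum(length for length, _ in regions)
    return 1000 * sum(n for _, n in regions) / total_bp


def resampled_cohens_d(tf_sites, background, n_iter=200, seed=0):
    """Compare bootstrap resamples of a TF's binding sites with random,
    equally sized draws from a background catalog (hypothetical null)."""
    rng = random.Random(seed)
    k = len(tf_sites)
    tf_dist = [mutations_per_kb(rng.choices(tf_sites, k=k)) for _ in range(n_iter)]
    bg_dist = [mutations_per_kb(rng.sample(background, k)) for _ in range(n_iter)]
    return cohens_d(tf_dist, bg_dist)


tf_sites = [(100, i % 4) for i in range(20)]  # toy: 0-3 SNVs per 100-bp site
background = [(100, 1 if i % 10 == 0 else 0) for i in range(200)]
print(resampled_cohens_d(tf_sites, background))
```

Following the criterion in the Figure 4 legend, a transcription factor would then be flagged when its d exceeds the median d across all factors tested. A fixed seed keeps the resampling reproducible between runs.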
Figure 5. Cancer driver cistromes belong to transcription factors essential to luminal breast tumors. Heatmap showing the probability that each transcription factor is essential in breast cancer cell lines of different subtypes (luminal, triple-negative breast cancer (TNBC), and HER2). Column annotations indicate mutation enrichment at binding sites ±50 bp; row annotations indicate cell line subtype.
