Science. 2022 Apr 8;376(6589):eabg5601.
doi: 10.1126/science.abg5601. Epub 2022 Apr 8.

Genome-wide analysis of somatic noncoding mutation patterns in cancer

Felix Dietlein et al. Science.

Abstract

We established a genome-wide compendium of somatic mutation events in 3949 whole cancer genomes representing 19 tumor types. Protein-coding events captured well-established drivers. Noncoding events near tissue-specific genes, such as ALB in the liver or KLK3 in the prostate, characterized localized passenger mutation patterns and may reflect tumor-cell-of-origin imprinting. Noncoding events in regulatory promoter and enhancer regions frequently involved cancer-relevant genes such as BCL6, FGFR2, RAD51B, SMC6, TERT, and XBP1 and represent possible drivers. Unlike most noncoding regulatory events, XBP1 mutations primarily accumulated outside the gene's promoter, and we validated their effect on gene expression using CRISPR-interference screening and luciferase reporter assays. Broadly, our study provides a blueprint for capturing mutation events across the entire genome to guide advances in biological discovery, therapies, and diagnostics.

Figures

FIG. 1.
Genome-wide analysis of somatic mutation events in whole cancer genomes. (A) Genome-wide detection of somatic mutation events in whole cancer genome sequencing data. Step 1 combines three complementary test strategies. Step 2 integrates the results of tests 1 to 3 into a joint, genome-wide signal and identifies significant mutation events. Step 3 classifies mutation events according to their genomic location. (B and C) Top: Boxplots comparing mutation rates of a representative cancer type (lung cancer) against epigenomic signals [(B), the rationale of test 1] and mutation rates of other cancer types [(C), the rationale of test 2]. Boxes indicate 25/75% interquartile ranges, vertical lines extend to 10/90% percentiles, and horizontal lines reflect distribution medians. Bottom: Observed (teal dots) and predicted (continuous line) mutation rates (10-kb intervals) plotted against their position on chromosome 1 (function smoothed by Gaussian kernel). (D and E) Q-Q plots comparing observed (y-axis) and expected (x-axis) P values for test 1 (D) and test 2 (E).
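The Q-Q comparison in panels (D) and (E) can be illustrated with a minimal sketch. It assumes that expected P values follow a uniform null distribution and uses the common rank/(n + 1) plotting position; the function name `qq_points` and the example P values are hypothetical and are not taken from the study.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 P values for a Q-Q plot.

    Assumes every P value is strictly greater than 0.
    """
    observed = sorted(p_values)
    n = len(observed)
    # Under a uniform null, the i-th smallest of n P values is expected
    # to lie near i / (n + 1) (an assumed plotting position).
    expected = [(i + 1) / (n + 1) for i in range(n)]
    # x = expected, y = observed, matching the axis layout in the caption.
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]

# Made-up P values, for illustration only.
points = qq_points([0.5, 0.01, 0.2, 0.8, 0.03])
```

Points lying above the diagonal at the high end of the x-axis would indicate P values smaller than expected under the null.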
FIG. 2.
Mutation events identified in a genome-wide analysis of the PCAWG and HMF consortia. (A and B) Top: Pie charts showing the number of mutation events per category (purple: coding, orange: regulatory, teal: tissue-specific, gray: other) in aggregate (A) and individual cancer types (B). Bottom: Genomic positions (y-axis) plotted against their significance in a genome-wide analysis (x-axis) and colored by categories (B). The position (y-axis) of findings recurring in more than one cancer type is plotted against the number of cancer types (x-axis) (A). NEAT1 and MALAT1 are marked by asterisks because their classification was ambiguous. (C) Mutation events sorted by their significance in a genome-wide analysis (x-axis, orange) and plotted against the number of findings involving known cancer genes (y-axis, top). Random overlap between findings and cancer genes serves as a negative control (purple).
FIG. 3.
Categories of mutation events exhibit different mutation patterns. Positional clustering of mutations (y-axis, percentage of maximum) is plotted against genomic positions (x-axis) around mutation events that fall into regulatory regions [(A), orange] or overlap with tissue-specific genes [(B), teal]. Genomic boundaries of the closest gene are marked at the bottom of each plot, and white arrowheads mark the direction of its transcription.
FIG. 4.
Characterization of the expression and mutation patterns of tissue-specific genes. (A and B) Box plots comparing the ratio of the number of indels to single-nucleotide variants (SNVs) (A) and the ratio of the number of long to short indels (B) between tissue-specific genes (orange) and other genes (purple). (C) Mutation rates of SNVs (black), short indels (purple), and long indels (orange) (y-axis, percentage of maximum) plotted against their genomic position around ALB (x-axis). (D and E) Box plots comparing the expression (D) and expression ratio in tumor versus normal tissue (E) of tissue-specific genes (orange) and other genes (purple). (F) Box plots comparing ALB expression (y-axis) between samples from tumor tissue (orange) and normal tissue (purple). (G and H) Box plots comparing heterogeneous expression of tissue-specific genes (orange) and other genes (purple) in single-cell data of hepatocytes (left) and endothelial cells (right) based on an analysis of variance (ANOVA) test (G) and the expression ratio between cell types (H). (I) Box plots comparing ALB expression in cells from different histological zones of the liver (x-axis). Boxes in (A) to (I) indicate the 25/75% interquartile range, vertical lines extend to 10/90% percentiles, and horizontal lines reflect distribution medians. Significant differences (Mann-Whitney U test) are marked with asterisks: *P < 0.05, **P < 0.01, ***P < 0.001.
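The box plot convention and asterisk thresholds stated in this caption can be sketched as follows. The percentile helper uses linear interpolation between ranks, which is an assumption; the caption does not say which quantile method was used. All function names are hypothetical, and the P value itself would come from a separate Mann-Whitney U test that is not implemented here.

```python
def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, by linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def box_summary(values):
    """Box spans the 25th to 75th percentiles, whiskers the 10th to 90th."""
    return {
        "whisker_low": percentile(values, 10),
        "box_low": percentile(values, 25),
        "median": percentile(values, 50),
        "box_high": percentile(values, 75),
        "whisker_high": percentile(values, 90),
    }

def significance_stars(p):
    """Map a P value to the asterisk annotation given in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Made-up values, for illustration only.
summary = box_summary(list(range(11)))
```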
FIG. 5.
Noncoding somatic mutations occur in regulatory regions around XBP1. (A) CRISPRi screening of regions around XBP1 using a library of 2923 sgRNAs in breast cancer cells (CAMA1). Regulatory regions were localized based on sgRNAs, for which KRAB-mediated silencing of their target region led to decreased XBP1 expression in flow cytometry (orange). (B) Fractions of effective sgRNAs (y-axis) plotted against their position around XBP1 (x-axis). Positions of ATAC-seq peaks (teal, bottom), noncoding mutations (purple, bottom), and target regions of the sgRNAs (top) are annotated. (C and D) Efficacies of sgRNAs (sliding window of 10 adjacent sgRNAs) compared between experimental replicates [x-axis versus y-axis (C)] and the ATAC-seq signal of their target regions in breast cancer [y-axis (D)]. (E) Bar graphs displaying the XBP1 expression ratio before and after CRISPRi in regulatory regions (orange) and nonregulatory regions (gray) for individual sgRNAs. Error bars reflect the SD across cells. (F) Mutation densities (purple), ATAC-seq signals (teal), and three-dimensional interactions in the breast cancer genome of MCF7 (ChIA-PET, black) plotted against their genomic position around XBP1 (x-axis). (G) XBP1 expression compared between breast tumors with [purple, mutated (mut)] and without [gray, wild-type (wt)] mutations around XBP1 in PCAWG (left) and CCLE (right). Boxes indicate the 25/75% interquartile range, vertical lines extend to 10/90% percentiles, and horizontal lines reflect distribution medians of XBP1 expression. Significant differences (Mann-Whitney U test) are annotated with asterisks: *P < 0.05, **P < 0.01, ***P < 0.001. (H) Gene Set Enrichment Analysis of expression differences between tumors with high versus low XBP1 expression, with an enrichment score (x-axis) and a significance value (y-axis) computed for each hallmark signature. (I and J) Gene ranks (x-axis) plotted against enrichment scores (y-axis) for early (I) and late (J) estrogen response signatures (black).
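The sliding window of 10 adjacent sgRNAs mentioned for panels (C) and (D) can be illustrated with a short sketch. It assumes the window statistic is an arithmetic mean over sgRNAs ordered by genomic position; the caption does not state which statistic was used, and the function name and input values are hypothetical.

```python
def sliding_window_mean(values, window=10):
    """Mean over each run of `window` adjacent values (assumed statistic)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    # Slide by one position: add the entering value, drop the leaving one.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means

# Made-up per-sgRNA efficacies, ordered by target position.
smoothed = sliding_window_mean(list(range(12)), window=10)
```

A list of n values yields n - 9 windowed means with the default window, so positions near either end of the screened region are covered by fewer windows.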
