Nat Commun. 2022 Nov 17;13(1):7023.
doi: 10.1038/s41467-022-34746-z.

Somatic mutation distribution across tumour cohorts provides a signal for positive selection in cancer

Martin Boström et al. Nat Commun. 2022.

Abstract

Cancer gene discovery is reliant on distinguishing driver mutations from a multitude of passenger mutations in tumour genomes. While driver genes may be revealed based on excess mutation recurrence or clustering, there is a need for orthogonal principles. Here, we take advantage of the fact that non-cancer genes, containing only passenger mutations under neutral selection, exhibit a likelihood of mutagenesis in a given tumour determined by the tumour's mutational signature and burden. This relationship can be disrupted by positive selection, leading to a difference in the distribution of mutated cases across a cohort for driver and passenger genes. We apply this principle to detect cancer drivers independently of recurrence in large pan-cancer cohorts, and show that our method (SEISMIC) performs comparably to traditional approaches and can provide resistance to known confounding mutational phenomena. Being based on a different principle, the approach provides a much-needed complement to existing methods for detecting signals of selection.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Detection of positive selection based on skewed mutation distribution across cohorts.
In the SEISMIC method, the probability of mutagenesis for a given gene and tumour in a cohort is estimated based on trinucleotide signatures (per-sample or per-cohort) and tumour-specific burdens (left side). Genes under neutral selection are assumed to exhibit patterns of mutagenesis in agreement with these probabilities across the cohort (blue squares), whereas genes under selective pressure for mutations are expected to show deviating patterns (red squares). Significance is assessed by comparing the observed outcome for a given gene (pattern of mutated tumours across the cohort) to multiple simulated outcomes based on the estimated neutral probabilities, and differences are visualised by plotting the cumulative number of mutated tumours across the cohort ordered by gene mutation probability (CMT plot; right side). By default, only missense and nonsense mutations are considered, as synonymous mutations are assumed to generally be selectively neutral. CDS coding sequence.
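The simulation-based comparison described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-tumour neutral mutation probabilities for one gene are already available as an array, treats each tumour as simply mutated or unmutated, fixes the number of mutated tumours to the observed count so that recurrence alone cannot drive the result, and uses weighted sampling without replacement as an approximation of the neutral model. The function names `log_likelihood`, `skew_p_value` and `cmt_curve` are hypothetical, and an empirical p-value stands in for any distribution fit.

```python
import numpy as np


def log_likelihood(mutated, probs):
    """Log-likelihood of a mutated/unmutated pattern under independent Bernoulli draws."""
    mutated = np.asarray(mutated, dtype=bool)
    return float(np.where(mutated, np.log(probs), np.log1p(-probs)).sum())


def skew_p_value(mutated, probs, n_sim=2000, seed=0):
    """Empirical p-value for an unusually unlikely pattern of mutated tumours.

    Assumption: simulated cohorts keep the observed number of mutated tumours
    and pick them with weights proportional to the neutral probabilities.
    """
    rng = np.random.default_rng(seed)
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1 - 1e-12)
    mutated = np.asarray(mutated, dtype=bool)
    k = int(mutated.sum())
    observed = log_likelihood(mutated, probs)
    weights = probs / probs.sum()
    hits = 0
    for _ in range(n_sim):
        sim = np.zeros(probs.size, dtype=bool)
        sim[rng.choice(probs.size, size=k, replace=False, p=weights)] = True
        # Count simulated cohorts that are at least as unlikely as the observed one.
        if log_likelihood(sim, probs) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


def cmt_curve(mutated, probs):
    """Cumulative number of mutated tumours, ordered by increasing mutation probability."""
    order = np.argsort(probs)
    return np.cumsum(np.asarray(mutated, dtype=int)[order])
```

In this sketch, a gene mutated mainly in tumours with low neutral probability yields a small p-value, while a gene mutated mainly in the highest-probability tumours does not. Plotting `cmt_curve` for the observed pattern against the same curve for simulated patterns gives a rough analogue of the CMT plot.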
Fig. 2. Cancer genes identified by SEISMIC in a skin melanoma cohort.
a Cumulative mutated tumours plot (CMT plot, see Fig. 1) for six significant cancer genes identified by SEISMIC at a false discovery rate (q) < 0.05 (see “Methods”), plus non-cancer gene examples, in melanoma WXS data (CDS regions only; n = 466 tumours). The blue area represents the least extreme 90% of simulation outcomes. Inset: Gamma distribution fit of the likelihoods of the simulated cohorts in blue vs. that of the actual cohort, with q-values indicated. b Heatmap of non-synonymous mutations in tumours ordered by mutation burden (here equivalent to gene mutation probability) and binned, for the genes from (a). In non-cancer genes, the number of mutations in each bin increases with mutation burden, but this correlation is disrupted in the cancer genes. c Same as in (b) but repeated using melanoma WGS data, with CDS and intron-based results shown separately. d Positionally clustered known driver mutations in GNAQ, KIT and SF3B1 occurring preferentially in non-UV-exposed samples. Tumours are sorted by burden along the y-axis, with the proportion of UV-type mutations (C > T in dipyrimidine contexts) indicated. Known driver mutations in Cancer Mutations Census (CMC) are indicated, with Tier 1 having the strongest evidence. CDS coding sequence, diPy dipyrimidine, WGS whole-genome sequencing, WXS whole-exome sequencing. Source data are provided as a Source Data file.
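The proportion of UV-type mutations mentioned in panel d can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: each mutation is given as a trinucleotide reference context plus the alternative base, a dipyrimidine context is taken to mean that the mutated C has a pyrimidine (C or T) on at least one side, and G > A changes are mapped to the opposite strand first. The names `is_uv_type` and `uv_fraction` are hypothetical.

```python
PYRIMIDINES = {"C", "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_uv_type(context, alt):
    """True for C>T at a C flanked by at least one pyrimidine (assumed definition).

    context: three reference bases with the mutated base in the middle.
    alt: the alternative base on the same strand as context.
    """
    context = context.upper()
    alt = alt.upper()
    if context[1] == "G":
        # Express the change on the pyrimidine strand via the reverse complement.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    if context[1] != "C" or alt != "T":
        return False
    return context[0] in PYRIMIDINES or context[2] in PYRIMIDINES


def uv_fraction(mutations):
    """Fraction of (context, alt) pairs classified as UV-type; 0.0 for an empty list."""
    if not mutations:
        return 0.0
    return sum(is_uv_type(ctx, alt) for ctx, alt in mutations) / len(mutations)
```

For example, `is_uv_type("TCG", "T")` is true because the mutated C follows a T, whereas `is_uv_type("ACA", "T")` is false because neither neighbour is a pyrimidine.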
Fig. 3. Resilience to confounding effects from ETS-related local hotspots in melanoma promoters.
a Significance (uncorrected p-value) vs. mutational recurrence for promoters (500 bp upstream regions) in melanoma (n = 221 tumours), for the SEISMIC method (based on cohort mutational skew; top) and ActiveDriverWGS (frequency-based; bottom). The proportion of mutations within 10 bp of a TTCCG sequence is indicated for each promoter, in order to pinpoint confounding recurrent mutations due to increased UV damage susceptibility at ETS transcription factor binding sites. Promoters were binned to avoid overplotting. Both tests correctly identify TERT promoter mutations as drivers, but SEISMIC also lacked enrichment of TTCCG-related hotspot mutations among top hits. b CMT plots (see Fig. 1) for the RPL13A and TERT promoters, with observed and expected recurrence on the right side. Unlike those in the TERT promoter, mutated cases in the RPL13A promoter fall closely within the simulated expected distribution across the cohort, thus avoiding significance despite strong excess burden. Source data are provided as a Source Data file.
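The per-promoter proportion of mutations within 10 bp of a TTCCG sequence can be illustrated as follows. This is a sketch with assumptions that the caption does not specify: the motif is searched on both strands (so CGGAA also counts), positions are 0-based offsets into the promoter sequence, and distance is measured to the nearest base of the motif. The names `motif_spans` and `fraction_near_motif` are hypothetical.

```python
import re

MOTIF = "TTCCG"
MOTIF_RC = "CGGAA"  # reverse complement, assumed to count as a motif occurrence


def motif_spans(seq):
    """Return (start, end) 0-based inclusive spans of the motif on either strand."""
    seq = seq.upper()
    spans = []
    for motif in (MOTIF, MOTIF_RC):
        # A lookahead pattern also reports overlapping occurrences.
        for hit in re.finditer(f"(?={motif})", seq):
            spans.append((hit.start(), hit.start() + len(motif) - 1))
    return spans


def fraction_near_motif(seq, positions, window=10):
    """Fraction of mutation positions lying within `window` bp of any motif span."""
    if not positions:
        return 0.0
    spans = motif_spans(seq)
    near = sum(
        any(start - window <= pos <= end + window for start, end in spans)
        for pos in positions
    )
    return near / len(positions)
```

A promoter whose recurrent mutations cluster around such motifs would show a value close to 1, flagging it as a likely UV-hotspot artefact rather than a driver candidate.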
Fig. 4. Driver genes identified by low-burden skew.
a Genes exhibiting significant cohort mutational skew in 14 cancer types as well as pan-cancer (LIHC and CESC lacked significant results). Results are ordered by recurrence, with coloured dots on the right side indicating level of significance, and on the left side indicating support in other driver studies in the same/different cancer type. Canonical cancer genes (from the Cancer Gene Census, CGC) are marked in black text. Genes with ≥3 mutations were considered in each cancer. b Cumulative plot showing enrichment of canonical cancer genes among the most significant genes using our method, MutPanning, dNdScv, and MutSigCV, on the same UCEC dataset. c Results from (b) shown as a Venn diagram of overlapping significant genes (q < 0.05, false discovery rate). The number of CGC genes is indicated in each set. BLCA bladder carcinoma, BRCA breast carcinoma, CESC cervical carcinoma, CRC colorectal carcinoma, ESCA oesophageal carcinoma, GBM glioblastoma, HNSC head and neck carcinoma, LIHC liver hepatocellular carcinoma, LUAD lung adenocarcinoma, LUSC lung squamous cell carcinoma, OV ovarian adenocarcinoma, SKCM cutaneous melanoma, STAD stomach adenocarcinoma, UCEC endometrial carcinoma. Source data are provided as a Source Data file.

References

    1. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–724.
    2. Vogelstein B, et al. Cancer genome landscapes. Science. 2013;339:1546–1558.
    3. Mularoni L, Sabarinathan R, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016;17:128.
    4. Lawrence MS, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218.
    5. Martincorena I, et al. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171:1029–1041.e1021.
