Cell Syst. 2019 Jul 24;9(1):9-23.e8.
doi: 10.1016/j.cels.2019.05.005. Epub 2019 Jun 12.

CHASMplus Reveals the Scope of Somatic Missense Mutations Driving Human Cancers

Collin Tokheim et al. Cell Syst.

Abstract

Large-scale cancer sequencing studies of patient cohorts have statistically implicated many genes driving cancer growth and progression, and their identification has yielded substantial translational impact. However, a remaining challenge is to increase the resolution of driver prediction from the level of genes to mutations because mutation-level predictions are more closely aligned with the goal of precision cancer medicine. Here, we present CHASMplus, a computational method that is uniquely capable of identifying driver missense mutations, including those specific to a cancer type, as evidenced by significantly superior performance on diverse benchmarks. Applied to 8,657 tumor samples across 32 cancer types in The Cancer Genome Atlas (TCGA), CHASMplus identifies over 4,000 unique driver missense mutations in 240 genes, supporting a prominent role for rare driver mutations. We show which TCGA cancer types are likely to yield discovery of new driver missense mutations by additional sequencing, which has important implications for public policy.

Keywords: TCGA; cancer driver; cancer-type-specific driver; missense mutation; rare drivers.

Conflict of interest statement

Declaration of Interests

The authors declare no competing interests.

Figures

Figure 1. Overview of CHASMplus. See also Figure S1 and Table S1.
a) CHASMplus, a machine learning approach, was applied to somatic missense mutations in tumors from 32 different cancer types found in The Cancer Genome Atlas (TCGA). High scores predicted by CHASMplus indicate stronger evidence for a mutation being a cancer driver. b) CHASMplus has a model for each cancer type in the TCGA to identify driver missense mutations specific to a cancer type. Putative driver missense mutations are identified at a False Discovery Rate (FDR) threshold of 1%. In the example, the mutation was statistically significant for the BRCA model but not for the UVM model. Abbreviations: Breast Invasive Carcinoma (BRCA), Uveal Melanoma (UVM) and cumulative distribution function (CDF).
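To make the 1% FDR threshold in panel b concrete, here is a minimal sketch of one common FDR-controlling procedure, Benjamini-Hochberg. The caption does not say which procedure CHASMplus uses at this step, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is no larger than the q-value of any larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical per-mutation p-values (not taken from the paper).
p_values = [0.0001, 0.003, 0.03, 0.2, 0.9]
q_values = benjamini_hochberg(p_values)

# Call a mutation a putative driver when its q-value is at most 1%.
significant = [q <= 0.01 for q in q_values]
```

Under this sketch, only the first two hypothetical mutations pass the 1% threshold.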
Figure 2. Assessment of pan-cancer and cancer type-specific predictions. See also Figure S2 and Table S2.
a) Receiver Operating Characteristic (ROC) curve for identifying activating mutations in MCF10A cells, a breast epithelium line, with breast cancer-specific models. b) ROC curve for discerning literature annotated oncogenic mutations from OncoKB for specific cancer types: Breast Invasive Ductal Carcinoma (BRCA), Glioblastoma Multiforme (GBM), High-Grade Serous Ovarian Cancer (OV), and Colon Adenocarcinoma (COAD). c) Score distribution for cancer type-specific models (labeled by title) for TCGA missense mutations in EGFR, either found in lung adenocarcinoma (LUAD, typically kinase domain mutation) or glioblastoma multiforme (GBM, typically extracellular domain mutations). d) Overview of 5 pan-cancer benchmarks by scale of evaluation and type of supportive evidence. e) Pan-cancer performance measured by the area under the Receiver Operating Characteristic Curve (auROC) on the 5 mutation-level benchmarks (shown in text). The color scale from red to blue indicates methods ranked from high to low performance. Benchmarks are categorized by in vitro (green), in vivo (yellow), and literature-based benchmarks (turquoise).
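The auROC used in panel e can be read as the probability that a randomly chosen positive (for example, a labeled oncogenic mutation) scores higher than a randomly chosen negative. The sketch below computes it that way. The function name `auroc`, the scores, and the labels are hypothetical and only illustrate the metric, not the paper's evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores; label 1 = driver, 0 = passenger.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
area = auroc(scores, labels)  # 8 of 9 positive-negative pairs are ordered correctly
```

An auROC of 0.5 corresponds to random ranking and 1.0 to perfect separation.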
Figure 3. Frequency spectrum of driver missense mutations. See also Figures S3–S4 and Tables S3–S5.
a) Proportion of the overall frequency of driver missense mutations found to be rare (<1% of samples or singleton mutations), intermediate (1–5%), and common (>5%). These are shown in light to dark blue, respectively. b) Correlation between tumor mutation burden and overall driver prevalence (number of driver mutations per tumor) across the frequency spectrum of drivers (rare, intermediate, and common). Shaded area indicates the 95% confidence interval. c) Analysis of genes containing predicted driver missense mutations that preferentially occurred in a subtype-specific manner (Chi-squared Test, q<0.1). The pie chart illustrates the percentage of genes for all cancer types with a significant subtype-specific pattern, while the heatmap illustrates significant genes for Breast Invasive Carcinoma. d) Structure of the Phosphatase 2A holoenzyme (PDB 2IAE). e) Structures of the ERBB2 extracellular domain (left, PDB 2A91) and kinase domain (right, PDB 3PP0). f) Lollipop plot of driver missense mutations identified by CHASMplus (yellow), and likely truncating variants (frameshift insertion or deletion: purple, nonsense mutation: red, and splice site mutation: orange) in CASP8 for Head & Neck squamous cell carcinoma (HNSC). TCGA acronyms for cancer types are listed in the Methods.
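The frequency bins in panel a can be written as a small classifier. This is a sketch under stated assumptions: the function name `frequency_class` is hypothetical, and the caption does not specify how the exact 1% and 5% boundaries are handled, so here both boundary values are assigned to "intermediate".

```python
def frequency_class(n_mutated, n_samples):
    """Bin a driver mutation by the fraction of samples that carry it."""
    if n_samples <= 0 or n_mutated < 1 or n_mutated > n_samples:
        raise ValueError("expected 1 <= n_mutated <= n_samples")
    if n_mutated == 1:
        return "rare"  # singleton mutations count as rare per the caption
    fraction = n_mutated / n_samples
    if fraction < 0.01:
        return "rare"
    if fraction <= 0.05:  # assumption: boundary values fall in this bin
        return "intermediate"
    return "common"


# Hypothetical counts in a cohort of 1,000 tumor samples.
classes = [frequency_class(n, 1000) for n in (1, 3, 30, 80)]
```

Note that the singleton rule matters in small cohorts: one mutated sample out of 50 is 2% of samples but is still binned as rare.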
Figure 4. Inferred immune cell content and gene expression correlate with presence of predicted driver missense mutations in CASP8. See also Figure S5.
Twelve immune-related biomarkers are shown, as estimated in Head and Neck Squamous Cell Carcinoma (HNSC) tumor samples from TCGA (Thorsson et al., 2018). Each panel compares the distribution of a marker in samples harboring a rare driver missense mutation (orange), a truncating mutation (purple) and control samples with no CASP8 mutations (green). Both tumor samples with rare driver missense mutations and truncating mutations showed a similar significantly elevated inferred immune cell infiltration. Top row, inferred immune infiltrates from DNA methylation or gene expression from TCGA HNSC samples (Thorsson et al., 2018). Bottom row, gene expression values from RNA-Seq for several important immune-related genes reported in (Thorsson et al., 2018). “mis” indicates samples with driver missense mutations identified by CHASMplus, and “lof” is likely loss-of-function variants (nonsense, frameshift insertion/deletions, splice site, translation start site, and nonstop mutations). Mann-Whitney U test, adjusted p-value (Benjamini-Hochberg method): *<0.05, **<0.01, and ***<0.001.
Figure 5. CHASMplus predictions correlate with multiplexed functional assays in PTEN. See also Table S6.
a) Heatmap displaying gene-weighted CHASMplus scores (gwCHASMplus) of PTEN missense mutations. b) Scores are negatively correlated with PTEN lipid phosphatase activity (Spearman rho=−0.520) (Mighell et al., 2018) and c) PTEN protein abundance (Spearman rho=−0.428) (Matreyek et al., 2018). d) Both gwCHASMplus correlations had a larger absolute value than the correlation between the two experiments (Spearman rho=0.351). ***=p<0.001. Distribution of e) PTEN lipid phosphatase activity or f) protein abundance in predicted driver missense mutations from TCGA (common: >5% of tumor samples, intermediate: 1–5%, and rare: <1%), all other missense mutations and truncating mutations. g) Comparison of CHASMplus to the 2nd and 3rd ranked methods in Figure 2e. Left, specificity of methods at identifying PTEN missense mutations that do not lower lipid phosphatase activity. Right, sensitivity (recall), precision, and F1 score for identifying missense mutations that lower lipid phosphatase activity. CHASMplus had the highest specificity, precision and F1 score.
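The Spearman correlations in panels b through d measure how well the ranking of one variable tracks the ranking of another. A minimal sketch follows, computing Spearman's rho as the Pearson correlation of average ranks. The helper names and the five data points are hypothetical and are not the PTEN measurements.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical predictor scores and assay activity for five mutations.
predictor_scores = [0.9, 0.7, 0.5, 0.3, 0.1]
assay_activity = [-2.0, -1.0, -1.5, 0.1, 0.3]
rho = spearman_rho(predictor_scores, assay_activity)
```

A negative rho, as in this toy example, means that higher predicted driver scores go with lower measured activity, which is the direction reported in the caption.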
Figure 6. Characteristics and trajectory of missense mutation driver discovery. See also Figure S7 and Table S7.
a) Plot displaying normalized driver diversity and driver prevalence (fraction of tumor samples mutated) for driver missense mutations in 32 cancer types. K-means clustering identified 5 clusters with centroids shown as numerically designated circles. b) Prevalence of driver missense mutations identified by CHASMplus as a function of sample size. Lines represent LOWESS fit to different rarities of driver missense mutations. TCGA acronyms for cancer types are listed in the Methods.
Figure 7. Hotspot detection alone has limited statistical power to identify driver mutations.
a) Statistical power to detect a significantly elevated number of non-silent mutations in an individual codon, as a function of sample size and mutation rate. Circles represent each cancer type from the TCGA and are placed by sample size and median mutation rate. Curves are colored by the frequency of driver mutations (fraction of non-silent mutated cancer samples above background). If a circle is below a curve, then hotspot detection is not yet sufficiently powerful to detect driver mutations of that frequency. b) Bar graph comparing power (sensitivity) to detect labeled oncogenic driver missense mutations from OncoKB between CHASMplus and the cancer hotspots method (Chang et al., 2016). Stratification by TP53 suggests that the increased power provided by CHASMplus is not solely a result of high performance on oncogenic TP53 mutations.
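The dependence of power on sample size and mutation rate in panel a can be illustrated with a simple exact binomial calculation. This is only a sketch under assumptions, not the paper's power model: the per-codon background rate, the significance level `alpha`, and the function names are hypothetical, and the count of mutated samples at a codon is assumed to be binomial.

```python
from math import exp, lgamma, log, log1p


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def hotspot_power(n_samples, background_rate, driver_freq, alpha):
    """Power of a one-sided binomial test to flag a codon as a hotspot."""
    # Smallest mutation count that is significant under the background rate.
    k_crit = next(k for k in range(n_samples + 2)
                  if binom_tail(k, n_samples, background_rate) <= alpha)
    # Probability of reaching that count when drivers add to the background.
    return binom_tail(k_crit, n_samples, background_rate + driver_freq)


# Hypothetical settings: a driver in 1% of samples above a 1e-4 background.
power_small = hotspot_power(200, 1e-4, 0.01, 1e-6)
power_large = hotspot_power(1000, 1e-4, 0.01, 1e-6)
```

With these made-up numbers, the larger cohort has much higher power than the smaller one, matching the qualitative message of the caption that small cohorts are underpowered for low-frequency drivers.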

