Cancer Res. 2023 Feb 15;83(4):500-505.
doi: 10.1158/0008-5472.CAN-22-1508.

Estimation of Neutral Mutation Rates and Quantification of Somatic Variant Selection Using cancereffectsizeR

Jeffrey D Mandell et al. Cancer Res. 2023.

Abstract

Somatic nucleotide mutations can contribute to cancer cell survival, proliferation, and pathogenesis. Although research has focused on identifying which mutations are "drivers" versus "passengers," quantifying the proliferative effects of specific variants within clinically relevant contexts could reveal novel aspects of cancer biology. To enable researchers to estimate these cancer effects, we developed cancereffectsizeR, an R package that organizes somatic variant data, facilitates mutational signature analysis, calculates site-specific mutation rates, and tests models of selection. Built-in models support effect estimation from single nucleotides to genes. Users can also estimate epistatic effects between paired sets of variants, or design and test custom models. The utility of cancer effect was validated by showing in a pan-cancer dataset that somatic variants classified as likely pathogenic or pathogenic in ClinVar exhibit substantially higher effects than most other variants. Indeed, cancer effect was a better predictor of pathogenic status than variant prevalence or functional impact scores. In addition, the application of this approach toward pairwise epistasis in lung adenocarcinoma showed that driver mutations in BRAF, EGFR, or KRAS typically reduce selection for alterations in the other two genes. Companion reference data packages support analyses using the hg19 or hg38 human genome builds, and a reference data builder enables use with any species or custom genome build with available genomic and transcriptomic data. A reference manual, tutorial, and public source code repository are available at https://townsend-lab-yale.github.io/cancereffectsizeR. Comprehensive estimation of cancer effects of somatic mutations can provide insights into oncogenic trajectories, with implications for cancer prognosis and treatment.

Significance: An R package provides streamlined, customizable estimation of underlying nucleotide mutation rates and of the oncogenic and epistatic effects of mutations in cancer cohorts.

Figures

Figure 1. The cancereffectsizeR workflow, spanning assembly of diverse variant datasets to quantification of effect sizes.
Figure 2. Selection inferences from a standard cancereffectsizeR workflow (version 2.6.4) with somatic variant data from exome and panel sequencing of lung adenocarcinoma. A, Highest-effect recurrent somatic variants (and 95% confidence intervals) under the default model of selection at individual genomic sites. B, Ratios of selection coefficients for the observed non-synonymous and splice-site mutations in gene one after mutation of gene two relative to selection coefficients of gene one when other genes analyzed are unmutated (tan bars), and ratios of selection coefficients for the observed non-synonymous and splice-site mutations in gene two after mutation of gene one relative to selection coefficients of gene two when other genes analyzed are unmutated (green bars). For some gene pairs, the epistatic model is not significantly better than a model that assumes no epistatic effects (P > 0.05, likelihood ratio test; transparent bars). Asterisks denote genes within pairs that not only are inferred to be subject to selective pairwise epistasis, but that also exhibit specific statistically significant directional changes in selection after mutation in the other gene (**, P < 0.01; ***, P < 0.001; likelihood ratio test).
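The two quantities described in the Figure 2 caption, a ratio of selection coefficients and a likelihood ratio test between an epistatic and a non-epistatic model, can be sketched in a few lines. This is only an illustration and not cancereffectsizeR's implementation: the function names, the example log-likelihoods and coefficients, and the choice of degrees of freedom are all assumptions made for the example.

```python
import math


def lrt_p_value(loglik_null, loglik_alt, df):
    """P value of a likelihood ratio test between two nested models.

    The test statistic 2 * (loglik_alt - loglik_null) is compared with a
    chi-square distribution. Only df = 1 and df = 2 are handled, because
    their survival functions have simple closed forms.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df of 1 or 2")


def selection_ratio(coef_after_partner_mutation, coef_partner_unmutated):
    """Ratio of a gene's selection coefficient after the partner gene is
    mutated to its coefficient when the partner gene is unmutated."""
    return coef_after_partner_mutation / coef_partner_unmutated


# Hypothetical values, chosen only to exercise the functions.
# df = 2 is an assumption, not a statement about the package's test.
p = lrt_p_value(loglik_null=-1250.0, loglik_alt=-1243.5, df=2)
ratio = selection_ratio(120.0, 900.0)
print(f"LRT p = {p:.4g}; selection ratio = {ratio:.3f}")
```

A ratio below 1 in this sketch would correspond to reduced selection for mutations in one gene once the other gene is mutated, which is the pattern the abstract reports for BRAF, EGFR, and KRAS.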
Figure 3. Boxplots of the cancer effects of variants appearing in two or more patients across eight TCGA cohorts. A set of merged somatic variants that are annotated within ClinVar as likely pathogenic or pathogenic is compared with other variants, and sites mutated recurrently within a cancer type are compared with sites hit only once within a cancer type. Each cancer effect estimate is a cancer-type–specific inference; variants appearing in multiple cohorts are reported by multiple estimates. Two statistically significant pairwise comparisons are marked with brackets (***, P < 0.001), but all pairwise comparisons of the four groups yielded statistically significant differences (Mann–Whitney U test, P < 10^-16 for all).
Figure 4. Contingency table (confusion matrix) and model summary of a multiple logistic regression predicting merged pathogenic or likely pathogenic ClinVar status of variants based on mean cancer effect across eight cancer types, top cancer effect across eight cancer types, SIFT score, PolyPhen-2 score, mean prevalence across eight cancer types, and top prevalence across eight cancer types. Noncoding variants—which lacked SIFT and PolyPhen-2 scores—were excluded from the regression. Cancer effect measures were log-transformed, and all predictive parameters were standardized. Predictor importance was determined from bootstrapped dominance analysis (100 bootstrap runs); each predictor exhibited pairwise general dominance over all less-important predictors.
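The preprocessing named in the Figure 4 caption, log-transforming cancer effects and standardizing each predictor before regression, can be illustrated with a minimal sketch. The helper names and input values below are hypothetical, and the use of the natural logarithm and the sample standard deviation are assumptions; the caption does not specify either.

```python
import math
import statistics


def log_transform(values):
    """Natural-log transform of strictly positive values (assumed base)."""
    return [math.log(v) for v in values]


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical cancer effect estimates for three variants.
cancer_effects = [10.0, 100.0, 1000.0]
predictor = standardize(log_transform(cancer_effects))
print(predictor)
```

Putting all predictors on a common scale in this way makes their regression coefficients comparable, which is one reason to standardize before assessing predictor importance.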
