Comparative Study
Proc Natl Acad Sci U S A. 2016 Dec 13;113(50):14330-14335.
doi: 10.1073/pnas.1616440113. Epub 2016 Nov 22.

Evaluating the evaluation of cancer driver genes


Collin J Tokheim et al. Proc Natl Acad Sci U S A. 2016.

Abstract

Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.

Keywords: DNA sequencing; cancer genomics; cancer mutations; computational method evaluation; driver genes.


Conflict of interest statement

B.V. is a founder of PapGene and Personal Genome Diagnostics and a member of the Scientific Advisory Boards of Morphotek, Sysmex-Inostics, and Exelixis GP. The first four of these companies, as well as other companies, have licensed technologies from Johns Hopkins University, on which B.V. is an inventor. These licenses and relationships are associated with equity or royalty payments to B.V. The terms of these arrangements are being managed by Johns Hopkins University in accordance with its conflict of interest policies. K.W.K. is a founder of PapGene and Personal Genome Diagnostics and a member of the Scientific Advisory Boards of Morphotek and Sysmex-Inostics. These companies, as well as other companies, have licensed technologies from Johns Hopkins University, on which K.W.K. is an inventor. These licenses and relationships are associated with equity or royalty payments to K.W.K. The terms of these arrangements are being managed by Johns Hopkins University in accordance with its conflict of interest policies. N.P. is a founder of PapGene and Personal Genome Diagnostics. These companies, as well as other companies, have licensed technologies from Johns Hopkins University, on which N.P. is an inventor. These licenses and relationships are associated with equity or royalty payments to N.P. The terms of these arrangements are being managed by Johns Hopkins University in accordance with its conflict of interest policies.

Figures

Fig. S1.
Summary of evaluation dataset. The evaluation dataset consisted of mutations spanning 34 cancer types. All included mutations were small somatic variants. Cancer types are ordered from Left to Right by number of samples, ranging from 15 for soft-tissue sarcoma to 1,093 for breast adenocarcinoma (BRAC), with an average of 232 samples per cancer type. These cancer types include a wide range of solid tumors and several liquid (hematologic) cancers, spanning multiple tissues and cell types of origin, different background mutation rates, and different numbers of available samples. For each cancer type, total mutations and number of available samples are shown.
Fig. 1.
Outputs of eight driver prediction methods run through the evaluation protocol. (A) Fraction of predicted driver genes (q ≤ 0.1) that are found in the Cancer Gene Census (CGC) (downloaded April 1, 2016). The raw count of predicted driver genes is indicated on Top of each bar. (B) Divergence from uniform P values, measured as the mean log fold change (MLFC) between a method's observed P values and the theoretical P values expected under a uniform distribution. (C) Number of predicted driver genes. A driver gene is defined as having a Benjamini–Hochberg adjusted P value q ≤ 0.1. (D) Consistency of each method, measured by TopDrop consistency (TDC) at a depth of 100 in the method's ranked list of genes. Error bars indicate ±1 SEM across 10 repeated splits of the data.
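Panels A and C rest on two small computations: Benjamini–Hochberg adjustment of per-gene P values, and the fraction of genes with q ≤ 0.1 that appear in the CGC. The sketch below assumes P values arrive as a gene-to-P-value dictionary and the CGC freeze as a set of gene symbols; the toy P values and the `GENE_*` names are hypothetical.

```python
from typing import Dict, Set, Tuple


def benjamini_hochberg(pvals: Dict[str, float]) -> Dict[str, float]:
    """Return Benjamini-Hochberg adjusted P values (q values) keyed by gene."""
    genes = sorted(pvals, key=pvals.get)  # genes in ascending order of P value
    m = len(genes)
    qvals: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        gene = genes[rank - 1]
        running_min = min(running_min, pvals[gene] * m / rank)
        qvals[gene] = running_min
    return qvals


def cgc_fraction(qvals: Dict[str, float], cgc: Set[str],
                 threshold: float = 0.1) -> Tuple[float, int]:
    """Fraction of predicted drivers (q <= threshold) found in the CGC, and their count."""
    drivers = [g for g, q in qvals.items() if q <= threshold]
    if not drivers:
        return 0.0, 0
    return sum(g in cgc for g in drivers) / len(drivers), len(drivers)


# Toy input: the P values and the non-CGC gene names are hypothetical.
pvals = {"TP53": 1e-12, "KRAS": 1e-8, "GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.5}
qvals = benjamini_hochberg(pvals)
frac, n_drivers = cgc_fraction(qvals, cgc={"TP53", "KRAS"})
print(n_drivers, frac)  # 4 0.5
```

Running the same `cgc_fraction` call with the 99-gene functionally validated subset in place of the full CGC would give the kind of comparison shown in Fig. S2.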
Fig. S2.
Fraction of predicted driver genes for each method in the functionally validated subset of the Cancer Gene Census (CGC). The functionally validated subset consists of 99 genes identified by Kumar et al. (15). A predicted driver gene is defined by a Benjamini–Hochberg adjusted P value q ≤ 0.1. Although the fractions for all methods are lower than when the full CGC is used, comparison with Fig. 1A shows that the relative ordering of methods is very similar. 20/20+, MutSigCV, and TUSON have substantially higher fractions than the other methods, regardless of whether the functionally validated subset or the full CGC is considered. Note that functional studies can provide additional evidence to support the conjecture that a gene is a driver gene, but such evidence is by no means definitive. At present, the only reliable way to identify driver genes in human tumors is through genetic data (6).
Fig. S3.
Fraction of predicted driver genes for each method by consensus among methods. The fractions of predicted drivers that are unique to each method, predicted by two to three methods, and predicted by more than three methods are shown. A predicted driver gene is defined by a Benjamini–Hochberg adjusted P value q ≤ 0.1.
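The consensus binning can be sketched by counting how many methods call each gene and then splitting every method's calls into the three bins. The method names and gene calls below are hypothetical placeholders, not the paper's predictions.

```python
from collections import Counter
from typing import Dict, Set


def consensus_fractions(predictions: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """For each method, split its predicted drivers by how many methods agree."""
    # Number of methods that predict each gene as a driver.
    support = Counter(g for genes in predictions.values() for g in set(genes))
    result = {}
    for method, genes in predictions.items():
        bins = {"unique": 0, "2-3 methods": 0, ">3 methods": 0}
        for g in set(genes):
            k = support[g]
            if k == 1:
                bins["unique"] += 1
            elif k <= 3:
                bins["2-3 methods"] += 1
            else:
                bins[">3 methods"] += 1
        n = len(set(genes))
        result[method] = {b: (c / n if n else 0.0) for b, c in bins.items()}
    return result


# Hypothetical driver calls (q <= 0.1) from four placeholder methods.
predictions = {
    "method_A": {"TP53", "KRAS", "GENE_X"},
    "method_B": {"TP53", "KRAS"},
    "method_C": {"TP53"},
    "method_D": {"TP53"},
}
fractions = consensus_fractions(predictions)
print(fractions["method_B"])  # {'unique': 0.0, '2-3 methods': 0.5, '>3 methods': 0.5}
```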
Fig. S4.
Quantile–quantile plots comparing observed and theoretical P values for the tested methods. (A) Full P value range from 0 to 1. (B) Magnified view of P values from 0 to 0.1. Observed P values for the methods (blue) are compared with those expected from a uniform distribution (red). Genes in the CGC and genes predicted as drivers by at least three methods were removed. TUSON OG and TSG P values are shown separately.
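The filtering step and an MLFC-style calibration score can be sketched as follows. This assumes MLFC is the mean absolute log2 ratio between sorted observed P values and uniform quantiles, and it uses i/(n + 1) as the i-th expected quantile; the quantile convention and the `floor` clamp for zero P values are assumptions, not details taken from the text.

```python
import math
from typing import Dict, Iterable, List, Set


def filter_pvalues(pvals: Dict[str, float], exclude: Set[str]) -> List[float]:
    """Drop putative drivers (e.g. CGC genes, consensus predictions) before calibration checks."""
    return [p for gene, p in pvals.items() if gene not in exclude]


def mlfc(pvals: Iterable[float], floor: float = 1e-300) -> float:
    """Mean absolute log2 fold change between sorted observed and uniform expected P values.

    Assumes the i-th expected quantile is i / (n + 1); zeros are clamped to `floor`.
    """
    observed = sorted(max(p, floor) for p in pvals)
    n = len(observed)
    total = 0.0
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)
        total += abs(math.log2(p / expected))
    return total / n


n = 999
uniform = [i / (n + 1) for i in range(1, n + 1)]
print(mlfc(uniform))                   # 0.0, perfectly calibrated
print(mlfc([p / 2 for p in uniform]))  # 1.0, every P value half of what is expected
```

A score of 0 means the filtered P values sit on the diagonal of the q–q plot; larger values mean stronger inflation or deflation.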
Fig. S5.
TDC of pancancer driver gene predictions as depth threshold is varied. The consistency of each evaluated method is shown as depth threshold varies from 20 to 300. Error bars indicate ±1 SEM across 10 repeated splits of the data.
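The core of a depth-based consistency score is the overlap between the top-ranked genes from two halves of the data. The sketch below is a simplified overlap-at-depth statistic in the spirit of TDC; the exact normalization used by TopDrop consistency may differ, and the rankings shown are hypothetical.

```python
from typing import Sequence


def top_depth_overlap(rank_a: Sequence[str], rank_b: Sequence[str], depth: int) -> float:
    """Fraction of the top-`depth` genes shared by two ranked gene lists."""
    top_a = set(rank_a[:depth])
    top_b = set(rank_b[:depth])
    return len(top_a & top_b) / depth


# Hypothetical rankings from the two halves of one random split.
half1 = ["TP53", "KRAS", "PIK3CA", "GENE_X"]
half2 = ["KRAS", "TP53", "GENE_X", "PIK3CA"]
print(top_depth_overlap(half1, half2, depth=2))  # 1.0
print(top_depth_overlap(half1, half2, depth=3))  # 0.6666666666666666
```

Sweeping `depth` from 20 to 300 over real rankings would trace a curve like the one in this figure.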
Fig. S6.
Flowchart of evaluation protocol. Overview of how a driver gene prediction method of interest can be evaluated. The input to the method is the pancan somatic mutation set provided in this work (karchinlab.org/data/Protocol/pancan-mutation-set-from-Tokheim-2016.txt.gz). The initial output from the method to be evaluated is a list of predicted driver genes with associated P values and q values. A list of significant driver genes is produced by selecting a q value threshold. To compute fraction overlap of genes predicted as significant with Cancer Gene Census (CGC) and with the eight methods evaluated here, a freeze of CGC (karchinlab.org/data/Protocol/CGC-freeze-download-date-20160401.tsv) and predictions from the eight methods (Dataset S4) are provided. These gene lists are also used to subtract out putative driver genes and yield a list of filtered P values. Method consistency is estimated by 10 iterations of splitting the pancan somatic mutation set, outputting gene P values and scores for both halves, and applying the TopDrop metric. Jupyter notebooks for computing MLFC and q–q plots from the filtered P value list, and the average TDC score are available at https://github.com/KarchinLab/Tokheim_PNAS_2016.
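The consistency branch of the protocol (10 random splits, rank genes in each half, compare the top of the lists, report mean ± SEM) can be sketched as below. It assumes splits are made by whole samples and that each mutation is a record with `sample` and `gene` fields; `rank_genes` is a hypothetical stand-in for the driver method under evaluation, not any of the eight methods.

```python
import math
import random
import statistics
from collections import Counter
from typing import Dict, List, Tuple

Mutation = Dict[str, str]  # assumed record shape: {"sample": ..., "gene": ...}


def split_by_sample(mutations: List[Mutation],
                    rng: random.Random) -> Tuple[List[Mutation], List[Mutation]]:
    """Randomly assign whole samples to two halves so no tumor contributes to both."""
    samples = sorted({m["sample"] for m in mutations})
    rng.shuffle(samples)
    first = set(samples[: len(samples) // 2])
    half_a = [m for m in mutations if m["sample"] in first]
    half_b = [m for m in mutations if m["sample"] not in first]
    return half_a, half_b


def rank_genes(mutations: List[Mutation]) -> List[str]:
    """Hypothetical stand-in for a driver method: rank genes by mutation count."""
    counts = Counter(m["gene"] for m in mutations)
    return [g for g, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def split_consistency(mutations: List[Mutation], depth: int,
                      n_iter: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean and SEM of the top-`depth` overlap across `n_iter` random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        half_a, half_b = split_by_sample(mutations, rng)
        top_a = set(rank_genes(half_a)[:depth])
        top_b = set(rank_genes(half_b)[:depth])
        scores.append(len(top_a & top_b) / depth)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.mean(scores), sem


# Toy data: 20 samples, each with 3 mutations in GENE_D and 1 private passenger.
toy: List[Mutation] = []
for i in range(20):
    toy += [{"sample": f"s{i}", "gene": "GENE_D"}] * 3
    toy.append({"sample": f"s{i}", "gene": f"PASS_{i}"})
print(split_consistency(toy, depth=1))  # (1.0, 0.0)
```

Splitting by sample rather than by individual mutation keeps each tumor's mutations together, which is one reasonable reading of "splitting the pancan somatic mutation set".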
Fig. S7.
Evaluation of the eight methods on four different cancer types. Methods were evaluated for (A) mean log fold change (MLFC), (B) number of drivers predicted (q ≤ 0.1), and (C) TDC 10 (TDC at a gene rank depth of 10). 20/20+ and OncodriveFML have the lowest MLFC (least divergence between observed and theoretical P values). MuSiC, 20/20+, and TUSON have the highest TDC 10 (consistency in gene rankings across matched random partitions of each tumor type). The four cancer types, pancreatic carcinoma (PDAC), breast adenocarcinoma (BRAC), head and neck squamous carcinoma (HNSCC), and lung adenocarcinoma (LUAD), have background somatic mutation rates ranging from moderate to high.
Fig. S8.
Background mutation rate is more variable than the ratio of nonsilent to silent mutations across the 34 cancer types. Boxplots are plotted on a log10 scale. The top boxplot shows the mutation rate in coding sequence for the samples in our pancancer dataset. The bottom boxplot shows the ratio of nonsilent to silent mutations in coding sequence for the same samples. A pseudocount of one silent mutation was added for each sample to avoid dividing by zero. Notches indicate the bootstrap 95% confidence interval (1,000 iterations) for the median. Outliers, defined as points more than 1.5 × IQR below the first quartile or above the third quartile, are not shown.
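The per-sample quantities and boxplot conventions in this caption can be sketched as below. The size of the coding target (`coding_mb`) is an input assumption, the quartile method is `statistics.quantiles(..., method="inclusive")`, and the bootstrap uses a simple percentile interval; plotting libraries may use slightly different quartile and notch conventions.

```python
import random
import statistics
from typing import List, Tuple


def sample_rate_and_ratio(nonsilent: int, silent: int, coding_mb: float) -> Tuple[float, float]:
    """Coding mutation rate (per Mb) and nonsilent:silent ratio for one sample.

    One pseudocount silent mutation is added so samples with zero silent mutations
    do not divide by zero. `coding_mb` (size of the coding target) is an input assumption.
    """
    rate = (nonsilent + silent) / coding_mb
    ratio = nonsilent / (silent + 1)
    return rate, ratio


def iqr_outliers(values: List[float]) -> List[float]:
    """Values more than 1.5 * IQR below Q1 or above Q3 (the points hidden in the boxplots)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


def bootstrap_median_ci(values: List[float], n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Simple percentile bootstrap 95% interval for the median (notch-style)."""
    rng = random.Random(seed)
    medians = sorted(statistics.median(rng.choices(values, k=len(values)))
                     for _ in range(n_boot))
    return medians[int(0.025 * n_boot)], medians[int(0.975 * n_boot) - 1]


print(sample_rate_and_ratio(nonsilent=9, silent=0, coding_mb=3.0))  # (3.0, 9.0)
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))                  # [100]
```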
Fig. 2.
Models of mutation rate-based and ratiometric methods suggest fewer false positives and increased power with a ratiometric approach. (A) Expected false positives for a mutation rate-based predictor that identifies genes with increased mutation rate over background. (B) Expected false positives for a ratiometric predictor that identifies genes with increased inactivating mutation fraction over background. For both A and B, we assume there is unexplained variability in either background mutation rate or inactivating mutation fraction that is not accounted for in driver gene prediction. False positives are shown as a function of sample size (up to 8,000 paired tumor–normal samples) for low (0.5 mutations per MB), medium (3.0 mutations per MB), and high (10.0 mutations per MB) background mutation rates and low (blue), medium (green), and high (red) unexplained variability (CVs of 0.05, 0.1, and 0.2, respectively). The dashed line indicates one expected false positive. For the mutation rate-based method, the number of false positives increases to undesirable levels for high mutation rates, particularly when there is high unexplained variability. (C) Sample size required for near-comprehensive detection of intermediate-effect driver genes (90% detection and 2% effect size/increase with respect to background). Results are shown for scenarios with no unexplained variability (black), low (blue), medium (green), and high (red) unexplained variability (CVs of 0.0, 0.05, 0.1, and 0.2, respectively). The number of required samples for the mutation rate-based method becomes very large for moderate-to-high mutation rates and levels of unexplained variability, but it is considerably lower for the ratiometric method. MB, megabase. The jagged behavior of the curves in A and B is due to the discrete nature of our data.
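Why unexplained variability hurts a rate-based test can be seen with a normal approximation. If a gene's count is Poisson with a mean λ that is itself scaled by a factor with mean 1 and coefficient of variation c, its variance is λ + c²λ². The first term grows linearly and the second quadratically with the expected count, so a test that assumes pure Poisson noise becomes increasingly anticonservative as sample size and mutation rate grow. This is a simplified illustration, not the model behind this figure: the 1.5 kb gene length, 18,000 genes, and Bonferroni cutoff are hypothetical choices.

```python
import math
from statistics import NormalDist


def expected_false_positives(n_samples: int, rate_per_mb: float, cv: float,
                             gene_mb: float = 0.0015, n_genes: int = 18000) -> float:
    """Normal-approximation sketch of false positives for a mutation rate-based test.

    Hypothetical setup: every gene has `gene_mb` Mb of coding sequence, the test
    assumes Poisson counts at the background rate, and a Bonferroni cutoff
    alpha = 0.05 / n_genes is used. Unexplained variability multiplies each gene's
    true rate by a factor with mean 1 and coefficient of variation `cv`.
    """
    lam = n_samples * rate_per_mb * gene_mb  # expected background mutations per gene
    alpha = 0.05 / n_genes
    z = NormalDist().inv_cdf(1 - alpha)
    cutoff = lam + z * math.sqrt(lam)  # significance cutoff under the Poisson-only null
    true_sd = math.sqrt(lam + (cv * lam) ** 2)  # Poisson noise plus extra variance
    p_false = 1 - NormalDist(lam, true_sd).cdf(cutoff)
    return n_genes * p_false


for rate in (0.5, 3.0, 10.0):
    print(rate, round(expected_false_positives(8000, rate, cv=0.2), 2))
```

With `cv=0` the function returns about 0.05 (the nominal family-wise rate) regardless of mutation rate, which is a useful sanity check on the approximation.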
Fig. S9.
Decision tree underlying the 20/20 rule. Each gene is input into the tree, and its oncogene (OG) and tumor suppressor gene (TSG) scores are computed. Thresholds on each score and on the numerators of the OG score (recurrence count) and TSG score (inactivating count) are used to determine whether a gene is an OG, a TSG, or a passenger.
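A minimal sketch of such a tree follows, assuming the OG score is the recurrence count divided by total mutations and the TSG score is the inactivating count divided by total mutations. The 20% score threshold comes from the rule's name; the minimum-count defaults and the tie-break when both criteria pass are hypothetical choices, not the thresholds used in the paper.

```python
def classify_2020(recurrent_count: int, inactivating_count: int, total: int,
                  score_threshold: float = 0.2, min_recurrent: int = 5,
                  min_inactivating: int = 5) -> str:
    """Hypothetical 20/20-style call from mutation counts for one gene.

    OG score = recurrent_count / total; TSG score = inactivating_count / total.
    The minimum-count defaults and the tie-break are illustrative assumptions.
    """
    if total == 0:
        return "passenger"
    og_score = recurrent_count / total
    tsg_score = inactivating_count / total
    # A call needs both a high enough score and enough supporting mutations.
    is_og = og_score > score_threshold and recurrent_count >= min_recurrent
    is_tsg = tsg_score > score_threshold and inactivating_count >= min_inactivating
    if is_og and is_tsg:
        return "OG" if og_score >= tsg_score else "TSG"
    if is_og:
        return "OG"
    if is_tsg:
        return "TSG"
    return "passenger"


print(classify_2020(recurrent_count=50, inactivating_count=2, total=100))  # OG
print(classify_2020(recurrent_count=1, inactivating_count=40, total=100))  # TSG
print(classify_2020(recurrent_count=1, inactivating_count=1, total=100))   # passenger
```

The count thresholds keep a gene with, say, 1 inactivating mutation out of 2 from being called a TSG on a 50% score alone.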
Fig. S10.
Random Forest feature importance ranking for the 24 predictive features. The mean decrease in Gini index is plotted for each feature. Error bars indicate SD when feature importance calculation was repeated on 10 different cross-validation partitions. CCLE, Cancer Cell Line Encyclopedia (4); HiC, 3D chromatin interaction capture (4); MGAEntropy, Shannon entropy in column of a vertebrate genome 46-way alignment corresponding to location of the mutation (30); SNV, single-nucleotide variant; VEST, Variant Effect Scoring Tool.
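Mean decrease in Gini importance with an SD across repeated cross-validation partitions can be sketched with scikit-learn, whose `RandomForestClassifier.feature_importances_` attribute reports impurity-based (Gini) importance. Everything here is a placeholder for illustration: the synthetic data from `make_classification`, 3 folds, 50 trees, and averaging importances over the folds of each partition are assumptions, not the 20/20+ training setup or its 24 real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def gini_importance_over_partitions(X, y, n_partitions=10, n_folds=3, n_trees=50):
    """Mean and SD of Gini importance across repeated cross-validation partitions."""
    per_partition = []
    for seed in range(n_partitions):
        # Each seed gives a different random partition of the samples into folds.
        cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
        fold_importances = []
        for train_idx, _test_idx in cv.split(X, y):
            forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
            forest.fit(X[train_idx], y[train_idx])
            fold_importances.append(forest.feature_importances_)  # mean decrease in Gini
        per_partition.append(np.mean(fold_importances, axis=0))
    per_partition = np.array(per_partition)
    return per_partition.mean(axis=0), per_partition.std(axis=0, ddof=1)


# Synthetic stand-in for 24 features; with shuffle=False the 4 informative ones come first.
X, y = make_classification(n_samples=300, n_features=24, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)
mean_imp, sd_imp = gini_importance_over_partitions(X, y)
for idx in np.argsort(mean_imp)[::-1][:5]:
    print(f"feature_{idx}: {mean_imp[idx]:.3f} ± {sd_imp[idx]:.3f}")
```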

References

    1. Watson IR, Takahashi K, Futreal PA, Chin L. Emerging patterns of somatic mutations in cancer. Nat Rev Genet. 2013;14(10):703–718.
    2. Sjöblom T, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314(5797):268–274.
    3. Parmigiani G, et al. Design and analysis issues in genome-wide somatic mutation studies of cancer. Genomics. 2009;93(1):17–21.
    4. Lawrence MS, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214–218.
    5. Dees ND, et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Res. 2012;22(8):1589–1598.
