Comparative Study
Genome Biol. 2020 Feb 20;21(1):43.
doi: 10.1186/s13059-020-01954-z.

Comprehensive assessment of computational algorithms in predicting cancer driver mutations

Hu Chen et al. Genome Biol.

Abstract

Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed.

Results: We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays that we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose.

Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.

Keywords: 3D clustering; Cell viability assay; Driver mutations; Passenger mutations; TP53 mutations; The Cancer Genome Atlas; Tumor transformation.

Conflict of interest statement

G.B.M. is on the Scientific Advisory Board for AstraZeneca, ImmunoMet, Nuevolution, and Precision Medicine. H.L. is a shareholder and on the Scientific Advisory Board for Precision Scientific Ltd. The other authors declare that they have no competing interests.

Figures

Fig. 1
Feature summary and inter-correlations between algorithms. a Based on features included, each algorithm was labeled as using ensemble score, sequence context, protein feature, conservation, or epigenomic information. The algorithms trained on cancer driver data or proposed to identify cancer drivers are labeled as cancer-specific. b Left: hierarchical clustering pattern of 33 algorithms based on ~710,000 TCGA somatic mutations; right: a triangle heatmap displays the Spearman rank correlation coefficient between any two algorithms
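The Spearman rank correlation in panel b can be illustrated with a minimal sketch. It assumes each algorithm yields one numeric score per mutation; the score lists below are hypothetical and are not taken from the study, and the helper names are illustrative only.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores from two algorithms on the same five mutations.
algo_a = [0.10, 0.40, 0.35, 0.80, 0.95]
algo_b = [1.2, 3.1, 2.0, 4.5, 9.9]
rho = spearman(algo_a, algo_b)
```

Because only ranks enter the calculation, algorithms with very different score scales can be compared directly.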
Fig. 2
Assessment using a benchmark dataset based on mutation 3D clustering pattern. a Overview of the assessment process. We used four computational algorithms to detect whether mutations are located within the protein 3D structural hotspots, each algorithm with one vote. The number of votes was defined as the consensus cluster score. A mutation with a score of ≥ 2 and in a cancer gene (i.e., cancer gene consensus) was considered as a positive case, and a mutation with a score of 0 and in a non-cancer gene was considered as a negative case. b ROC curves and corresponding AUC scores for the top 10 algorithms. c Boxplots showing the differences of AUC between two groups of algorithms with or without certain features. p value is based on the Wilcoxon rank sum test. d Sensitivity and specificity of each algorithm calculated by using the median score value as the threshold to make binary predictions. Error bars, mean ± 2SD
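The labeling rule in panel a and the median-threshold evaluation in panel d can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes higher scores mean "more likely driver" and that a score strictly above the median counts as a positive prediction; the caption does not state either detail. The function names and example values are hypothetical.

```python
from statistics import median

def consensus_label(votes, in_cancer_gene):
    """votes: one boolean per 3D-clustering algorithm (four in the caption).

    Returns 1 (positive), 0 (negative), or None (excluded from the benchmark).
    """
    score = sum(1 for v in votes if v)  # consensus cluster score
    if score >= 2 and in_cancer_gene:
        return 1
    if score == 0 and not in_cancer_gene:
        return 0
    return None

def sensitivity_specificity(scores, labels):
    """Binarize scores at their median, then compare against 0/1 labels."""
    threshold = median(scores)
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s > threshold  # assumption: strictly above the median
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical algorithm scores and benchmark labels for five mutations.
scores = [0.1, 0.2, 0.6, 0.9, 0.8]
labels = [0, 0, 1, 1, 0]
sens, spec = sensitivity_specificity(scores, labels)
```

Mutations that meet neither rule (for example, a score of 1, or a high score in a non-cancer gene) return None and are left out of the benchmark.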
Fig. 3
Assessment using a benchmark dataset based on OncoKB annotation. a Overview of the assessment process. The OncoKB database classifies mutations into four categories: oncogenic, likely oncogenic, likely neutral, and inconclusive. We considered “likely neutral” as negative cases, and we considered “oncogenic” mutations only or both “oncogenic” and “likely oncogenic” mutations as positive cases. b Bar plots showing the AUC scores of the 33 algorithms in the two comparisons. The red color is for oncogenic plus likely oncogenic vs. likely neutral, and green is for oncogenic vs. likely neutral. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
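The two comparisons in panel b can be sketched as below, with AUC computed as the fraction of positive-negative pairs in which the positive mutation scores higher (ties count one half). The records and the `build_benchmark` and `auc` helpers are hypothetical; the sketch also assumes that higher scores indicate a more likely driver.

```python
def build_benchmark(records, include_likely):
    """Split (category, score) records into positive and negative score lists.

    "likely neutral" is negative; "inconclusive" is dropped in both comparisons.
    """
    positive_categories = {"oncogenic"}
    if include_likely:
        positive_categories.add("likely oncogenic")
    pos = [s for cat, s in records if cat in positive_categories]
    neg = [s for cat, s in records if cat == "likely neutral"]
    return pos, neg

def auc(pos, neg):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical OncoKB categories paired with one algorithm's scores.
records = [
    ("oncogenic", 0.9),
    ("likely oncogenic", 0.4),
    ("likely neutral", 0.5),
    ("likely neutral", 0.2),
    ("inconclusive", 0.7),
]
auc_strict = auc(*build_benchmark(records, include_likely=False))
auc_broad = auc(*build_benchmark(records, include_likely=True))
```

In this toy data the broader positive set lowers the AUC, because the "likely oncogenic" mutation scores below one of the neutral ones.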
Fig. 4
Assessment using a benchmark dataset based on the transactivation effects of TP53 mutations. a Overview of the assessment process. Promoter-specific transcriptional activity was measured for 8 targets of p53 protein. Mutations with the median transcription activity ≤ 50% were used as positive cases, and others were used as negative cases. b ROC plot and AUC scores for the top 10 algorithms. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
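The positive/negative rule in panel a can be written as a short sketch. It assumes each mutation has eight activity values, one per p53 target promoter, expressed as a percentage of a reference level (the caption gives only "≤ 50%"; treating the reference as wild-type activity is an assumption). The function name and example values are hypothetical.

```python
from statistics import median

def tp53_is_positive(activities_percent):
    """Return True when the median activity across the 8 targets is <= 50%."""
    if len(activities_percent) != 8:
        raise ValueError("expected one activity value per each of 8 targets")
    return median(activities_percent) <= 50.0

# Hypothetical per-promoter activities for three TP53 mutations.
loss_of_function = [10, 20, 30, 40, 50, 60, 70, 80]  # median 45
borderline = [50, 50, 50, 50, 60, 60, 60, 60]        # median 55
wild_type_like = [100] * 8                           # median 100

calls = [
    tp53_is_positive(loss_of_function),
    tp53_is_positive(borderline),
    tp53_is_positive(wild_type_like),
]
```

Using the median rather than the mean keeps a single outlying promoter from flipping the call.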
Fig. 5
Assessment using a benchmark dataset based on in vivo tumor formation. a Overview of the assessment process. Cell lines stably expressing mutant alleles were injected into mice. Mutations that could form any tumors greater than 500 mm³ by 130 days were considered as functional mutations and used as positives, and other mutations were used as negatives. b ROC plot and AUC scores for the top 10 algorithms. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
Fig. 6
Assessment using a benchmark dataset based on in vitro cell viability. a Overview of the assessment process. For each mutation, we performed cell viability assays in two “informer” cell lines, Ba/F3 and MCF10A. Consensus calls were inferred by integrating the functional effects observed in Ba/F3 and MCF10A. We considered activating, inactivating, inhibitory, and non-inhibitory mutations as positive cases, while neutral mutations were considered negative. b The ROC curves of the 33 algorithms based on a combined set of published mutations (Ng et al. [42]) and newly generated mutations in this study. c Bar plots showing the AUC scores of the 33 algorithms in the three datasets: new functional data (red), published functional data (green), and the combined set (blue). d Boxplots showing the differences of AUC between two groups of algorithms with or without certain features. p values are based on the Wilcoxon rank sum test. e Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
Fig. 7
Overall evaluation. a, b The overlapping summary of positive (a) and negative cases (b) in the five benchmark datasets. c Correlations of the performance ranks of the 33 algorithms based on the five benchmark datasets. d A heatmap showing the rank of the 33 algorithms based on each benchmark dataset. Ranks are labeled for the top five algorithms only. Red, higher ranks, and white, lower ranks. The features of the 33 algorithms are shown on the top, indicated by color (gray, no; and black, yes)

References

    1. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423.
    2. Cancer Genome Atlas Research Network. Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
    3. Hudson TJ, Anderson W, Aretz A, Barker AD, Bell C, Bernabé RR, et al. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
    4. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015;349:1483–1489. doi: 10.1126/science.aab4082.
    5. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol. 2017;2017:1–16.

Publication types