Comparative Study

Genome Biol. 2020 Feb 20;21(1):43.

doi: 10.1186/s13059-020-01954-z.

Comprehensive assessment of computational algorithms in predicting cancer driver mutations

Hu Chen¹,², Jun Li², Yumeng Wang², Patrick Kwok-Shing Ng³, Yiu Huen Tsang⁴, Kenna R Shaw³, Gordon B Mills⁴, Han Liang⁵,⁶

Affiliations

¹ Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, 77030, USA.
² Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
³ Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
⁴ Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University, Portland, OR, 97239, USA.
⁵ Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA. hliang1@mdanderson.org.
⁶ Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA. hliang1@mdanderson.org.

PMID: 32079540
PMCID: PMC7033911
DOI: 10.1186/s13059-020-01954-z

Hu Chen et al. Genome Biol. 2020.

Abstract

Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed.

Results: We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose.

Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.

Keywords: 3D clustering; Cell viability assay; Driver mutations; Passenger mutations; TP53 mutations; The Cancer Genome Atlas; Tumor transformation.

Conflict of interest statement

G.B.M. is on the Scientific Advisory Board for AstraZeneca, ImmunoMet, Nuevolution, and Precision Medicine. H.L. is a shareholder and on the Scientific Advisory Board for Precision Scientific Ltd. The other authors declare that they have no competing interests.

Figures

**Fig. 1**
Feature summary and inter-correlations between algorithms. a Based on features included, each algorithm was labeled as using ensemble score, sequence context, protein feature, conservation, or epigenomic information. The algorithms trained on cancer driver data or proposed to identify cancer drivers are labeled as cancer-specific. b Left: hierarchical clustering pattern of 33 algorithms based on ~710,000 TCGA somatic mutations; right: a triangle heatmap displaying the Spearman rank correlation coefficient between any two algorithms
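The pairwise comparison in panel b rests on the Spearman rank correlation, which is the Pearson correlation of the two algorithms' rank-transformed scores. A minimal sketch follows, assuming each algorithm's scores are available as a plain list aligned by mutation; the function names and the toy inputs are illustrative and not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two algorithms' scores."""
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


# Toy scores for two hypothetical algorithms on the same five mutations.
print(spearman([0.1, 0.4, 0.35, 0.8, 0.9], [1.2, 2.0, 2.5, 3.1, 4.0]))
```

Working on ranks makes the comparison insensitive to the different score scales the algorithms use, which is presumably why a rank-based coefficient is reported.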

**Fig. 2**
Assessment using a benchmark dataset based on mutation 3D clustering pattern. a Overview of the assessment process. We used four computational algorithms to detect whether mutations are located within the protein 3D structural hotspots, each algorithm with one vote. The number of votes was defined as the consensus cluster score. A mutation with a score of ≥ 2 and in a cancer gene (i.e., cancer gene consensus) was considered as a positive case, and a mutation with a score of 0 and in a non-cancer gene was considered as a negative case. b ROC curves and corresponding AUC scores for the top 10 algorithms. c Boxplots showing the differences of AUC between two groups of algorithms with or without certain features. p value is based on the Wilcoxon rank sum test. d Sensitivity and specificity of each algorithm calculated by using the median score value as the threshold to make binary predictions. Error bars, mean ± 2SD
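The labeling rule in panel a and the median-threshold evaluation in panel d can be sketched as below. This is an illustration under stated assumptions: higher scores are taken to mean "more likely driver", the threshold comparison is strict (`>`), and the median is computed over the benchmark mutations passed in. None of these details is specified in the legend, and the function names are hypothetical.

```python
from statistics import median


def label_mutation(votes, in_cancer_gene):
    """Return 1 (positive), 0 (negative), or None (excluded from the benchmark).

    votes: consensus cluster score, i.e. how many of the four 3D-clustering
    algorithms place the mutation in a structural hotspot (0 to 4).
    """
    if votes >= 2 and in_cancer_gene:
        return 1
    if votes == 0 and not in_cancer_gene:
        return 0
    return None  # e.g. one vote, or a hotspot mutation outside cancer genes


def sensitivity_specificity(scores, labels):
    """Binarize scores at their median and compare with 0/1 labels.

    Assumes both classes are present and higher score = predicted driver.
    """
    cutoff = median(scores)
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > cutoff  # strictness is an assumption
        if label == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: four labeled mutations scored by one hypothetical algorithm.
print(sensitivity_specificity([0.9, 0.3, 0.6, 0.1], [1, 1, 0, 0]))
```

Using the median as a common cutoff puts algorithms with very different score distributions on the same footing, at the cost of forcing each to call roughly half of the mutations positive.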

**Fig. 3**
Assessment using a benchmark dataset based on OncoKB annotation. a Overview of the assessment process. The OncoKB database classifies mutations into four categories: oncogenic, likely oncogenic, likely neutral, and inconclusive. We considered “likely neutral” as negative cases, and we considered “oncogenic” mutations only or both “oncogenic” and “likely oncogenic” mutations as positive cases. b Bar plots showing the AUC scores of the 33 algorithms in the two comparisons. The red color is for oncogenic plus likely oncogenic vs. likely neutral, and green is for oncogenic vs. likely neutral. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
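The two comparisons in panel b differ only in which OncoKB categories count as positive. The sketch below computes an AUC for each, using the equivalence between the AUC and the probability that a randomly chosen positive outscores a randomly chosen negative. The record layout, function names, and toy scores are assumptions for illustration only.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def oncokb_auc(records, include_likely_oncogenic):
    """records: iterable of (oncokb_category, algorithm_score) pairs.

    "likely neutral" mutations are negatives; "inconclusive" ones are ignored.
    """
    positive_categories = {"oncogenic"}
    if include_likely_oncogenic:
        positive_categories.add("likely oncogenic")
    pos = [s for cat, s in records if cat in positive_categories]
    neg = [s for cat, s in records if cat == "likely neutral"]
    return auc(pos, neg)


# Toy records for one hypothetical algorithm (scores are made up).
records = [
    ("oncogenic", 0.9),
    ("likely oncogenic", 0.4),
    ("likely neutral", 0.5),
    ("likely neutral", 0.2),
    ("inconclusive", 0.7),
]
print(oncokb_auc(records, include_likely_oncogenic=False))
print(oncokb_auc(records, include_likely_oncogenic=True))
```

The pairwise loop is quadratic in the number of mutations; for large benchmarks a rank-based formula gives the same value more cheaply.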

**Fig. 4**
Assessment using a benchmark dataset based on the transactivation effects of TP53 mutations. a Overview of the assessment process. Promoter-specific transcriptional activity was measured for 8 targets of p53 protein. Mutations with the median transcription activity ≤ 50% were used as positive cases, and others were used as negative cases. b ROC plot and AUC scores for the top 10 algorithms. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
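The positive/negative split in panel a reduces to a median over the promoter-specific measurements. A minimal sketch, assuming activity is given as a percentage (presumably relative to wild-type p53, which the legend does not state) and that one value per target is available; the function name and toy values are hypothetical.

```python
from statistics import median


def label_tp53_mutation(activities_percent, cutoff=50.0):
    """Return 1 (positive) if the median activity is <= cutoff, else 0.

    activities_percent: transcriptional activity on each measured p53 target.
    """
    return 1 if median(activities_percent) <= cutoff else 0


# Toy measurements across 8 targets for two hypothetical mutations.
print(label_tp53_mutation([5, 10, 8, 20, 15, 30, 12, 9]))
print(label_tp53_mutation([95, 110, 80, 100, 90, 105, 70, 85]))
```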

**Fig. 5**
Assessment using a benchmark dataset based on in vivo tumor formation. a Overview of the assessment process. Cell lines stably expressing mutant alleles were injected into mice. Mutations that could form any tumors greater than 500 mm³ by 130 days were considered as functional mutations and used as positives, and other mutations were used as negatives. b ROC plot and AUC scores for the top 10 algorithms. c Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
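The functional call in panel a is a simple threshold rule over tumor measurements. The sketch below assumes measurements are recorded as (day, volume in mm³) pairs pooled across all mice injected with the same mutant allele; that data layout and the function name are hypothetical.

```python
def forms_tumor(measurements, min_volume_mm3=500.0, max_day=130):
    """True if any tumor exceeds min_volume_mm3 on or before max_day.

    measurements: iterable of (day, volume_mm3) pairs for one mutation.
    """
    return any(
        day <= max_day and volume > min_volume_mm3
        for day, volume in measurements
    )


# Toy measurements (made up): the second mutation never crosses the cutoff.
print(forms_tumor([(30, 100.0), (90, 650.0)]))
print(forms_tumor([(30, 80.0), (130, 420.0)]))
```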

**Fig. 6**
Assessment using a benchmark dataset based on in vitro cell viability. a Overview of the assessment process. For each mutation, we performed cell viability assays in two “informer” cell lines, Ba/F3 and MCF10A. Consensus calls were inferred by integrating the functional effects observed in Ba/F3 and MCF10A. We considered activating, inactivating, inhibitory, and non-inhibitory mutations as positive cases, while neutral mutations were considered negative. b The ROC curves of the 33 algorithms based on a combined set of published mutations (Ng et al. [42]) and newly generated mutations in this study. c Bar plots showing the AUC scores of the 33 algorithms in the three datasets: new functional data (red), published functional data (green), and the combined set (blue). d Boxplots showing the differences of AUC between two groups of algorithms with or without certain features. p values are based on the Wilcoxon rank sum test. e Sensitivity and specificity of 33 algorithms. Error bars, mean ± 2SD
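The boxplot comparison of AUCs between algorithms with and without a given feature relies on the Wilcoxon rank sum test. The sketch below uses the normal approximation with a tie correction and no continuity correction. It is an illustration rather than the authors' procedure, which may have used an exact test or a statistics package, and the AUC values are made up.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic of x, p value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: shrink the variance for each group of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy AUCs for algorithms with vs. without some feature (values are made up).
with_feature = [0.90, 0.85, 0.80, 0.95]
without_feature = [0.50, 0.60, 0.55, 0.70]
print(rank_sum_test(with_feature, without_feature))
```

With group sizes as small as those in a 33-algorithm comparison, an exact p value would be preferable to the normal approximation shown here.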

**Fig. 7**
Overall evaluation. a, b The overlapping summary of positive (a) and negative cases (b) in the five benchmark datasets. c Correlations of the performance ranks of the 33 algorithms based on the five benchmark datasets. d A heatmap showing the rank of the 33 algorithms based on each benchmark dataset. Ranks are labeled for the top five algorithms only. Red, higher ranks, and white, lower ranks. The features of the 33 algorithms are shown on the top, indicated by color (gray, no; and black, yes)
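The per-benchmark ranks shown in panel d can be derived from AUC scores as sketched below. Averaging the ranks across benchmarks is an assumption added here to give a single summary number; the legend does not say how, or whether, the authors aggregated ranks. Algorithm names and AUC values are placeholders, not results from the study.

```python
def rank_by_auc(aucs):
    """Map algorithm name -> rank, where rank 1 is the highest AUC.

    Algorithms with equal AUC share the better (smaller) rank.
    """
    ordered = sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    for position, (name, value) in enumerate(ordered, start=1):
        if position > 1 and value == ordered[position - 2][1]:
            ranks[name] = ranks[ordered[position - 2][0]]  # tie with previous
        else:
            ranks[name] = position
    return ranks


def mean_rank(per_benchmark_aucs):
    """Average each algorithm's rank over all benchmarks (an assumption)."""
    all_ranks = [rank_by_auc(aucs) for aucs in per_benchmark_aucs.values()]
    names = all_ranks[0].keys()
    return {name: sum(r[name] for r in all_ranks) / len(all_ranks) for name in names}


# Placeholder algorithms and AUCs for two hypothetical benchmarks.
benchmarks = {
    "benchmark_1": {"algo_A": 0.9, "algo_B": 0.8, "algo_C": 0.7},
    "benchmark_2": {"algo_A": 0.6, "algo_B": 0.9, "algo_C": 0.6},
}
print(mean_rank(benchmarks))
```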

References

1. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423.
2. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
3. Hudson TJ, Anderson W, Aretz A, Barker AD, Bell C, Bernabé RR, et al. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
4. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015;349:1483–1489. doi: 10.1126/science.aab4082.
5. Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol. 2017;2017:1–16.
