CancerVar: An artificial intelligence-empowered platform for clinical interpretation of somatic mutations in cancer

Quan Li¹,², Zilin Ren², Kajia Cao³, Marilyn M Li³,⁴, Kai Wang²,⁴, Yunyun Zhou²

Affiliations

¹ Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON M5G2C1, Canada.
² Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
³ Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
⁴ Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

PMID: 35544644
PMCID: PMC9075800
DOI: 10.1126/sciadv.abj1624

Sci Adv. 2022 May 6;8(18):eabj1624. doi: 10.1126/sciadv.abj1624. Epub 2022 May 6.

Abstract

Several knowledgebases are manually curated to support clinical interpretations of thousands of hotspot somatic mutations in cancer. However, discrepancies or even conflicting interpretations are observed among these databases. Furthermore, many previously undocumented mutations may have clinical or functional impacts on cancer but are not systematically interpreted by existing knowledgebases. To address these challenges, we developed CancerVar to facilitate automated and standardized interpretations for 13 million somatic mutations based on the AMP/ASCO/CAP 2017 guidelines. We further introduced a deep learning framework to predict oncogenicity for these variants using both functional and clinical features. CancerVar achieved satisfactory performance when compared to several independent knowledgebases and, using clinically curated datasets, demonstrated practical utility in classifying somatic variants. In summary, by integrating clinical guidelines with a deep learning framework, CancerVar facilitates clinical interpretation of somatic variants, reduces manual work, improves consistency in variant classification, and promotes implementation of the guidelines.

Figures

**Fig. 1. Summary of the functionality of CancerVar and descriptions of 12 types of evidence.**
AWS, Amazon Web Services; LOF, loss of function; MAF, minor allele frequency; HGMD, Human Gene Mutation Database.

**Fig. 2. Workflow and architecture of the generator and discriminator/classifier used in OPAI.**
The generator contains three linear layers, each with batch normalization, LeakyReLU as the activation function, and a 60% dropout rate. The final layer is a linear layer with batch normalization and tanh as the activation function. For the discriminator, we implemented three convolutional neural network (CNN) layers with tanh as the activation function.

**Fig. 3. Comparison of the interpretation of 43 variants between 20 pathologists and CancerVar.**
The heatmap shows the proportion of the 20 pathologists who voted for each of the four tiers: tier I, strong clinical significance (SCS); tier II, potential clinical significance (PCS); tier III, variant of uncertain clinical significance (VUS); and tier IV, benign/likely benign (B/LB). The last two columns are CancerVar-predicted scores and classifications. CancerVar showed an 81% (17 of 21) agreement rate with pathologists’ majority voting for tier I/II and a 60.5% (26 of 43) agreement rate for all tiers. This agreement rate is comparable to the 58% agreement rate among the 20 pathologists, but CancerVar can automate the interpretation process. P, pathogenic/strong clinical significance; LP, likely pathogenic/potential clinical significance; B, (likely) benign.
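
The agreement rates quoted in the Fig. 3 caption are plain ratios and can be checked directly. The short sketch below recomputes them from the counts stated in the caption; the helper name `agreement_rate` is hypothetical and is not part of CancerVar.

```python
def agreement_rate(n_agree: int, n_total: int) -> float:
    """Return the percentage of variants on which two raters agree."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_agree / n_total


# Counts taken from the Fig. 3 caption.
tier_1_2 = agreement_rate(17, 21)   # tier I/II variants
all_tiers = agreement_rate(26, 43)  # all 43 variants

print(f"tier I/II agreement: {tier_1_2:.0f}%")
print(f"all-tier agreement:  {all_tiers:.1f}%")
```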

**Fig. 4. UpSet plot highlighting the intersection of multiple methods with oncogenic prediction from different datasets.**
(A) Mutations were taken from the OncoKB dataset. (B) Mutations were taken from CIViC. (C) Mutations were taken from the IARC TP53 transactivation dataset. (D) Mutations were taken from the in vitro cell viability dataset of Ng *et al.* (42).

**Fig. 5. Performance comparisons.**
(A and B) Receiver operating characteristic (ROC) curves for performance comparison between OPAI and five other machine learning algorithms, including gradient boosting tree (GBT), support vector machine (SVM), AdaBoost (ADA), random forest (RF), and XGBoost (XGB), and five other in silico predictive tools using 6226 somatic mutations as the testing set. (C and D) Area under the precision-recall curve (AUPRC) comparison between OPAI and five other machine learning tools and in silico predictive tools. OPAI outperformed any individual tool in the prediction of somatic driver mutations in cancer. TPR, true-positive rate; FPR, false-positive rate.
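
For readers unfamiliar with the two metrics named in the Fig. 5 caption, the following is a minimal sketch of how an area under the ROC curve and an area under the precision-recall curve can be computed from labels and prediction scores. It is not the authors' evaluation code: the AUROC uses the pairwise (Mann-Whitney) formulation, the AUPRC is approximated by average precision, and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(y for _, y in ranked)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 1 = driver mutation, 0 = passenger mutation.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]

print(f"AUROC = {roc_auc(labels, scores):.3f}")
print(f"AP    = {average_precision(labels, scores):.3f}")
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for a few thousand variants; a sort-based implementation would be preferable for larger test sets.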

**Fig. 6. A use case applying the rule-based and deep learning-based models in CancerVar to the interpretation of *FOXA1* variants.**
We queried the *FOXA1* mutation R219C in prostate cancer. The rule-based prediction of this variant was tier III (uncertain significance), with a score of 7, which is very close to tier II. However, the OPAI model predicted this variant to be oncogenic, with a score of 0.99. On the basis of a manual review of the results, we suggest that this variant has clinical significance.
