JAMA Oncol. 2020 Jan 1;6(1):84-91.
doi: 10.1001/jamaoncol.2019.3985.

Development of Genome-Derived Tumor Type Prediction to Inform Clinical Cancer Care

Alexander Penson et al. JAMA Oncol. 2020.

Abstract

Importance: Diagnosing the site of origin for cancer is a pillar of disease classification that has directed clinical care for more than a century. Even in an era of precision oncologic practice, in which treatment is increasingly informed by the presence or absence of mutant genes responsible for cancer growth and progression, tumor origin remains a critical factor in tumor biologic characteristics and therapeutic sensitivity.

Objective: To evaluate whether data derived from routine clinical DNA sequencing of tumors could complement conventional approaches to enable improved diagnostic accuracy.

Design, setting, and participants: A machine learning approach was developed to predict tumor type from targeted panel DNA sequence data obtained at the point of care, incorporating both discrete molecular alterations and inferred features such as mutational signatures. This algorithm was trained on 7791 tumors representing 22 cancer types selected from a prospectively sequenced cohort of patients with advanced cancer.

Results: The correct tumor type was predicted for 5748 of the 7791 patients (73.8%) in the training set as well as 8623 of 11 644 patients (74.1%) in an independent cohort. Predictions were assigned probabilities that reflected empirical accuracy, with 3388 cases (43.5%) representing high-confidence predictions (>95% probability). Informative molecular features and feature categories varied widely by tumor type. Genomic analysis of plasma cell-free DNA yielded accurate predictions in 45 of 60 cases (75.0%), suggesting that this approach may be applied in diverse clinical settings including as an adjunct to cancer screening. Likely tissues of origin were predicted from targeted tumor sequencing in 95 of 141 patients (67.4%) with cancers of unknown primary site. Applying this method prospectively to patients under active care enabled genome-directed reassessment of diagnosis in 2 patients initially presumed to have metastatic breast cancer, leading to the selection of more appropriate treatments, which elicited clinical responses.
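The percentages quoted in the Results can be checked directly from the reported counts. The short sketch below is only an arithmetic check, not part of the study's method; it assumes that the denominator for the high-confidence predictions is the 7791 training tumors, which is the choice that reproduces the stated 43.5%. The dictionary keys and the `percent` helper are illustrative names.

```python
# Counts taken from the Results paragraph, stored as (numerator, denominator).
# Assumption: the high-confidence count is expressed relative to the
# 7791 training tumors, since that reproduces the reported 43.5%.
reported = {
    "training set": (5748, 7791),
    "independent cohort": (8623, 11644),
    "high-confidence predictions": (3388, 7791),
    "plasma cell-free DNA": (45, 60),
    "unknown primary site": (95, 141),
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


percentages = {label: percent(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

Each computed value agrees with the figure given in the abstract after rounding to one decimal place.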

Conclusions and relevance: These results suggest that the application of artificial intelligence to predict tissue of origin in oncologic practice can act as a useful complement to conventional histologic review to provide integrated pathologic diagnoses, often with important therapeutic implications.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Varghese reported participation in industry-sponsored trials with Eli Lilly, Taiho, Verastem, Biomed Valley, Silenseed, and Bristol-Myers Squibb. Dr Al-Ahmadie reported receiving personal fees from AstraZeneca and Bristol-Myers Squibb and compensation for consulting and advisory board activities with Bristol-Myers Squibb, EMD Serono, and AstraZeneca outside the submitted work. Dr Razavi reported receiving personal fees from Novartis and grants from Grail Inc outside the submitted work. Dr Chandarlapaty reported receiving grants from Daiichi Sankyo; research funding from Genentech, Sanofi and Eli Lilly; personal fees and nonfinancial support from Novartis; personal fees from Eli Lilly, Revolutions Medicine, Sermonix, Chugai Pharma, and Context Therapeutics; and nonfinancial support from Sun Pharma outside the submitted work. Dr Rosenberg reported receiving personal fees and other stock ownership from Merck; stock ownership in Illumina; and personal fees from AstraZeneca, Astellas, Chugai Pharma, Seattle Genetics, Roche Genentech, Bayer, Bristol-Myers Squibb, Eli Lilly, EMD Serono, Inovio, Sensei, Adicet Bio, BioClin Therapeutics, Fortress Biotech, Pharmacyclics, Western Oncolytics, and GlaxoSmithKline outside the submitted work; in addition, Dr Rosenberg had a patent to ERCC2 for platinum sensitivity issued. Dr Tsui reported receiving travel sponsorships and honoraria from Nanodigmbio and Cambridge Healthtech Institute outside the submitted work. Dr Ladanyi reported performing advisory board activities with Boehringer Ingelheim, AstraZeneca, Bristol-Myers Squibb, Takeda, and Bayer, and receiving research support from Loxo Oncology and Helsinn Healthcare. Dr Solit reported receiving personal fees from Pfizer, Loxo Oncology, Illumina, Vivideon Therapeutics, and Lilly Oncology outside the submitted work.
Dr Klimstra reported receiving personal fees from Paige.AI, Merck, American Registry of Pathology, and UpToDate outside the submitted work. Dr Hyman reported receiving personal fees from Pfizer, CytomX Therapeutics, Chugai Pharma, Genentech/Roche, and Boehringer Ingelheim; grants and personal fees from AstraZeneca and Bayer; and grants from Puma Biotechnology outside the submitted work. Dr Taylor reported receiving grants from Illumina during the conduct of the study. Dr Berger reported receiving grants from Illumina during the conduct of the study and personal fees from Roche outside the submitted work. No other disclosures were reported.

Figures

Figure 1. Classifier Performance Across Cancers
A, Schematic of random forest classifier. Molecular alterations from Memorial Sloan Kettering–Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT) sequencing of 7791 patients diagnosed with 1 of 22 tumor types were used to train the classifier. For a given combination of genomic features, the classifier returns a calibrated probability of each tumor type. B, Performance of the classifier across 22 cancer types. True (established) cancer types are displayed horizontally and predicted cancer types are displayed vertically. The number of tumors for each cancer type in the cohort is shown at the top, and sensitivity and specificity of predictions are indicated at the top and right. C, The fraction of samples (vertical axis) with the correct prediction made at or above a given probability (horizontal axis) within each cancer type. CNAs indicates copy number alterations; GIST, gastrointestinal stromal tumor; NSCLC, non–small cell lung cancer; PNET, pancreatic neuroendocrine tumor; Pr, probability; and SCLC, small cell lung cancer.
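The quantity plotted in panel C, the fraction of samples with a correct prediction made at or above a given probability, can be sketched as follows. This is a minimal illustration under stated assumptions: the `toy` pairs of (predicted probability, whether the prediction was correct) are invented for the example and are not data from the study, and `fraction_correct_at_or_above` is a hypothetical helper name.

```python
def fraction_correct_at_or_above(predictions, threshold):
    """Fraction of all samples whose prediction is correct AND whose
    assigned probability is at or above the threshold.

    predictions: list of (probability, is_correct) pairs, one per sample.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    hits = sum(1 for prob, is_correct in predictions
               if is_correct and prob >= threshold)
    return hits / len(predictions)


# Invented toy data for one cancer type (NOT from the study).
toy = [
    (0.99, True), (0.97, True), (0.80, True), (0.60, False),
    (0.96, False), (0.40, True), (0.30, False), (0.98, True),
]

for threshold in (0.0, 0.5, 0.95):
    value = fraction_correct_at_or_above(toy, threshold)
    print(f"Pr >= {threshold}: {value:.3f}")
```

At a threshold of 0 the value equals the overall sensitivity for that cancer type, and it can only decrease as the threshold rises, which is why such curves fall from left to right.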
Figure 2. Predictive Power of Molecular Features and Feature Classes
A, Relative information content of different feature categories as shown by the Cohen κ metric as a measure of overall accuracy. Black diamonds represent the accuracy of a classifier built for each feature category as indicated; open circles represent the accuracy on incrementally adding feature categories (top to bottom). Mutations encompass hotspots and non-hotspots. B, Relative importance of different feature categories in different cancer types. Circle size represents the mean contribution of the features in each category to accurate predictions in each cancer type. C, Selected individual features for predicting breast cancer and non–small cell lung cancer (NSCLC) in the study cohort and their relative contribution. Informative features driving correct predictions in all tumor types are shown in eFigure 1 in the Supplement. D, Different features contributing to tumor type predictions in BRAF V600E-mutant colorectal cancer, melanoma, and thyroid cancer, establishing the value of feature interactions to inform tumor type prediction in a cohort of patients that nevertheless share a common molecular alteration. CNA indicates copy number alterations; MMR, mismatch repair; VUS, variants of unknown significance.
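For readers unfamiliar with the Cohen κ metric used in panel A, the sketch below computes it for a multiclass prediction task as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between true and predicted labels and p_e is the agreement expected by chance from the label frequencies. The labels are invented toy data, and `cohen_kappa` is an illustrative helper, not the authors' code.

```python
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Cohen's kappa between two equal-length label sequences."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(true_labels)

    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n

    # Chance agreement: product of marginal frequencies, summed over classes.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented toy labels (NOT from the study).
true_types = ["breast", "breast", "NSCLC", "NSCLC", "melanoma", "melanoma"]
pred_types = ["breast", "NSCLC", "NSCLC", "NSCLC", "melanoma", "breast"]

kappa = cohen_kappa(true_types, pred_types)
print(f"kappa = {kappa:.3f}")
```

Unlike raw accuracy, κ discounts the agreement that would arise by chance, which matters when cancer types differ widely in frequency.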
