Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type

Affiliations

¹ Caris Life Sciences, 4610 South 44th Place, Phoenix, AZ 85040, USA; Arizona State University, Phoenix, AZ, USA.
² Department of Neurosurgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
³ Ruesch Center for The Cure of Gastrointestinal Cancers, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, USA.
⁴ Wayne State University/Karmanos Cancer Institute, Detroit, MI, USA.
⁵ Division of Hematology and Oncology, Penn State Hershey Cancer Institute, Hershey, PA, USA.
⁶ Caris Life Sciences, 4610 South 44th Place, Phoenix, AZ 85040, USA.
⁷ Caris Life Sciences, 4610 South 44th Place, Phoenix, AZ 85040, USA; Department of Clinical Pharmacy and Outcomes Sciences, University of South Carolina, Columbia, SC, USA.
⁸ Caris Life Sciences, 4610 South 44th Place, Phoenix, AZ 85040, USA; Division of Hematology and Oncology, University of California in San Francisco, San Francisco, CA, USA.
⁹ Caris Life Sciences, 4610 South 44th Place, Phoenix, AZ 85040, USA; Arizona State University, Phoenix, AZ, USA. Electronic address: dspetzler@carisls.com.

PMID: 33465745
PMCID: PMC7815805
DOI: 10.1016/j.tranon.2021.101016

Jim Abraham et al. Transl Oncol. 2021 Mar.

Transl Oncol. 2021 Mar;14(3):101016.

doi: 10.1016/j.tranon.2021.101016. Epub 2021 Jan 16.

Abstract

Cancer of Unknown Primary (CUP) occurs in 3-5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. Typically, CUP is treated empirically and has very poor outcomes, with median overall survival of less than one year. Gene expression profiling alone has been used to identify the tissue of origin but struggles with the low neoplastic percentage in metastatic sites, which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm was trained on genomic data from 34,352 cases and on genomic and transcriptomic data from 23,137 cases, and was validated on 19,555 cases. MI GPSai predicted the tumor type in the labeled data set with an accuracy of over 94% on 93% of cases while deliberating amongst 21 possible categories of cancer. When the second-highest prediction is also considered, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between the submitted diagnosis and MI GPSai predictions resulted in a change of diagnosis 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases, and inclusion of MI GPSai in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
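
The reported figures (accuracy on a subset of cases, and a higher accuracy when the second-highest prediction also counts) can be read as a call rate plus top-1 and top-2 accuracy. The following is a minimal sketch of how such metrics might be computed, not the authors' implementation. It assumes that a case is "called" when its highest class score exceeds a threshold (0.835 is the cut-off quoted in the figure captions), and that both accuracies are measured on called cases only. The function names, the three-class label set, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.835  # score cut-off quoted in the figure captions (assumed to define a "call")

Case = Tuple[str, Dict[str, float]]  # (submitted diagnosis, per-class scores)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def evaluate(cases: List[Case], threshold: float = THRESHOLD) -> Dict[str, float]:
    """Compute call rate, top-1 accuracy and top-2 accuracy on called cases."""
    # A case is called only if its best score clears the threshold.
    called = [(truth, s) for truth, s in cases if max(s.values()) > threshold]
    n_called = len(called)
    top1_hits = sum(truth == top_k(s, 1)[0] for truth, s in called)
    top2_hits = sum(truth in top_k(s, 2) for truth, s in called)
    return {
        "call_rate": n_called / len(cases),
        "top1_accuracy": top1_hits / n_called if n_called else float("nan"),
        "top2_accuracy": top2_hits / n_called if n_called else float("nan"),
    }


# Hypothetical toy data: three tumor types instead of the 21 categories in the paper.
cases: List[Case] = [
    ("colon", {"colon": 0.95, "gastric": 0.03, "breast": 0.02}),    # called, correct
    ("breast", {"breast": 0.90, "ovarian": 0.08, "colon": 0.02}),   # called, correct
    ("gastric", {"colon": 0.85, "gastric": 0.14, "breast": 0.01}),  # called, right only in top 2
    ("ovarian", {"ovarian": 0.50, "breast": 0.45, "colon": 0.05}),  # below threshold, no call
]

result = evaluate(cases)
print(result)
```

On this toy set three of four cases are called, two of the three calls are correct at top 1, and all three are correct within the top 2, which mirrors the pattern of the reported numbers without reproducing them.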

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Fig. 1**
CONSORT diagram. The DNA and RNA components of MI GPSai were trained using a combined 57,489 patients, which were then validated on 4,602 non-CUP and 185 CUP patients to determine optimal performance settings. Following this evaluation, MI GPSai rendered a prediction on routinely profiled cases resulting in the final prospective validation set and CUP cases.
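
As a consistency check on the cohort sizes, the counts given in the abstract add up to the combined training figure in this caption and to the 77,044 profiles in the title. How the 4,602 non-CUP and 185 CUP tuning cases relate to these totals is not stated here, so they are left out of the sums.

```latex
\begin{align*}
\underbrace{34{,}352}_{\text{genomic only}} + \underbrace{23{,}137}_{\text{genomic and transcriptomic}} &= 57{,}489 \quad \text{(training)} \\
57{,}489 + \underbrace{19{,}555}_{\text{validation}} &= 77{,}044 \quad \text{(profiles in the title)}
\end{align*}
```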

**Fig. 2**
Prediction matrix in the prospective validation set. Each row shows the percentage of the actual disease types observed when MI GPSai achieves a score > 0.835. The diagonal represents the positive predictive value (PPV) for the given disease type. Blank cells have values between 0 and 1.

**Fig. 3**
Confusion matrix in the prospective validation set. Each column shows the observed predictions for each disease type when MI GPSai achieves a score > 0.835. The diagonal represents the sensitivity for the given disease type. Blank cells have values between 0 and 1.
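
The diagonals of the prediction and confusion matrices are two different normalizations of the same counts: PPV divides correct calls by all calls of that type, while sensitivity divides them by all cases that truly are that type. The sketch below illustrates the distinction on hypothetical counts. It assumes the input has already been restricted to cases scoring above 0.835, as the captions describe; the function name and the two-class toy data are illustrative only and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ppv_and_sensitivity(
    pairs: List[Tuple[str, str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-class PPV and sensitivity from (actual, predicted) label pairs."""
    pair_counts = Counter(pairs)
    predicted_totals = Counter(predicted for _, predicted in pairs)
    actual_totals = Counter(actual for actual, _ in pairs)
    labels = sorted(set(actual_totals) | set(predicted_totals))
    # PPV: of all cases predicted as a type, the fraction that truly are that type.
    ppv = {
        label: pair_counts[(label, label)] / predicted_totals[label]
        for label in labels
        if predicted_totals[label]
    }
    # Sensitivity: of all cases truly of a type, the fraction predicted as that type.
    sensitivity = {
        label: pair_counts[(label, label)] / actual_totals[label]
        for label in labels
        if actual_totals[label]
    }
    return ppv, sensitivity


# Hypothetical (actual, predicted) pairs for called cases only.
pairs = (
    [("colon", "colon")] * 9
    + [("gastric", "colon")] * 1
    + [("gastric", "gastric")] * 6
    + [("colon", "gastric")] * 3
)

ppv, sensitivity = ppv_and_sensitivity(pairs)
print("PPV:", ppv)
print("Sensitivity:", sensitivity)
```

In this toy example "colon" has a PPV of 9/10 but a sensitivity of 9/12, which shows why the two figures report separate matrices.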

**Fig. 4**
A clinical example showing a representative case in which the pathological diagnosis was reassigned based on MI GPSai predictions using Whole Exome and Whole Transcriptome Sequencing (WES, WTS) data. (A) Molecular profiling was performed using WES and WTS data that were then routed into the MI GPSai pipeline for diagnostic predictions. (B) The whole transcriptome expression data were then used to select lineage-specific gene expression to guide immunohistochemical antibody selection, the current gold standard for lineage assignment. In the example provided, the mean RNA expression of Uroplakin II and GATA3 of the urothelial carcinoma cases in our database is relatively high (box plots). In the specimen being considered (red line), Uroplakin II and GATA3 RNA expression is high. (C) and (D) Immunohistochemical evaluation of the tumor with clinically validated antibodies against Uroplakin II and GATA3 confirmed lineage-specific protein expression diagnostic of urothelial carcinoma. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
