Transl Oncol. 2021 Mar;14(3):101016.
doi: 10.1016/j.tranon.2021.101016. Epub 2021 Jan 16.

Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type

Jim Abraham et al. Transl Oncol. 2021 Mar.

Abstract

Cancer of Unknown Primary (CUP) occurs in 3-5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. Typically, a CUP diagnosis is treated empirically and has very poor outcomes, with a median overall survival of less than one year. Gene expression profiling alone has been used to identify the tissue of origin but struggles with the low neoplastic percentage in metastatic sites, which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm was trained on genomic data from 34,352 cases and genomic and transcriptomic data from 23,137 cases and was validated on 19,555 cases. MI GPSai predicted the tumor type in the labeled data set with an accuracy of over 94% on 93% of cases while deliberating among 21 possible categories of cancer. When the second highest prediction is also considered, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between the submitted diagnosis and MI GPSai predictions resulted in a change of diagnosis 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases, and inclusion of MI GPSai in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
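
The abstract reports accuracy on only the share of cases for which a prediction was rendered, plus a higher figure when the second-ranked prediction also counts. The Python sketch below illustrates how such "call rate", top-1, and top-2 numbers can be computed from per-case class scores. It is not the MI GPSai implementation: the function name `selective_accuracy`, the dictionary-of-scores input, and the toy cases are hypothetical. Using 0.835 as the call threshold is an assumption borrowed from the figure legends, and computing top-2 accuracy over called cases only is likewise an assumption. The first lines simply check that the three cohort sizes quoted in the abstract add up to the 77,044 profiles in the title.

```python
from typing import Dict, Sequence, Tuple

# Cohort sizes quoted in the abstract.
TRAIN_DNA_ONLY = 34_352   # genomic data only
TRAIN_DNA_RNA = 23_137    # genomic and transcriptomic data
VALIDATION = 19_555
TOTAL_PROFILES = TRAIN_DNA_ONLY + TRAIN_DNA_RNA + VALIDATION  # 77,044


def selective_accuracy(
    scores: Sequence[Dict[str, float]],
    labels: Sequence[str],
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (call_rate, top1_accuracy, top2_accuracy).

    A case is "called" only when its highest class score exceeds
    `threshold`; both accuracies are computed over called cases only
    (an assumption made for this illustration).
    """
    called = top1_hits = top2_hits = 0
    for case_scores, truth in zip(scores, labels):
        # Class names ordered from highest to lowest score.
        ranked = sorted(case_scores, key=case_scores.get, reverse=True)
        if case_scores[ranked[0]] <= threshold:
            continue  # below threshold: no prediction is rendered
        called += 1
        top1_hits += ranked[0] == truth
        top2_hits += truth in ranked[:2]
    if called == 0:
        return 0.0, 0.0, 0.0
    return called / len(labels), top1_hits / called, top2_hits / called


# Hypothetical toy cases with three tumor categories instead of 21.
toy_scores = [
    {"colon": 0.95, "lung": 0.03, "breast": 0.02},
    {"lung": 0.90, "colon": 0.06, "breast": 0.04},
    {"breast": 0.50, "lung": 0.30, "colon": 0.20},
    {"breast": 0.99, "lung": 0.005, "colon": 0.005},
]
toy_labels = ["colon", "colon", "breast", "breast"]

call_rate, top1, top2 = selective_accuracy(toy_scores, toy_labels, threshold=0.835)
print(f"called {call_rate:.0%} of cases; top-1 {top1:.1%}; top-2 {top2:.1%}")
```

In the toy data, the third case falls below the threshold and is left uncalled, and the second case is wrong at rank one but correct at rank two, which is the pattern that separates the top-1 figure from the top-2 figure.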

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
CONSORT diagram. The DNA and RNA components of MI GPSai were trained using a combined 57,489 patients, which were then validated on 4,602 non-CUP and 185 CUP patients to determine optimal performance settings. Following this evaluation, MI GPSai rendered a prediction on routinely profiled cases resulting in the final prospective validation set and CUP cases.
Fig. 2
Prediction matrix in the prospective validation set. Each row shows the percentage of the actual disease types observed when MI GPSai achieves a score > 0.835. The diagonal represents the PPV for the given disease type. Blank cells have values between 0 and 1.
Fig. 3
Confusion matrix in the prospective validation set. Each column shows the observed predictions for each disease type when MI GPSai achieves a score > 0.835. The diagonal represents the sensitivity for the given disease type. Blank cells have values between 0 and 1.
Fig. 4
A clinical example showing a representative case in which the pathological diagnosis was reassigned based on MI GPSai predictions using Whole Exome and Whole Transcriptome Sequencing (WES, WTS) data. (A) Molecular profiling was performed using WES and WTS data, which were then routed into the MI GPSai pipeline for diagnostic predictions. (B) The whole transcriptome expression data were then used to select for lineage-specific gene expression to guide immunohistochemical antibody selection, the current gold standard for lineage assignment. In the example provided, the mean RNA expression of Uroplakin II and GATA3 in the urothelial carcinoma cases in our database is relatively high (box plots). In the specimen being considered (red line), Uroplakin II and GATA3 RNA expression is high. (C) and (D) Immunohistochemical evaluation of the tumor with clinically validated antibodies against Uroplakin II and GATA3 confirmed lineage-specific protein expression diagnostic of urothelial carcinoma. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
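
The legends for Fig. 2 and Fig. 3 describe the same table of predicted versus actual tumor types, normalized in two directions: by predicted type to give PPV on the diagonal, and by actual type to give sensitivity on the diagonal. The Python sketch below shows that distinction on made-up counts. The function name `ppv_and_sensitivity`, the `(predicted, actual)` pair format, and the toy numbers are hypothetical, and restricting the input to cases that already passed the score threshold is an assumption taken from the legends, not a description of the authors' code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ppv_and_sensitivity(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Compute per-class PPV and sensitivity from (predicted, actual) pairs.

    PPV for class c         = correct c predictions / all cases predicted as c
    Sensitivity for class c = correct c predictions / all cases that are truly c
    """
    counts = Counter(pairs)
    predicted_totals: Counter = Counter()
    actual_totals: Counter = Counter()
    for (predicted, actual), n in counts.items():
        predicted_totals[predicted] += n
        actual_totals[actual] += n
    # A Counter returns 0 for a missing (c, c) key, so a class that is
    # never predicted correctly gets 0.0 rather than raising KeyError.
    ppv = {c: counts[(c, c)] / predicted_totals[c] for c in predicted_totals}
    sensitivity = {c: counts[(c, c)] / actual_totals[c] for c in actual_totals}
    return ppv, sensitivity


# Hypothetical called cases (score above threshold), two classes only.
toy_pairs = (
    [("colon", "colon")] * 9
    + [("colon", "lung")] * 1   # predicted colon, actually lung
    + [("lung", "lung")] * 6
    + [("lung", "colon")] * 3   # predicted lung, actually colon
)

ppv, sensitivity = ppv_and_sensitivity(toy_pairs)
for tumor_type in sorted(ppv):
    print(f"{tumor_type}: PPV {ppv[tumor_type]:.2f}, "
          f"sensitivity {sensitivity[tumor_type]:.2f}")
```

With these counts, "colon" has a PPV of 0.90 but a sensitivity of 0.75, which shows why the two matrices can disagree on the diagonal for the same tumor type.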
