CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence
- PMID: 33039710
- PMCID: PMC7553237
- DOI: 10.1016/j.ebiom.2020.103030
Abstract
Background: Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients.
Methods: We developed an RNA-based classifier, CUP-AI-Dx, that uses a 1D Inception convolutional neural network (1D-Inception) model to infer a tumour's primary tissue of origin. CUP-AI-Dx was trained on the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). Gene expression data were ordered by the genes' chromosomal coordinates as input to the 1D-CNN model, and the model applies multiple convolutional kernels with different configurations in parallel to improve generalisability. The model was optimised through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that classifies the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue-of-origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and on 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype classifier was also tested on independent ovarian and breast cancer microarray datasets.
Findings: CUP-AI-Dx identifies the primary site with an overall top-1 accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated at two institutes, one in the US and one in Australia, the model predicted the primary site with a top-1 accuracy of 86.96% and 72.46%, respectively.
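The two input-side ideas in the Methods (ordering genes by chromosomal position, and running several convolution kernels of different sizes in parallel, as in an Inception block) plus the top-1 accuracy metric can be illustrated with a minimal pure-Python sketch. This is not CUP-AI-Dx's code: the gene names, coordinates, chromosome labels, kernel sizes and weights are hypothetical, and the helper names (`order_by_position`, `conv1d_same`, `inception_block`, `max_pool`, `top1_accuracy`) are illustrative. The real model learns many filters per branch, stacks further layers (pooling, dropout, a classifier over 32 cancer types), and a typical Inception module also has a pooling branch that is omitted here.

```python
# Hypothetical sketch, not the CUP-AI-Dx implementation.
# Assumed chromosome labels "1".."22", "X", "Y" (no "chr" prefix).
CHROM_ORDER = {**{str(i): i for i in range(1, 23)}, "X": 23, "Y": 24}


def order_by_position(expr, coords):
    """Sort genes by (chromosome, start) and return names and expression values."""
    genes = sorted(expr, key=lambda g: (CHROM_ORDER[coords[g][0]], coords[g][1]))
    return genes, [expr[g] for g in genes]


def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output length equals len(x).
    Assumes an odd kernel length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def relu(v):
    return [max(0.0, a) for a in v]


def inception_block(x, kernels):
    """Apply kernels of different sizes in parallel; each output is one channel."""
    return [relu(conv1d_same(x, k)) for k in kernels]


def max_pool(channel, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(channel[i:i + size]) for i in range(0, len(channel) - size + 1, size)]


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    hits = sum(max(s, key=s.get) == y for s, y in zip(scores, labels))
    return hits / len(labels)


# Toy data (hypothetical genes and coordinates).
expr = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 7.0, "GENE_D": 3.0}
coords = {
    "GENE_A": ("17", 7_600_000),
    "GENE_B": ("17", 43_000_000),
    "GENE_C": ("7", 55_000_000),
    "GENE_D": ("X", 1_000),
}
genes, values = order_by_position(expr, coords)
# Kernel sizes 1 and 3 with fixed toy weights (a trained model learns these).
channels = inception_block(values, [[1.0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]])
pooled = [max_pool(c) for c in channels]

scores = [{"breast": 0.7, "ovary": 0.3},
          {"breast": 0.2, "ovary": 0.8},
          {"breast": 0.6, "ovary": 0.4}]
labels = ["breast", "ovary", "ovary"]
acc = top1_accuracy(scores, labels)
```

Ordering by genomic position means that neighbouring entries in the input vector are neighbouring genes, so a convolution kernel sees local genomic context; using several kernel widths side by side lets the model capture patterns at more than one scale without choosing a single width in advance.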
Interpretation: CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and can therefore be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform.
Funding: NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.
Keywords: Cancer; Cancer-of-unknown-primary; Cell-of-origin; Classification; Convolutional neural network; Deep learning; Inception model; Machine learning; TCGA.
Copyright © 2020 The Authors. Published by Elsevier B.V. All rights reserved.
Similar articles
- TOD-CUP: a gene expression rank-based majority vote algorithm for tissue origin diagnosis of cancers of unknown primary. Brief Bioinform. 2021 Mar 22;22(2):2106-2118. doi: 10.1093/bib/bbaa031. PMID: 32266390
- AI-based pathology predicts origins for cancers of unknown primary. Nature. 2021 Jun;594(7861):106-110. doi: 10.1038/s41586-021-03512-4. Epub 2021 May 5. PMID: 33953404
- An integrated tool for determining the primary origin site of metastatic tumours. J Clin Pathol. 2018 Jul;71(7):584-593. doi: 10.1136/jclinpath-2017-204887. Epub 2017 Dec 16. PMID: 29248889
- The practical utility of AI-assisted molecular profiling in the diagnosis and management of cancer of unknown primary: an updated review. Virchows Arch. 2024 Feb;484(2):369-375. doi: 10.1007/s00428-023-03708-1. Epub 2023 Nov 24. PMID: 37999736. Review.
- A Review on Cancer of Unknown Primary Origin: The Role of Molecular Biomarkers in the Identification of Unknown Primary Origin. Methods Mol Biol. 2020;2204:109-119. doi: 10.1007/978-1-0716-0904-0_10. PMID: 32710319. Review.
Cited by
- Automatic origin prediction of liver metastases via hierarchical artificial-intelligence system trained on multiphasic CT data: a retrospective, multicentre study. EClinicalMedicine. 2024 Feb 1;69:102464. doi: 10.1016/j.eclinm.2024.102464. eCollection 2024 Mar. PMID: 38333364
- Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci. 2022 Oct 14;23(20):12272. doi: 10.3390/ijms232012272. PMID: 36293133. Review.
- Generalising uncertainty improves accuracy and safety of deep learning analytics applied to oncology. Sci Rep. 2023 May 6;13(1):7395. doi: 10.1038/s41598-023-31126-5. PMID: 37149669
- Validation of a Transcriptome-Based Assay for Classifying Cancers of Unknown Primary Origin. Mol Diagn Ther. 2023 Jul;27(4):499-511. doi: 10.1007/s40291-023-00650-5. Epub 2023 Apr 26. PMID: 37099070
- OncoTrace-TOO: Interpretable Machine Learning Framework for Cancer Tissue-of-Origin Identification Using Transcriptomic Signatures. Cancer Rep (Hoboken). 2025 Aug;8(8):e70311. doi: 10.1002/cnr2.70311. PMID: 40784724