TOOme: A Novel Computational Framework to Infer Cancer Tissue-of-Origin by Integrating Both Gene Mutation and Expression
- PMID: 32509741
- PMCID: PMC7248358
- DOI: 10.3389/fbioe.2020.00394
Abstract
Metastatic cancers require further diagnosis to determine their primary tumor sites. However, according to statistics from the United States, the tissue-of-origin of around 5% of tumors cannot be identified by routine medical diagnosis. With the development of machine learning techniques and the accumulation of large-scale cancer data in The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), it is now feasible to predict cancer tissue-of-origin with computational tools. A metastatic tumor inherits characteristics from its tissue-of-origin, and both gene expression profiles and somatic mutations are tissue specific. We therefore developed a computational framework to infer tumor tissue-of-origin by integrating both gene mutation and expression (TOOme). Specifically, we first perform feature selection on both gene expression and gene mutation data with a random forest method. The selected features are then used to build a multi-label classification model to infer cancer tissue-of-origin. We adopt several popular multiple-label classification methods and compare them by 10-fold cross-validation. We applied TOOme to TCGA data containing 7,008 non-metastatic samples across 20 solid tumors. The random forest process selects 74 genes from the gene expression profiles and six genes from the gene mutation data, which fall into two categories: (1) cancer-type-specific genes and (2) genes expressed or mutated in several cancers at different expression levels or mutation rates. Functional analysis indicates that the selected genes are significantly enriched in gland development, urogenital system development, hormone metabolic process, thyroid hormone generation, prostate hormone generation, and so on. Among the multiple-label classification methods, random forest performs best, with a 10-fold cross-validation prediction accuracy of 96%.
We also use the 19 metastatic samples from TCGA and 256 cancer samples downloaded from GEO as independent testing data, on which TOOme achieves a prediction accuracy of 89%. The cross-validation accuracy is higher than that obtained using gene expression alone (95%) or gene mutation alone (53%). In conclusion, TOOme provides a quick yet accurate alternative to traditional medical methods for inferring cancer tissue-of-origin. In addition, methods that combine somatic mutation and gene expression outperform those that use gene expression or mutation alone.
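The workflow described in the abstract (random forest feature selection on combined expression and mutation features, followed by a classifier scored with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, treats tissue-of-origin as a single-label multi-class problem, and runs on small synthetic data. The helper `make_synthetic_cohort`, the cohort sizes, the effect sizes, and the default "mean importance" selection threshold are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def make_synthetic_cohort(n_classes=4, n_per_class=50, n_expr=60, n_mut=10, seed=0):
    """Hypothetical toy data: expression (continuous) plus mutation (binary) features."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_classes), n_per_class)
    # Background: standard-normal expression values and a 5% mutation rate.
    expr = rng.normal(0.0, 1.0, size=(labels.size, n_expr))
    mut = (rng.random((labels.size, n_mut)) < 0.05).astype(float)
    for c in range(n_classes):
        rows = labels == c
        # Two "tissue-specific" genes are over-expressed in class c.
        expr[rows, 2 * c:2 * c + 2] += 3.0
        # One gene is frequently mutated in class c.
        mut[rows, c] = (rng.random(rows.sum()) < 0.7).astype(float)
    # Integrate both data types by concatenating them column-wise.
    return np.hstack([expr, mut]), labels


X, y = make_synthetic_cohort()

# Step 1: keep features whose random forest importance exceeds the mean importance.
# Step 2: classify with a second random forest on the selected features.
# Putting both steps in one pipeline means selection is refit inside every fold.
model = make_pipeline(
    SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold cross-validation accuracy: {scores.mean():.3f}")
```

Running feature selection inside the cross-validation pipeline is a design choice of this sketch to avoid leaking held-out labels into the selected gene set; the abstract does not state whether the original analysis selected features within each fold or once on the full data.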
Keywords: cross-validation; gene expression; random forest; somatic mutation; tissue-of-origin.
Copyright © 2020 He, Lang, Wang, Liu, Lu, He, Gao, Bing, Tian and Yang.