2021 Sep 25;21(1):1053.
doi: 10.1186/s12885-021-08796-3.

Pan-cancer evaluation of gene expression and somatic alteration data for cancer prognosis prediction

Xingyu Zheng et al. BMC Cancer.

Abstract

Background: Over the past decades, approaches for diagnosing and treating cancer have improved significantly. However, the variability of patient and tumor characteristics has limited progress on methods for prognosis prediction. The development of high-throughput omics technologies now provides multiple approaches for characterizing tumors. Although many published studies have focused on integrating multi-omics data and on using pathway-level models for cancer prognosis prediction, a knowledge gap remains regarding the prognostic landscape across multi-omics data for multiple cancer types using both gene-level and pathway-level predictors.

Methods: In this study, we systematically evaluated three commonly available types of omics data (gene expression, copy number variation and somatic point mutation) covering both DNA-level and RNA-level features. We evaluated the predictive performance of these three omics modalities for 33 cancer types in The Cancer Genome Atlas (TCGA) using a Lasso or Group Lasso-penalized Cox model with either gene-level or pathway-level predictors.
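
For orientation, the textbook form of a Lasso-penalized Cox model can be written as below. This is a sketch of the standard formulation, not a statement of the authors' exact objective: the symbols are assumptions introduced here, with x_i the predictor vector of patient i, \delta_i the event indicator, R(t_i) the set of patients still at risk at time t_i, and \lambda the shrinkage parameter. Tied event times are ignored. The second line shows one common Group Lasso penalty over pathway groups g of size p_g; the group weights actually used are not given in the abstract.

```latex
\hat{\beta} = \arg\max_{\beta}
  \left[ \sum_{i:\,\delta_i = 1}
    \left( x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right)
    - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert \right]

\text{Group Lasso variant: replace } \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\text{ with } \lambda \sum_{g} \sqrt{p_g}\, \lVert \beta_g \rVert_2
```

Larger values of \lambda shrink more coefficients exactly to zero, which is why the fitted model also acts as a predictor selection step.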

Results: We constructed the prognostic landscape using three types of omics data for 33 cancer types on both the gene and pathway levels. Based on this landscape, we found that predictive performance depends on cancer type, and we highlighted the cancer types and omics modalities that support the most accurate prognostic models. In general, models estimated on gene expression data provide the best predictive performance on either the gene or the pathway level. Adding copy number variation or somatic point mutation data to gene expression data does not improve predictive performance, with a few exceptions, including low grade glioma and thyroid cancer. In general, pathway-level models have better interpretability, higher stability and smaller model size than gene-level models across multiple cancer types and omics data types.

Conclusions: Based on this landscape and a comprehensive comparison, models estimated on gene expression data provide the best predictive performance on either the gene or the pathway level. Pathway-level models have better interpretability, higher stability and smaller model size than gene-level models.

Keywords: Cancer prognosis prediction; L1 penalized regression model; Multi-omics data; Pathway analysis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Workflow of gene-level and pathway-level models. A gene-level data matrix of GE, SPM or CNV is the input to the workflow. Genes are pre-filtered by intersecting with the pathway collection (shown as ‘Path’), optionally followed by a further intersection with COSMIC genes (shown as ‘COSMIC’) or with genes that are significant (p-value less than 0.05) in univariable Cox models (shown as ‘Cox’). For the pathway-level models, gene set enrichment then transforms the gene-level matrix into a pathway-level matrix: GSVA is applied to GE and CNV data, and an odds ratio is applied to SPM data. For the gene-level models, this step is skipped. With the filtered gene-level matrix or the transformed pathway-level matrix as the predictor matrix, nested cross validation is used to test the predictive performance of the gene-level and pathway-level models. A 5-fold cross validation separates the data into training and test sets. In the training set, a Lasso (least absolute shrinkage and selection operator), or L1-penalized, Cox model is fit, with the shrinkage parameter chosen by a nested 10-fold cross validation. The estimated model, with its selected predictors and coefficient estimates, is then applied to the test set, and three metrics are reported: i) predictive performance, measured by the concordance index; ii) model robustness, measured by Fleiss' kappa; iii) model parsimony, measured by average model size.
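
The nested cross validation described in this caption can be sketched as plain index bookkeeping. The following is a minimal illustration, not the authors' code: the function names, the seed handling and the sample count of 100 are hypothetical, and the model fitting itself is left out. It only shows that the inner 10 folds are drawn from the outer training set, so the outer test fold plays no part in choosing the shrinkage parameter.

```python
import random


def k_fold_indices(indices, k, seed):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=10, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, validation) pairs built only
    from the outer training indices.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, test in enumerate(outer_folds):
        # Outer training set: every outer fold except the held-out one.
        train = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        # Inner folds are re-drawn from the outer training set only.
        inner_folds = k_fold_indices(train, inner_k, seed + i + 1)
        inner_splits = []
        for m, validation in enumerate(inner_folds):
            inner_train = [
                idx for j, fold in enumerate(inner_folds) if j != m for idx in fold
            ]
            inner_splits.append((inner_train, validation))
        yield train, test, inner_splits


# Hypothetical cohort of 100 patients.
splits = list(nested_cv_splits(n_samples=100))
```

In a full pipeline, each candidate shrinkage value would be scored on the inner validation folds, the best value refit on the whole outer training set, and the resulting model evaluated once on the outer test fold.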
Fig. 2
Comparative results for gene-level and pathway-level prognostic models estimated using GE, SPM and CNV data from multiple cancer types. ‘PLv’ represents ‘pathway-level’ and ‘GLv’ represents ‘gene-level’. The dots represent the values of the concordance index and the bars represent the standard error.
Fig. 3
Comparative results of adding SPM or CNV data to GE data. ‘PLv’ represents ‘pathway-level’ and ‘GLv’ represents ‘gene-level’. The dots represent the values of the concordance index and the bars represent the standard error.
Fig. 4
Heatmap of concordance index, Fleiss' kappa statistics and average model size across cohorts and models. ‘PLv’ represents ‘pathway-level’ and ‘GLv’ represents ‘gene-level’. Grey cells mark models that did not converge; in those cases, no predictors could be selected to predict prognosis.
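
The three metrics shown in the heatmap can be illustrated with a small pure-Python sketch. Several points here are assumptions rather than details from the source: the concordance index is Harrell's pairwise version with tied event times skipped, and Fleiss' kappa treats each cross-validation fold as a "rater" that labels every candidate predictor as selected or not selected. All function names and toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when patient i has the shorter time and
    an observed event (events[i] == 1). Higher risk should go with the
    shorter time. Tied risk scores count 0.5; tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def fleiss_kappa_selection(selected_per_fold, all_predictors):
    """Fleiss' kappa with folds as raters and two categories
    (selected / not selected) for every candidate predictor.
    Assumes at least two folds."""
    n_folds = len(selected_per_fold)
    counts = [sum(1 for s in selected_per_fold if p in s) for p in all_predictors]
    # Observed agreement per predictor, then averaged.
    per_item = [
        (c * c + (n_folds - c) ** 2 - n_folds) / (n_folds * (n_folds - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / len(all_predictors)
    # Agreement expected by chance from the overall selection rate.
    p_selected = sum(counts) / (len(all_predictors) * n_folds)
    p_expected = p_selected ** 2 + (1 - p_selected) ** 2
    if p_expected == 1.0:
        return float("nan")  # every fold made the identical trivial choice
    return (p_bar - p_expected) / (1 - p_expected)


def average_model_size(selected_per_fold):
    """Mean number of selected predictors across folds."""
    return sum(len(s) for s in selected_per_fold) / len(selected_per_fold)


# Toy data: four patients, the second one censored (event = 0).
c_index = concordance_index(
    times=[2, 4, 6, 8], events=[1, 0, 1, 1], risk_scores=[0.9, 0.1, 0.5, 0.7]
)

# Toy selections from three folds over four candidate predictors.
folds = [{"A", "B"}, {"A", "B"}, {"A", "C"}]
kappa = fleiss_kappa_selection(folds, ["A", "B", "C", "D"])
size = average_model_size(folds)
```

A concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a kappa near 1 means the folds select nearly the same predictors.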
