Comparison of pathway and gene-level models for cancer prognosis prediction
- PMID: 32111152
- PMCID: PMC7048092
- DOI: 10.1186/s12859-020-3423-z
Abstract
Background: Cancer prognosis prediction is valuable for patients and clinicians because it allows them to manage care appropriately. A promising direction for improving the performance and interpretability of expression-based predictive models is the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazards models built using either pathway- or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB).
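To make the idea of aggregating gene-level data into pathway-level predictors concrete, the following is a minimal sketch. It assumes a simple scoring rule (the per-sample mean of z-scored member genes), which is an illustrative choice and not necessarily the method used in the study; the gene names, gene sets, and the helper names `zscore` and `pathway_scores` are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one gene's expression across samples (population SD)."""
    mu = mean(values)
    sd = pstdev(values)
    # A constant gene carries no information; map it to zeros.
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]

def pathway_scores(expression, pathways):
    """Collapse a gene-by-sample table into a pathway-by-sample table.

    expression: dict gene -> list of expression values, one per sample
    pathways:   dict pathway name -> list of member genes
    Scoring rule (an assumption for illustration): mean z-score of members.
    """
    z = {gene: zscore(vals) for gene, vals in expression.items()}
    scores = {}
    for name, genes in pathways.items():
        members = [z[g] for g in genes if g in z]  # skip unmeasured genes
        if not members:
            continue
        # Average the standardized member genes within each sample.
        scores[name] = [mean(col) for col in zip(*members)]
    return scores

# Hypothetical toy data: 3 genes, 4 samples, 2 made-up gene sets.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}
pathways = {"SET_1": ["GENE_A", "GENE_B"], "SET_2": ["GENE_C", "GENE_X"]}
scores = pathway_scores(expression, pathways)
```

The resulting pathway-by-sample table has far fewer rows than the gene-level table, which is one plausible reason a penalized model fitted on it can be smaller and faster to estimate.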
Results: When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of ~0.85 for the TCGA glioma cohort; however, the gene-level model has twice as many predictors on average, its predictor composition is less stable across cross-validation folds, and its estimation takes 40 times as long as that of the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, for the TCGA glioma cohort, when survival times are simulated from uncorrelated gene expression data, the average concordance index of the pathway-level model increases to 0.88 while that of the gene-level model falls to 0.56.
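The concordance index reported above can be read as the fraction of comparable patient pairs that a model ranks in the correct order. The sketch below is a simplified version of Harrell's C, assuming higher risk scores mean worse predicted prognosis and ignoring pairs with tied survival times; the function name and the toy data are illustrative and are not taken from the study.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C: share of comparable pairs ranked correctly.

    times:  observed follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risks:  model risk score (higher = worse predicted prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            if times[i] < times[j]:  # i died while j was still being followed
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a drop from ~0.85 to 0.56 indicates a model that is barely better than chance.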
Conclusion: The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and less computational cost relative to a gene-level model. When correlations among genes are high, a pathway-level analysis provides equivalent predictive power compared to a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.
Keywords: Cancer prognosis prediction; Gene expression data; Inter-gene correlation; L1 penalized regression model; Pathway analysis.
Conflict of interest statement
The authors declare that they have no competing interests.