High-dimensional Cox models: the choice of penalty as part of the model building process
- PMID: 20166132
- DOI: 10.1002/bimj.200900064
Abstract
The Cox proportional hazards regression model is the most popular approach to modeling covariate information for survival times. In this context, the development of high-dimensional models, where the number of covariates is much larger than the number of observations (p >> n), is an ongoing challenge. A practicable approach in such situations is to use ridge-penalized Cox regression. Besides finding the best prediction rule, one is often interested in determining the subset of covariates that are most important for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then be done, for example, by L1-penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385-395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348-1360; Fan and Li (2002). The Annals of Statistics 30, 74-99). The purpose of this article is to integrate these approaches into the model building process when analyzing high-dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we include SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418-1429). We compare them with "standard" applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias are studied to assess the practical use of these methods. We observed that the performance of SCAD and the adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist.
Since there is a high risk of missing relevant covariates when SCAD or the adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. However, given the promising results for truly sparse models, we see some advantage in SCAD and the adaptive lasso if better preselection procedures were available. This requires further methodological research.
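For orientation, the penalties named in the abstract can be written as plain functions of a coefficient vector. The following is a minimal sketch, not the authors' implementation: the function names, the parametrization (no 1/2 factors, separate tuning parameters `lam1` and `lam2` for the elastic net), and the default `gamma=1.0` are assumptions made for illustration. The SCAD form and the conventional default `a=3.7` follow the standard definition from Fan and Li (2001), and the adaptive lasso weights `1/|beta_init|**gamma` follow Zou (2006). In a penalized Cox model each of these terms would be subtracted from the partial log-likelihood, which is not shown here.

```python
import math


def ridge_penalty(beta, lam):
    """Ridge: lam * sum of squared coefficients (shrinks, does not select)."""
    return lam * sum(b * b for b in beta)


def lasso_penalty(beta, lam):
    """Lasso: lam * sum of absolute coefficients (L1, can set coefficients to zero)."""
    return lam * sum(abs(b) for b in beta)


def elastic_net_penalty(beta, lam1, lam2):
    """Elastic net: an L1 term plus a squared L2 term (assumed parametrization)."""
    return lasso_penalty(beta, lam1) + ridge_penalty(beta, lam2)


def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    """Adaptive lasso: L1 penalty with weights 1 / |initial estimate|**gamma.

    A covariate whose initial estimate is exactly zero gets an infinite
    weight, so it can never re-enter the model. This is why the quality of
    the initial (preselection) step matters.
    """
    total = 0.0
    for b, b0 in zip(beta, beta_init):
        if b0 == 0.0:
            if b != 0.0:
                return math.inf
            continue
        total += abs(b) / abs(b0) ** gamma
    return lam * total


def scad_penalty(beta, lam, a=3.7):
    """SCAD: lasso-like near zero, quadratic transition, constant beyond a * lam."""
    total = 0.0
    for b in beta:
        t = abs(b)
        if t <= lam:
            total += lam * t
        elif t <= a * lam:
            total += (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
        else:
            total += lam * lam * (a + 1) / 2
    return total


if __name__ == "__main__":
    # Compare how the lasso and SCAD penalize one coefficient as it grows.
    for value in (0.5, 2.0, 10.0):
        print(value, lasso_penalty([value], 1.0), scad_penalty([value], 1.0))
```

In this sketch the SCAD penalty stops growing once a coefficient exceeds `a * lam`, whereas the lasso penalty keeps increasing linearly. That difference is the usual motivation for expecting less estimation bias on large coefficients from SCAD.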