Inference for survival prediction under the regularized Cox model
- PMID: 27107008
- PMCID: PMC5031946
- DOI: 10.1093/biostatistics/kxw016
Inference for survival prediction under the regularized Cox model
Abstract
When a moderate number of potential predictors are available and a survival model is fit with regularization to achieve variable selection, providing accurate inference on the predicted survival can be challenging. We investigate inference on the predicted survival estimated after fitting a Cox model under regularization guaranteeing the oracle property. We demonstrate that existing asymptotic formulas for the standard errors of the coefficients tend to underestimate the variability for some coefficients, while typical resampling such as the bootstrap tends to overestimate it; these approaches can both lead to inaccurate variance estimation for predicted survival functions. We propose a two-stage adaptation of a resampling approach that brings the estimated error in line with the truth. In stage 1, we estimate the coefficients in the observed data set and in [Formula: see text] resampled data sets, and allow the resampled coefficient estimates to vote on whether each coefficient should be 0. For those coefficients voted as zero, we set both the point and interval estimates to [Formula: see text] In stage 2, to make inference about coefficients not voted as zero in stage 1, we refit the penalized model in the observed data and in the [Formula: see text] resampled data sets with only variables corresponding to those coefficients. We demonstrate that ensemble voting-based point and interval estimators of the coefficients perform well in finite samples, and prove that the point estimator maintains the oracle property. We extend this approach to derive inference procedures for survival functions and demonstrate that our proposed interval estimation procedures substantially outperform estimators based on asymptotic inference or standard bootstrap. We further illustrate our proposed procedures to predict breast cancer survival in a gene expression study.
Keywords: Bootstrap; Ensemble methods; Oracle property; Proportional hazards model; Regularized estimation; Resampling; Risk prediction; Simultaneous confidence intervals; Survival functions.
© The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Figures





References
-
- Bach F. R. (2008). Bolasso: model consistent lasso estimation through the bootstrap. In: Proceedings of the 25th International Conference on Machine Learning New York: ACM, pp. 33–40.
-
- Breslow N. E. (1972). Contribution to the discussion of the paper by DR Cox. Journal of the Royal Statistical Society, Series B 342, 216–217.
-
- Cox D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society. Series B 34, 187–220.
-
- Economopoulou P., Dimitriadis G., Psyrri A. (2015). Beyond brca: new hereditary breast cancer susceptibility genes. Cancer Treatment Reviews 411, 1–8. - PubMed
-
- Fan J., Li R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics 301, 74–99.
Publication types
MeSH terms
Grants and funding
LinkOut - more resources
Full Text Sources
Other Literature Sources