Biostatistics. 2016 Oct;17(4):692-707.

doi: 10.1093/biostatistics/kxw016. Epub 2016 Apr 22.

Inference for survival prediction under the regularized Cox model

Jennifer A Sinnott¹, Tianxi Cai²

Affiliations

¹ Department of Statistics, The Ohio State University, Columbus, OH 43210, USA jsinnott@stat.osu.edu.
² Department of Biostatistics, Harvard University, Boston, MA 02115, USA.

PMID: 27107008
PMCID: PMC5031946
DOI: 10.1093/biostatistics/kxw016

Jennifer A Sinnott et al. Biostatistics. 2016 Oct.


Abstract

When a moderate number of potential predictors are available and a survival model is fit with regularization to achieve variable selection, providing accurate inference on the predicted survival can be challenging. We investigate inference on the predicted survival estimated after fitting a Cox model under regularization guaranteeing the oracle property. We demonstrate that existing asymptotic formulas for the standard errors of the coefficients tend to underestimate the variability for some coefficients, while typical resampling such as the bootstrap tends to overestimate it; these approaches can both lead to inaccurate variance estimation for predicted survival functions. We propose a two-stage adaptation of a resampling approach that brings the estimated error in line with the truth. In stage 1, we estimate the coefficients in the observed data set and in [Formula: see text] resampled data sets, and allow the resampled coefficient estimates to vote on whether each coefficient should be 0. For those coefficients voted as zero, we set both the point and interval estimates to zero. In stage 2, to make inference about coefficients not voted as zero in stage 1, we refit the penalized model in the observed data and in the [Formula: see text] resampled data sets with only variables corresponding to those coefficients. We demonstrate that ensemble voting-based point and interval estimators of the coefficients perform well in finite samples, and prove that the point estimator maintains the oracle property. We extend this approach to derive inference procedures for survival functions and demonstrate that our proposed interval estimation procedures substantially outperform estimators based on asymptotic inference or standard bootstrap. We further illustrate our proposed procedures to predict breast cancer survival in a gene expression study.

Keywords: Bootstrap; Ensemble methods; Oracle property; Proportional hazards model; Regularized estimation; Resampling; Risk prediction; Simultaneous confidence intervals; Survival functions.
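
The two-stage voting procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the abstract does not state the voting rule, the resampling scheme, or how intervals are formed, so the sketch assumes a simple majority vote (`vote_threshold=0.5`), a nonparametric bootstrap, and percentile intervals. To stay self-contained it also swaps the penalized Cox fit for a hypothetical stand-in, `soft_threshold_ols`, which is a soft-thresholded least-squares estimator on a linear model. All function and parameter names are invented for illustration.

```python
import numpy as np


def soft_threshold_ols(X, y, lam=0.15):
    """Stand-in sparse estimator (NOT a Cox model): least squares followed by
    soft-thresholding, so that small coefficients are set exactly to zero."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)


def two_stage_voting(X, y, fit, n_resamples=200, vote_threshold=0.5,
                     level=0.95, seed=0):
    """Two-stage voting sketch; `fit(X, y)` must return a coefficient vector."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Draw the resampling indices once so both stages use the same data sets.
    idx = rng.integers(0, n, size=(n_resamples, n))

    # Stage 1: fit on each resampled data set and let the estimates vote.
    stage1 = np.array([fit(X[i], y[i]) for i in idx])
    frac_zero = np.mean(stage1 == 0.0, axis=0)
    keep = frac_zero < vote_threshold  # assumption: simple majority rule

    # Coefficients voted as zero: point estimate 0, degenerate interval [0, 0].
    beta = np.zeros(p)
    lower = np.zeros(p)
    upper = np.zeros(p)

    # Stage 2: refit using only the retained variables, in the observed data
    # (point estimate) and in the resampled data (interval estimate).
    if keep.any():
        Xk = X[:, keep]
        beta[keep] = fit(Xk, y)
        stage2 = np.array([fit(Xk[i], y[i]) for i in idx])
        alpha = 1.0 - level
        lower[keep] = np.quantile(stage2, alpha / 2, axis=0)
        upper[keep] = np.quantile(stage2, 1.0 - alpha / 2, axis=0)
    return beta, lower, upper, keep
```

To apply the same skeleton to survival data, `fit` would be replaced by a regularized Cox fit with the oracle property (the figure captions refer to an aENET fit), and `y` would carry both event times and censoring indicators. Those pieces are outside the scope of this sketch.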


Figures

**Fig. 1.**
Comparison of the SEs, bias, and 95% CI coverage of [Formula: see text] for true model parameters [Formula: see text]. Shown are values when [Formula: see text] (with absolute bias displayed), as well as [Formula: see text] and [Formula: see text]. Bias and empirical SEs are compared for the base aENET fit ([Formula: see text]) and the aENET fit after the voting procedure ([Formula: see text]); the variability for the base aENET fit may be estimated using either the bootstrap or the asymptotic method, while the variability for the voting procedure is estimated using the resampled coefficient estimators after voting.


**Fig. 2.**
Under the model with [Formula: see text], CI coverage for [Formula: see text]-year survival, and width, for three covariate levels: [Formula: see text], with all covariates 0; [Formula: see text]; and [Formula: see text].

**Fig. 3.**
Under the model with [Formula: see text], simultaneous CI coverage for [Formula: see text], with all covariates 0; [Formula: see text]; and [Formula: see text]. Also shown are simultaneous confidence widths at representative times.

**Fig. 4.**
In the breast cancer study, estimates of the (unpenalized) coefficient for ER status and the (penalized) coefficients for the variables in the p53 signaling pathway, each with 95% CIs, estimated using the aENET estimate with bootstrap CIs, and the voting-based method for both point estimation and interval estimation.

**Fig. 5.**
Pointwise (left-hand column) and simultaneous (right-hand column) CIs for two individuals in the data set (top row: ID S246; bottom row: ID S034). The thin dotted line is the predicted survival from the aENET estimate [Formula: see text]; the thin dashed line is the predicted survival from the voting-based estimate [Formula: see text]. Thick dotted lines are the bootstrap-based confidence limits around the aENET predicted survival, and thick dashed lines are the voting-based confidence limits around the voting-based predicted survival.


