Biostatistics. 2016 Oct;17(4):692-707.
doi: 10.1093/biostatistics/kxw016. Epub 2016 Apr 22.

Inference for survival prediction under the regularized Cox model


Jennifer A Sinnott et al. Biostatistics. 2016 Oct.

Abstract

When a moderate number of potential predictors are available and a survival model is fit with regularization to achieve variable selection, providing accurate inference on the predicted survival can be challenging. We investigate inference on the predicted survival estimated after fitting a Cox model under regularization that guarantees the oracle property. We demonstrate that existing asymptotic formulas for the standard errors of the coefficients tend to underestimate the variability of some coefficients, while typical resampling methods such as the bootstrap tend to overestimate it; both approaches can lead to inaccurate variance estimation for predicted survival functions. We propose a two-stage adaptation of a resampling approach that brings the estimated error in line with the true error. In stage 1, we estimate the coefficients in the observed data set and in [formula] resampled data sets, and allow the resampled coefficient estimates to vote on whether each coefficient should be 0. For those coefficients voted as zero, we set both the point and interval estimates to 0. In stage 2, to make inference about the coefficients not voted as zero in stage 1, we refit the penalized model in the observed data and in the [formula] resampled data sets using only the variables corresponding to those coefficients. We demonstrate that the ensemble voting-based point and interval estimators of the coefficients perform well in finite samples, and we prove that the point estimator maintains the oracle property. We extend this approach to derive inference procedures for survival functions and demonstrate that our proposed interval estimation procedures substantially outperform estimators based on asymptotic inference or the standard bootstrap. We further illustrate the proposed procedures by predicting breast cancer survival in a gene expression study.
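The two-stage voting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Fitting the penalized (for example, adaptive elastic net) Cox model is delegated to a hypothetical user-supplied callable `fit(data, active)`. The vote uses a simple majority threshold, and intervals are plain percentile intervals of the stage-2 resampled estimates. The paper's exact voting rule and interval construction may differ.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical signature (an assumption, not a real library API):
# fit(data, active) refits the penalized Cox model using only the columns in
# `active` (all columns when active is None) and returns a full-length list of
# coefficient estimates, with 0.0 for columns that were not used.
FitFn = Callable[[object, Optional[list[int]]], list[float]]


def vote_zero(boot_coefs: Sequence[Sequence[float]], threshold: float = 0.5) -> list[bool]:
    """Stage 1 vote: True for coefficient j if more than `threshold` of the
    resampled fits set it exactly to zero (majority rule is an assumption)."""
    n_boot = len(boot_coefs)
    return [
        sum(1 for coefs in boot_coefs if coefs[j] == 0.0) / n_boot > threshold
        for j in range(len(boot_coefs[0]))
    ]


def percentile_interval(values: Sequence[float], level: float = 0.95) -> tuple[float, float]:
    """Simple percentile interval from resampled estimates (one possible choice)."""
    s = sorted(values)
    alpha = (1.0 - level) / 2.0
    lo = s[math.floor(alpha * (len(s) - 1))]
    hi = s[math.ceil((1.0 - alpha) * (len(s) - 1))]
    return lo, hi


def two_stage(fit: FitFn, data, resamples: Sequence, threshold: float = 0.5,
              level: float = 0.95):
    """Return (point estimates, intervals) following the two-stage outline."""
    # Stage 1: fit every resampled data set with all candidate variables and vote.
    # (The stage-1 fit to the observed data does not enter the vote, so it is omitted.)
    boot = [fit(d, None) for d in resamples]
    zero = vote_zero(boot, threshold)
    p = len(zero)
    active = [j for j in range(p) if not zero[j]]

    # Coefficients voted zero keep point estimate 0 and degenerate interval (0, 0).
    point = [0.0] * p
    intervals = [(0.0, 0.0)] * p
    if not active:
        return point, intervals

    # Stage 2: refit in the observed and resampled data using only the survivors.
    refit_obs = fit(data, active)
    refit_boot = [fit(d, active) for d in resamples]
    for j in active:
        point[j] = refit_obs[j]
        intervals[j] = percentile_interval([c[j] for c in refit_boot], level)
    return point, intervals
```

A coefficient that is exactly zero in most resamples is reported as 0 with interval (0, 0). The remaining coefficients are refit and summarized from their stage-2 resampled estimates. Passing `fit` as a callable keeps the voting logic independent of whichever penalized Cox solver is used.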

Keywords: Bootstrap; Ensemble methods; Oracle property; Proportional hazards model; Regularized estimation; Resampling; Risk prediction; Simultaneous confidence intervals; Survival functions.


Figures

Fig. 1.
Comparison of the SEs, bias, and 95% CI coverage of [formula] for true model parameters [formula]. Shown are values when [formula] (with absolute bias displayed), as well as [formula] and [formula]. Bias and empirical SEs are compared for the base aENET fit ([formula]) and the aENET fit after the voting procedure ([formula]); the variability for the base aENET fit may be estimated using either the bootstrap or the asymptotic method, while the variability for the voting procedure is estimated using the resampled coefficient estimators after voting.
Fig. 2.
Under the model with [formula], CI coverage and CI width for [formula]-year survival at three covariate levels: [formula], with all covariates 0; [formula]; and [formula].
Fig. 3.
Under the model with [formula], simultaneous CI coverage for [formula], with all covariates 0; [formula]; and [formula]. Also shown are simultaneous confidence widths at representative times.
Fig. 4.
In the breast cancer study, estimates of the (unpenalized) coefficient for ER status and the (penalized) coefficients for the variables in the p53 signaling pathway, each with 95% CIs, obtained from (i) the aENET estimate with bootstrap CIs and (ii) the voting-based method for both point and interval estimation.
Fig. 5.
Pointwise (left-hand column) and simultaneous (right-hand column) CIs for two individuals in the data set (top row: ID S246; bottom row: ID S034). The thin dotted line is the predicted survival from the aENET estimate [formula]; the thin dashed line is the predicted survival from the voting-based estimate [formula]. Thick dotted lines are the bootstrap-based confidence limits around the aENET predicted survival, and thick dashed lines are the voting-based confidence limits around the voting-based predicted survival.
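Predicted survival under a Cox model is typically computed as S(t | z) = exp{-Λ0(t) exp(β'z)}, where Λ0 is the Breslow estimate of the baseline cumulative hazard. Fig. 5 contrasts pointwise and simultaneous CIs around such curves. The sketch below computes the prediction. Given resampled survival curves on a common time grid, it then forms one generic sup-type simultaneous band S(t) ± c·sd(t). This is an assumption for illustration, not necessarily the construction used in the paper, and all function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def linear_predictor(beta: Sequence[float], z: Sequence[float]) -> float:
    """beta'z for one covariate vector."""
    return sum(b * v for b, v in zip(beta, z))


def breslow_cumhaz(times, events, risk_scores, grid):
    """Breslow estimate of the baseline cumulative hazard Lambda_0(t) on `grid`.

    Each observed event at time t_i adds 1 / sum_{j: t_j >= t_i} exp(beta'Z_j);
    with tied event times this is the usual Breslow convention.
    """
    jumps = []
    for t_i, d_i in zip(times, events):
        if d_i == 1:
            at_risk = sum(r for t_j, r in zip(times, risk_scores) if t_j >= t_i)
            jumps.append((t_i, 1.0 / at_risk))
    return [sum(dh for t_i, dh in jumps if t_i <= t) for t in grid]


def predicted_survival(beta, z, times, events, covariates, grid):
    """S(t | z) = exp(-Lambda_0(t) * exp(beta'z)) at each time in `grid`."""
    scores = [math.exp(linear_predictor(beta, x)) for x in covariates]
    cumhaz = breslow_cumhaz(times, events, scores, grid)
    rel_risk = math.exp(linear_predictor(beta, z))
    return [math.exp(-h * rel_risk) for h in cumhaz]


def simultaneous_band(point, boot_curves, level=0.95):
    """Sup-type simultaneous band point(t) +/- c * sd(t) over the grid.

    sd(t) is the resampling SD at each grid time; c is the empirical `level`
    quantile of max_t |S_b(t) - point(t)| / sd(t) across resampled curves S_b.
    Requires at least two resampled curves. Limits are clipped to [0, 1].
    """
    n_t = len(point)
    sds = [statistics.stdev([curve[k] for curve in boot_curves]) for k in range(n_t)]
    sups = sorted(
        max((abs(curve[k] - point[k]) / sds[k] for k in range(n_t) if sds[k] > 0),
            default=0.0)
        for curve in boot_curves
    )
    crit = sups[math.ceil(level * len(sups)) - 1]
    lower = [max(0.0, s - crit * sd) for s, sd in zip(point, sds)]
    upper = [min(1.0, s + crit * sd) for s, sd in zip(point, sds)]
    return lower, upper
```

A pointwise interval would use a separate quantile at each time. The simultaneous band instead calibrates a single critical value c against the largest standardized deviation over the whole grid, so it is wider but covers the entire curve jointly.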

