BMC Bioinformatics. 2008 Jun 6;9:269.
doi: 10.1186/1471-2105-9-269.

Flexible boosting of accelerated failure time models

Matthias Schmid et al. BMC Bioinformatics. 2008.

Abstract

Background: When boosting algorithms are used for building survival models from high-dimensional data, it is common to fit a Cox proportional hazards model or to use least squares techniques for fitting semiparametric accelerated failure time models. There are cases, however, where fitting a fully parametric accelerated failure time model is a good alternative to these methods, especially when the proportional hazards assumption is not justified. Boosting algorithms for the estimation of parametric accelerated failure time models have not been developed so far, since these models require the estimation of a model-specific scale parameter which traditional boosting algorithms are not able to deal with.

Results: We introduce a new boosting algorithm for censored time-to-event data which is suitable for fitting parametric accelerated failure time models. Estimation of the predictor function is carried out simultaneously with the estimation of the scale parameter, so that the negative log likelihood of the survival distribution can be used as a loss function for the boosting algorithm. The estimation of the scale parameter does not affect the favorable properties of boosting with respect to variable selection.

Conclusion: The analysis of a high-dimensional set of microarray data demonstrates that the new algorithm is able to outperform boosting with the Cox partial likelihood when the proportional hazards assumption is questionable. In low-dimensional settings, i.e., when classical likelihood estimation of a parametric accelerated failure time model is possible, simulations show that the new boosting algorithm closely approximates the estimates obtained from the maximum likelihood method.
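To illustrate the idea in the Results paragraph, the following is a minimal sketch of componentwise gradient boosting for a Weibull accelerated failure time model, log T = f(x) + σW, in which every boosting step on the predictor is followed by a one-dimensional update of the scale parameter σ. It is not the authors' implementation. The function names (`weibull_nll`, `golden_min`, `boost_weibull_aft`, `simulate_weibull`), the linear base learners, the step length `nu`, the search interval for log σ, and the overflow guard are all assumptions made for this example.

```python
import math
import random


def weibull_nll(resid, delta, log_sigma):
    """Negative Weibull AFT log likelihood, up to a constant, for resid = log(t) - f."""
    sigma = math.exp(log_sigma)
    total = 0.0
    for r, d in zip(resid, delta):
        w = r / sigma
        # Events contribute log(sigma) - w + exp(w); censored cases only exp(w).
        total += d * (log_sigma - w) + math.exp(min(w, 30.0))  # 30.0 guards overflow
    return total


def golden_min(fn, lo, hi, iters=25):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = fn(c), fn(d)
    for _ in range(iters):
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = fn(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = fn(d)
    return 0.5 * (lo + hi)


def boost_weibull_aft(X, t, delta, mstop=300, nu=0.1):
    """Return (coefficients with intercept first, sigma) after mstop boosting steps."""
    n, p = len(X), len(X[0])
    cols = [[1.0] * n] + [[row[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in c) for c in cols]
    logt = [math.log(v) for v in t]
    beta = [sum(logt) / n] + [0.0] * p
    f = [beta[0]] * n
    resid = [lt - fi for lt, fi in zip(logt, f)]
    log_sigma = golden_min(lambda s: weibull_nll(resid, delta, s), -3.0, 3.0)
    for _ in range(mstop):
        sigma = math.exp(log_sigma)
        # Negative gradient of the loss with respect to the predictor f.
        u = [(math.exp(min(r / sigma, 30.0)) - d) / sigma for r, d in zip(resid, delta)]
        # Fit each column to u by least squares and keep the best-fitting one.
        best_j, best_b, best_gain = 0, 0.0, -1.0
        for j, c in enumerate(cols):
            xu = sum(a * b for a, b in zip(c, u))
            gain = xu * xu / ss[j]
            if gain > best_gain:
                best_j, best_b, best_gain = j, xu / ss[j], gain
        beta[best_j] += nu * best_b
        f = [fi + nu * best_b * x for fi, x in zip(f, cols[best_j])]
        resid = [lt - fi for lt, fi in zip(logt, f)]
        # Scale update: minimize the same loss over log(sigma) with f held fixed.
        log_sigma = golden_min(lambda s: weibull_nll(resid, delta, s), -3.0, 3.0)
    return beta, math.exp(log_sigma)


def simulate_weibull(n, beta, intercept, sigma, seed=0):
    """Hypothetical test data: Weibull AFT event times with independent censoring."""
    rng = random.Random(seed)
    X, t, delta = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        w = math.log(rng.expovariate(1.0))  # log of Exp(1) has survival exp(-exp(w))
        event = math.exp(intercept + sum(b * v for b, v in zip(beta, x)) + sigma * w)
        cens = rng.expovariate(0.1)  # censoring time with mean 10
        X.append(x)
        t.append(min(event, cens))
        delta.append(1 if event <= cens else 0)
    return X, t, delta


X, t, delta = simulate_weibull(200, [1.0, -1.0, 0.0, 0.0, 0.0], 1.0, 0.5, seed=1)
beta_hat, sigma_hat = boost_weibull_aft(X, t, delta)
print([round(b, 2) for b in beta_hat], round(sigma_hat, 2))
```

Because only one coefficient changes per iteration, covariates that never give the best fit keep a coefficient of exactly zero, which is how boosting performs variable selection when the number of iterations `mstop` is kept small.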

Figures

Figure 1
Cox-Snell residuals obtained from fitting various survival models to the Barrier data. The upper left panel shows the Cox-Snell residuals of a semiparametric Cox model vs. the Nelson-Aalen estimate of their cumulative hazard function. Estimates were obtained from fitting a Cox proportional hazards model to the Barrier data via maximization of the partial log likelihood. The 14 most differentially expressed genes between the disease and the disease-free group were used as predictor variables. The other panels show the Cox-Snell residuals (together with their cumulative hazard function) obtained from fitting various parametric AFT models to the same data via maximum likelihood estimation. Obviously, the lines corresponding to the Cox-Snell residuals of the log-logistic and lognormal models are closest to the line through the origin, indicating that these models fit the data best. By contrast, the Cox model and the Weibull model (which both assume proportional hazards) do not seem to fit the data well, indicating that the proportional hazards assumption is violated.
Figure 2
Estimated log cumulative hazard functions obtained from fitting a stratified Cox model to the Barrier data. Estimates were obtained via maximization of the stratified partial log likelihood. The strata were generated by splitting the expression values of the most overexpressed gene in the disease group (202500_at) at their median. The remaining 13 of the 14 most differentially expressed genes were used as predictor variables in the stratified Cox model.
Figure 3
Boxplots of the Weibull parameter estimates when 5 informative covariates are present. Boxplots of the estimates of β = (0.5, 0.25, -0.25, -0.5, 0.5), as obtained from the 50 Weibull-distributed samples following Model (10). Grey boxplots correspond to boosting estimates, white boxplots correspond to maximum likelihood estimates. Similar results were obtained for the log-logistic and lognormal models.
Figure 4
Boxplots of the Weibull parameter estimates when 5 informative and 15 additional non-informative covariates are present. Boxplots of the estimates of β1,...,β20, as obtained from the 50 Weibull-distributed samples following Model (10). Grey boxplots correspond to boosting estimates, white boxplots correspond to maximum likelihood estimates. Similar results were obtained for the log-logistic and lognormal models.
Figure 5
Boxplots of the predictive Weibull log likelihood estimates. Boxplots of the predictive Weibull log likelihood estimates, as obtained from the 50 Weibull-distributed test samples following Model (10). The predictive log likelihood values of the null model were obtained via maximum likelihood estimation with no covariates and an intercept only. Similar results were obtained for the log-logistic and lognormal models.
Figure 6
Analysis of the Barrier stage II colon cancer data – prediction error curves for various parametric AFT models. Prediction error curves obtained from boosting with the negative log-logistic log likelihood, boosting with the negative Weibull log likelihood, and boosting with the negative lognormal log likelihood.
Figure 7
Analysis of the Barrier stage II colon cancer data – prediction error curves for parametric and semiparametric AFT models. Prediction error curves obtained from boosting with the negative log-logistic log likelihood, L2Boosting for semiparametric AFT models, and L1 penalized estimation for semiparametric AFT models (Lasso).
Figure 8
Analysis of the Barrier stage II colon cancer data – prediction error curves for various survival models. Prediction error curves obtained from boosting with the negative log-logistic log likelihood, boosting with the negative Cox partial log likelihood, L1 penalized estimation of a Cox proportional hazards model (CoxPath), and nonparametric estimation via the Kaplan-Meier estimator.
Figure 9
Analysis of the Barrier stage II colon cancer data – Cox-Snell residuals for various boosting methods. The upper left panel shows the Cox-Snell residuals of a semiparametric Cox model vs. the Nelson-Aalen estimate of their cumulative hazard function. Estimates were obtained from boosting with the negative Cox partial log likelihood. The other panels show the Cox-Snell residuals (together with their cumulative hazard function) obtained from fitting various parametric AFT models to the same data via boosting with the corresponding negative log likelihood loss. Similar to Fig. 1, we see that the line corresponding to the Cox-Snell residuals of the log-logistic model is close to the line through the origin. The Cox model does not seem to fit the data well, indicating that the proportional hazards assumption is violated.
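The Cox-Snell diagnostic described in the captions of Figures 1 and 9 can be sketched as follows for a log-logistic AFT model, assuming the survival function S(t | x) = 1 / (1 + exp(w)) with w = (log t - f(x)) / σ. The residuals are r = -log S(t | x), and their Nelson-Aalen cumulative hazard estimate is compared with the line through the origin with slope one. The function names and the toy inputs are hypothetical, and tied residuals are assumed not to occur.

```python
import math


def cox_snell_loglogistic(t, f, sigma):
    """Cox-Snell residuals r = -log S(t | x) for a log-logistic AFT model."""
    return [math.log1p(math.exp((math.log(ti) - fi) / sigma)) for ti, fi in zip(t, f)]


def nelson_aalen(resid, delta):
    """Nelson-Aalen cumulative hazard of the residuals, evaluated at each event.

    Assumes no tied residuals, which holds almost surely for continuous times.
    """
    order = sorted(range(len(resid)), key=lambda i: resid[i])
    at_risk, cum, points = len(resid), 0.0, []
    for i in order:
        if delta[i] == 1:  # censored residuals only shrink the risk set
            cum += 1.0 / at_risk
            points.append((resid[i], cum))
        at_risk -= 1
    return points


# Toy inputs: observed times, fitted predictors, event indicators (1 = event).
t = [1.0, 2.5, 0.7, 4.0]
f = [0.0, 0.5, 0.2, 1.0]
delta = [1, 0, 1, 1]
resid = cox_snell_loglogistic(t, f, sigma=0.8)
for r, h in nelson_aalen(resid, delta):
    print(round(r, 3), round(h, 3))
```

If the assumed model fits, the printed pairs should lie close to the line h = r; systematic departures from that line are what the captions read as evidence against a model.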

