J Clin Transl Sci. 2020 Sep 4;5(1):e40.
doi: 10.1017/cts.2020.533.

Comparison of regression imputation methods of baseline covariates that predict survival outcomes

Nicole Solomon et al. J Clin Transl Sci.

Abstract

Introduction: Missing data are inevitable in medical research, and handling them appropriately is critical for statistical estimation and inference. Imputation is often used to maximize the amount of data available for analysis and is generally preferred over complete case analysis, whose estimates are typically biased. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring.
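
For reference, the standard Cox proportional hazards model for right-censored data is written below in textbook form; the notation is ours and not necessarily the article's, and tied event times are ignored. Each subject contributes an observed time $T_i$ (the smaller of the event and censoring times), an event indicator $\delta_i$ (1 if the event was observed, 0 if censored), and a covariate vector $\mathbf{x}_i$. A subject with any missing entry in $\mathbf{x}_i$ cannot be evaluated in its own factor or in any risk-set sum, which is why complete case analysis drops it and why imputation tries to fill it in.

```latex
h(t \mid \mathbf{x}_i) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}_i\right)

L(\boldsymbol{\beta}) = \prod_{i=1}^{n}
  \left[ \frac{\exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}_i\right)}
              {\sum_{j \in R(T_i)} \exp\!\left(\boldsymbol{\beta}^\top \mathbf{x}_j\right)} \right]^{\delta_i},
\qquad R(t) = \{\, j : T_j \ge t \,\}
```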

Methods: We evaluated the performance of five regression methods for imputing missing covariates in the proportional hazards model, using summary statistics that included proportional bias and proportional mean squared error. The primary objective was to determine which method provides the "best" imputation model for missing baseline covariates when predicting a survival outcome: the parametric generalized linear model (GLM) or least absolute shrinkage and selection operator (LASSO), or the nonparametric multivariate adaptive regression splines (MARS), support vector machine (SVM), or random forest (RF).
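
To make "regression imputation" concrete, here is a minimal pure-Python sketch of single (deterministic) regression imputation using LASSO, one of the five methods compared. Everything in it is an assumption for illustration, not the authors' implementation: the cyclic coordinate-descent solver, the hypothetical helper names `lasso_fit` and `lasso_impute`, the fixed penalty `lam` (in practice it would typically be tuned, e.g. by cross-validation), and the choice to use only the other, fully observed baseline covariates as predictors.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by the LASSO update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Fit y ~ b0 + X b by cyclic coordinate descent, minimizing
    (1/(2n)) * sum_i (y_i - b0 - x_i . b)^2 + lam * sum_j |b_j|.
    The intercept b0 is not penalized (handled by centering)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals of centered y with b = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta


def lasso_impute(rows, target, lam=0.01):
    """Return a copy of `rows` with None entries in column `target` replaced
    by LASSO predictions fitted on the rows where `target` is observed.
    Assumes every other column is fully observed (hypothetical setup)."""
    predictors = [j for j in range(len(rows[0])) if j != target]
    train = [r for r in rows if r[target] is not None]
    X = [[r[j] for j in predictors] for r in train]
    y = [r[target] for r in train]
    b0, beta = lasso_fit(X, y, lam)
    completed = []
    for r in rows:
        r = list(r)  # copy so the caller's data is left untouched
        if r[target] is None:
            r[target] = b0 + sum(b * r[j] for b, j in zip(beta, predictors))
        completed.append(r)
    return completed


if __name__ == "__main__":
    rng = random.Random(1)
    data = []
    for _ in range(200):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        data.append([x1, x2, 1.0 + 2.0 * x1 - x2 + rng.gauss(0, 0.1)])
    truth = [row[2] for row in data]
    # Mask the third covariate in every fifth row, then impute it.
    for i in range(0, 200, 5):
        data[i][2] = None
    filled = lasso_impute(data, target=2)
    errs = [abs(filled[i][2] - truth[i]) for i in range(0, 200, 5)]
    print("mean absolute imputation error:", sum(errs) / len(errs))
```

Swapping `lasso_fit` for another regressor (a GLM, MARS, SVM, or random forest) while keeping the fit-on-observed, predict-on-missing structure is the kind of comparison the Methods describe. Deterministic imputation like this does not propagate imputation uncertainty into the final Cox model.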

Results: On average, LASSO yielded the smallest bias, mean squared error, mean squared prediction error, and median absolute deviation (MAD) of the final analysis model's parameter estimates among the five methods considered. SVM performed second best, while GLM and MARS showed the lowest relative performance.
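
The abstract does not define its summary statistics, so the sketch below uses common definitions as assumptions, computed over the estimates of one Cox coefficient across simulation replicates: proportional bias as (mean estimate - true value) / true value, proportional MSE as the mean squared error divided by the squared true value, and MAD as the median of absolute deviations from the median estimate. The function names are hypothetical, and the article may define these quantities differently (for example, MAD around the true value).

```python
from statistics import mean, median


def proportional_bias(estimates, true_value):
    """Bias relative to the true parameter (assumed definition; true_value != 0)."""
    return (mean(estimates) - true_value) / true_value


def proportional_mse(estimates, true_value):
    """Mean squared error scaled by the squared true parameter (assumed definition)."""
    return mean([(b - true_value) ** 2 for b in estimates]) / true_value ** 2


def median_absolute_deviation(estimates):
    """Median of absolute deviations from the median of the estimates."""
    m = median(estimates)
    return median([abs(b - m) for b in estimates])


if __name__ == "__main__":
    # Hypothetical estimates of one coefficient over four replicates, true value 1.0.
    est = [0.9, 1.1, 1.0, 1.2]
    print(proportional_bias(est, 1.0),
          proportional_mse(est, 1.0),
          median_absolute_deviation(est))
```

Scaling by the true value lets coefficients of different magnitudes be compared on one plot, but it is undefined when a true coefficient is zero.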

Conclusion: LASSO and SVM outperform GLM, MARS, and RF in the context of regression imputation for prediction of a time-to-event outcome.

Keywords: Missing data; proportional hazards model; regression imputation.


Conflict of interest statement

The authors have no conflicts of interest to declare.

Figures

Fig. 1. Performance of regression imputation methods for each summary statistic in simulations where C = 30% and M = 15%.
Fig. 2. Performance of regression imputation methods for each summary statistic in simulations where C = 30% and M = 10%.

