Cox regression analysis with missing covariates via nonparametric multiple imputation
- PMID: 29717943
- PMCID: PMC6291381
- DOI: 10.1177/0962280218772592
Abstract
We consider estimating a Cox regression model in which some covariates are subject to missingness and additional information (the observed event time, the censoring indicator, and fully observed covariates) may be predictive of the missing covariates. We propose using two working regression models: one for predicting the missing covariates and the other for predicting the missingness probabilities. For each observation with a missing covariate, these two working models are used to define a nearest-neighbor imputing set, which is then used to nonparametrically impute covariate values for that observation. Once imputation is complete, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of the Cox regression, and with the predictive mean matching (PMM) imputation method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to misspecification of either one of the two working models and to misspecification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in the imputation model, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.
Keywords: Augmented inverse probability weighted method; Cox regression; missing covariates; multiple imputation; predictive mean matching.
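The imputation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance measure, the weights, or the size of the imputing set, so everything below is an assumption made for illustration: the two predictive scores (`score_cov` from the covariate model, `score_miss` from the missingness model) are taken as already computed, they are standardized and combined in a weighted Euclidean distance, and each missing value is replaced by a random draw from the `n_neighbors` closest observed cases. The function names `nn_multiple_impute` and `rubin_combine` are hypothetical; the pooling step uses Rubin's rules, the standard way to combine estimates across multiply imputed datasets.

```python
import numpy as np

def nn_multiple_impute(x, score_cov, score_miss, n_neighbors=5,
                       n_imputations=10, w_cov=0.8, w_miss=0.2, seed=0):
    """Return n_imputations completed copies of x (NaN marks a missing value).

    Assumed, not taken from the paper: weights, distance form, donor draw.
    """
    x = np.asarray(x, dtype=float)
    s1 = np.asarray(score_cov, dtype=float)
    s2 = np.asarray(score_miss, dtype=float)
    rng = np.random.default_rng(seed)
    # Standardize both predictive scores so the weights are comparable.
    s1 = (s1 - s1.mean()) / s1.std()
    s2 = (s2 - s2.mean()) / s2.std()
    missing = np.isnan(x)
    donors = np.flatnonzero(~missing)  # indices with an observed covariate
    completed = []
    for _ in range(n_imputations):
        xi = x.copy()
        for j in np.flatnonzero(missing):
            # Weighted distance between case j and every donor on both scores.
            d = np.sqrt(w_cov * (s1[donors] - s1[j]) ** 2
                        + w_miss * (s2[donors] - s2[j]) ** 2)
            imputing_set = donors[np.argsort(d)[:n_neighbors]]
            # Nonparametric draw: copy the value of a randomly chosen neighbor.
            xi[j] = x[rng.choice(imputing_set)]
        completed.append(xi)
    return completed

def rubin_combine(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()
    within = variances.mean()
    between = estimates.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

In use, a Cox model would be fitted to each completed dataset with any survival package, and the resulting coefficient estimates and their variances passed to `rubin_combine`. Using two scores rather than one reflects the robustness claim in the abstract: if one working model is misspecified, the other score can still bring suitable donors into the imputing set.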
Conflict of interest statement
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Similar articles
- Analysis of accelerated failure time data with dependent censoring using auxiliary variables via nonparametric multiple imputation. Stat Med. 2015 Aug 30;34(19):2768-80. doi: 10.1002/sim.6534. Epub 2015 May 21. PMID: 25999295. Free PMC article.
- A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data. J Biopharm Stat. 2014;24(3):634-48. doi: 10.1080/10543406.2014.888444. PMID: 24697618. Free PMC article.
- A comparison of multiple imputation and fully augmented weighted estimators for Cox regression with missing covariates. Stat Med. 2010 Nov 10;29(25):2592-604. doi: 10.1002/sim.4016. PMID: 20806403. Free PMC article.
- How are missing data in covariates handled in observational time-to-event studies in oncology? A systematic review. BMC Med Res Methodol. 2020 May 29;20(1):134. doi: 10.1186/s12874-020-01018-7. PMID: 32471366. Free PMC article.
- A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995 Dec 15;142(12):1255-64. doi: 10.1093/oxfordjournals.aje.a117592. PMID: 7503045. Review.
Cited by
- Regression analysis of interval-censored failure time data under semiparametric transformation models with missing covariates. Int J Biostat. 2025 Aug 29. doi: 10.1515/ijb-2024-0016. Online ahead of print. PMID: 40879284.
- Regularized Buckley-James method for right-censored outcomes with block-missing multimodal covariates. Stat (Int Stat Inst). 2022 Dec;11(1):e515. doi: 10.1002/sta4.515. Epub 2022 Oct 13. PMID: 37854542. Free PMC article.
- Association between Serum Triglycerides and Prostate Specific Antigen (PSA) among U.S. Males: National Health and Nutrition Examination Survey (NHANES), 2003-2010. Nutrients. 2022 Mar 22;14(7):1325. doi: 10.3390/nu14071325. PMID: 35405939. Free PMC article.
- A Hybrid Approach for the Stratified Mark-Specific Proportional Hazards Model with Missing Covariates and Missing Marks, with Application to Vaccine Efficacy Trials. J R Stat Soc Ser C Appl Stat. 2020 Aug;69(4):791-814. doi: 10.1111/rssc.12417. Epub 2020 May 22. PMID: 33191955. Free PMC article.
- Assessing volatile organic compounds exposure and prostate-specific antigen: National Health and Nutrition Examination Survey, 2001-2010. Front Public Health. 2022 Jul 29;10:957069. doi: 10.3389/fpubh.2022.957069. eCollection 2022. PMID: 35968491. Free PMC article.