How are missing data in covariates handled in observational time-to-event studies in oncology? A systematic review
- PMID: 32471366
- PMCID: PMC7260743
- DOI: 10.1186/s12874-020-01018-7
Abstract
Background: Missing data in covariates can result in biased estimates and loss of power to detect associations. It can also lead to other challenges in time-to-event analyses including the handling of time-varying effects of covariates, selection of covariates and their flexible modelling. This review aims to describe how researchers approach time-to-event analyses with missing data.
Methods: Medline and Embase were searched for observational time-to-event studies in oncology published from January 2012 to January 2018. The review focused on proportional hazards models or extended Cox models. We investigated the extent and reporting of missing data and how it was addressed in the analysis. Covariate modelling and selection, and assessment of the proportional hazards assumption were also investigated, alongside the treatment of missing data in these procedures.
Results: 148 studies were included. The mean proportion of individuals with missingness in any covariate was 32%. Complete-case analysis was used in 53% of studies and multiple imputation in 22%. In total, 14% of studies stated an assumption concerning missing data, and only 34% stated missingness as a limitation. The proportional hazards assumption was checked in 28% of studies, of which 17% did not state the assessment method. Of 144 multivariable models, 58% stated their covariate selection procedure; use of a pre-selected set of covariates was the most popular, followed by stepwise methods and univariable analyses. Of 69 studies that included continuous covariates, 81% did not assess the appropriateness of the functional form.
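Complete-case analysis (listwise deletion) discards every individual with a missing value in any covariate of the model, so the proportion of individuals with missingness in any covariate is exactly the share of the sample such an analysis loses. The Python sketch below illustrates this on a hypothetical toy dataset (the `rows` records and the helper names are illustrative, not taken from the review). As a further simplifying assumption, it also shows how independent per-covariate missingness compounds as covariates are added.

```python
# Hypothetical toy data: one dict per individual; None marks a missing covariate value.
rows = [
    {"age": 64, "stage": "II", "grade": None},
    {"age": None, "stage": "III", "grade": 2},
    {"age": 71, "stage": "I", "grade": 1},
    {"age": 58, "stage": None, "grade": 3},
    {"age": 69, "stage": "II", "grade": 2},
]


def complete_cases(rows, covariates):
    """Listwise deletion: keep only rows with no missing value in any listed covariate."""
    return [r for r in rows if all(r[c] is not None for c in covariates)]


def proportion_any_missing(rows, covariates):
    """Fraction of individuals with missingness in at least one covariate."""
    n_incomplete = sum(1 for r in rows if any(r[c] is None for c in covariates))
    return n_incomplete / len(rows)


def expected_complete_fraction(p_missing, n_covariates):
    """Expected share of complete cases, ASSUMING each covariate is missing
    independently with the same probability p_missing (a simplification)."""
    return (1 - p_missing) ** n_covariates


covs = ["age", "stage", "grade"]
cc = complete_cases(rows, covs)
print(len(cc), proportion_any_missing(rows, covs))  # 2 complete cases; 60% incomplete
print(expected_complete_fraction(0.1, 4))
```

Under the independence assumption, 10% missingness in each of four covariates already leaves only about 66% complete cases, a loss comparable to the 32% mean reported in the Results.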
Conclusion: While guidelines for handling missing data in epidemiological studies are in place, this review indicates that few studies report implementing their recommendations in practice. Although missing data are present in many studies, we found that few state clearly how they handled them or what assumptions they made. Easy-to-implement but potentially biased approaches such as complete-case analysis are the most commonly used, despite relying on strong assumptions, often in settings where more appropriate methods should be employed. Authors should be encouraged to follow existing guidelines to address missing data, and higher expectations from journals and editors could help improve practice.
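Multiple imputation, used by 22% of the reviewed studies, is a commonly recommended alternative when data are plausibly missing at random. After the Cox model is fitted separately in each of M imputed datasets, the per-dataset estimates are combined with Rubin's rules. The Python sketch below is a minimal illustration, assuming the log hazard ratios and their variances have already been obtained. The numbers and the `pool_rubin` helper are hypothetical, the imputation step itself is not shown, and the interval uses a normal approximation rather than Rubin's t reference distribution with adjusted degrees of freedom.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M per-imputation estimates (e.g. Cox log hazard ratios) with Rubin's rules."""
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t


# Hypothetical log hazard ratios and squared standard errors from M = 5 imputed datasets.
log_hr = [0.42, 0.38, 0.45, 0.40, 0.44]
se2 = [0.010, 0.011, 0.009, 0.010, 0.012]

q_bar, t = pool_rubin(log_hr, se2)
se = math.sqrt(t)
# Normal-approximation 95% CI on the log scale, back-transformed to the HR scale.
lo, hi = math.exp(q_bar - 1.96 * se), math.exp(q_bar + 1.96 * se)
print(f"HR = {math.exp(q_bar):.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The (1 + 1/M) factor inflates the between-imputation variance to account for using a finite number of imputations; a complete-case analysis has no counterpart to this term because it ignores the uncertainty about the missing values altogether.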
Keywords: Epidemiology; Missing data; Multiple imputation; Observational studies; Oncology; Survival; Time-to-event.
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study. BMC Med Res Methodol. 2010 Dec 31;10:112. doi: 10.1186/1471-2288-10-112. PMID: 21194416. Free PMC article.
- Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study. BMC Med Res Methodol. 2010 Jan 19;10:7. doi: 10.1186/1471-2288-10-7. PMID: 20085642. Free PMC article.
- Multiple imputation in Cox regression when there are time-varying effects of covariates. Stat Med. 2018 Nov 10;37(25):3661-3678. doi: 10.1002/sim.7842. Epub 2018 Jul 16. PMID: 30014575. Free PMC article.
- Gaps in the usage and reporting of multiple imputation for incomplete data: findings from a scoping review of observational studies addressing causal questions. BMC Med Res Methodol. 2024 Sep 4;24(1):193. doi: 10.1186/s12874-024-02302-6. PMID: 39232661. Free PMC article.
- A review of the reporting and handling of missing data in cohort studies with repeated assessment of exposure measures. BMC Med Res Methodol. 2012 Jul 11;12:96. doi: 10.1186/1471-2288-12-96. PMID: 22784200. Free PMC article. Review.
Cited by
- Intake of polyphenols from cereal foods and colorectal cancer risk in the Melbourne Collaborative Cohort Study. Cancer Med. 2023 Sep;12(18):19188-19202. doi: 10.1002/cam4.6514. Epub 2023 Sep 13. PMID: 37702114. Free PMC article.
- Assessing proximate intermediates between ambient temperature, hospital admissions, and mortality in hemodialysis patients. Environ Res. 2022 Mar;204(Pt B):112127. doi: 10.1016/j.envres.2021.112127. Epub 2021 Sep 25. PMID: 34582801. Free PMC article.
- The reporting and handling of missing data in longitudinal studies of older adults is suboptimal: a methodological survey of geriatric journals. BMC Med Res Methodol. 2022 Apr 26;22(1):122. doi: 10.1186/s12874-022-01605-w. PMID: 35473665. Free PMC article. Review.
- Precise Estimation for the Age of Initiation of Tobacco Use Among U.S. Youth: Finding from the Population Assessment of Tobacco and Health (PATH) Study, 2013-2017. Biostat Biom Open Access J. 2022 Oct;11(1):555801. doi: 10.19080/bboaj.2022.11.555801. Epub 2022 Oct 21. PMID: 36777448. Free PMC article.
- A systematic approach towards missing lab data in electronic health records: A case study in non-small cell lung cancer and multiple myeloma. CPT Pharmacometrics Syst Pharmacol. 2023 Sep;12(9):1201-1212. doi: 10.1002/psp4.12998. Epub 2023 Jun 15. PMID: 37322818. Free PMC article.