On the use of multiple imputation to address data missing by design as well as unintended missing data in case-cohort studies with a binary endpoint
- PMID: 38062377
- PMCID: PMC10702035
- DOI: 10.1186/s12874-023-02090-5
Abstract
Background: Case-cohort studies are conducted within cohort studies, with the defining feature that collection of exposure data is limited to a subset of the cohort, leading to a large proportion of missing data by design. Standard analysis uses inverse probability weighting (IPW) to address this intended missing data, but little research has been conducted into how best to perform analysis when there is also unintended missingness. Multiple imputation (MI) has become a default standard for handling unintended missingness and is typically used in combination with IPW to handle the intended missingness due to the case-cohort sampling. Alternatively, MI could be used to handle both the intended and unintended missingness. While the performance of an MI-only approach has been investigated in the context of a case-cohort study with a time-to-event outcome, it is unclear how this approach performs with a binary outcome.
Methods: We conducted a simulation study to assess and compare the performance of approaches using only MI, only IPW, and a combination of MI and IPW, for handling intended and unintended missingness in the case-cohort setting. We also applied the approaches to a case study.
Results: Our results show that the combined approach was approximately unbiased for estimation of the exposure effect when the sample size was large, and was the least biased with small sample sizes, while MI-only and IPW-only exhibited larger biases in both sample size settings.
Conclusions: These findings suggest that a combined MI/IPW approach should be preferred to handle intended and unintended missing data in case-cohort studies with binary outcomes.
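To illustrate the IPW idea referred to in the Background, the following is a minimal sketch, not the authors' simulation code. It assumes a simple case-cohort design in which the exposure is measured for a random subcohort and for all cases, so that cases receive weight 1 and subcohort non-cases receive the inverse of the subcohort sampling fraction. The data-generating values, the sampling fraction, and the helper `fit_weighted_logistic` are all hypothetical choices made for illustration, and no unintended missingness is simulated.

```python
import numpy as np


def fit_weighted_logistic(x, y, w, n_iter=25):
    """Weighted logistic regression via Newton-Raphson (illustrative helper).

    Returns the coefficient vector [intercept, slope].
    """
    design = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (w * (y - p))
        hessian = (design * (w * p * (1.0 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


rng = np.random.default_rng(2023)

# Hypothetical full cohort: one continuous exposure, binary outcome.
n = 20000
true_beta = np.array([-2.5, 0.7])  # assumed intercept and log odds ratio
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x))))

# Case-cohort sampling: exposure observed for a random subcohort plus all cases.
subcohort_frac = 0.2  # assumed sampling fraction
in_subcohort = rng.random(n) < subcohort_frac
observed = in_subcohort | (y == 1)

# Inverse probability weights: cases are always sampled (weight 1);
# non-cases are sampled with probability subcohort_frac.
weights = np.where(y == 1, 1.0, 1.0 / subcohort_frac)

beta_ipw = fit_weighted_logistic(x[observed], y[observed], weights[observed])
print("IPW estimate [intercept, log OR]:", beta_ipw)
```

In this sketch only the point estimates are computed. A full analysis would also need variance estimates that account for the weighting (for example robust standard errors), and the combined approach discussed above would first multiply impute the unintended missing values before fitting the weighted model to each imputed data set.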
Keywords: Case-cohort study; Inverse probability weighting; Missing data; Multiple imputation; Simulation study.
© 2023. Crown.
Conflict of interest statement
The authors declare no competing interests.