A systematic approach towards missing lab data in electronic health records: A case study in non-small cell lung cancer and multiple myeloma
- PMID: 37322818
- PMCID: PMC10508534
- DOI: 10.1002/psp4.12998
Abstract
Real-world data derived from electronic health records often exhibit high levels of missingness in variables such as laboratory results, which presents a challenge for statistical analyses. We developed a systematic workflow for gathering evidence of different missingness mechanisms and performing subsequent statistical analyses. We quantify evidence for missing completely at random (MCAR) and missing at random (MAR) mechanisms using Hotelling's multivariate t-test and random forest classifiers, respectively. We further illustrate how to apply sensitivity analyses using the not-at-random fully conditional specification procedure to examine changes in parameter estimates under missing not at random (MNAR) mechanisms. In simulation studies, we validated these diagnostics and compared analytic bias under different mechanisms. To demonstrate the workflow, we applied it to two case studies, an advanced non-small cell lung cancer cohort and a multiple myeloma cohort, both derived from a real-world oncology database. Here, we found strong evidence against MCAR and some evidence of MAR, implying that imputation approaches that predict missing values by fitting a model to observed data may be suitable for use. Sensitivity analyses did not suggest meaningful departures from our analytic results under potential MNAR mechanisms; these results were also in line with results reported in clinical trials.
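The MCAR diagnostic described above can be illustrated with a minimal sketch. It is not the authors' implementation: it computes a two-sample Hotelling T² statistic comparing covariate means between patients with and without a missing lab value, and, as a simplifying assumption, obtains the p-value by permutation rather than from the F distribution. The covariates (`age`, `ecog`), the missingness probabilities, and all function names are hypothetical, and the data are simulated.

```python
import random

def mean_vec(rows):
    """Mean of each of the two covariates."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """Sums of squares and cross-products about the mean m (2x2, symmetric)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy

def hotelling_t2(a, b):
    """Two-sample Hotelling T^2 for two covariates, pooled covariance."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean_vec(a), mean_vec(b)
    dof = n1 + n2 - 2
    sxx, sxy, syy = ((u + v) / dof
                     for u, v in zip(scatter(a, m1), scatter(b, m2)))
    det = sxx * syy - sxy ** 2
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    # Quadratic form d' S^{-1} d using the explicit 2x2 inverse.
    quad = (syy * d0 * d0 - 2 * sxy * d0 * d1 + sxx * d1 * d1) / det
    return (n1 * n2 / (n1 + n2)) * quad

def permutation_test(covs, missing, n_perm=500, seed=0):
    """Return (T^2, permutation p-value) for missing vs. observed groups."""
    rng = random.Random(seed)

    def stat(flags):
        a = [c for c, f in zip(covs, flags) if f]
        b = [c for c, f in zip(covs, flags) if not f]
        return hotelling_t2(a, b)

    observed = stat(missing)
    flags = list(missing)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # break any link between covariates and missingness
        if stat(flags) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def simulate(n, mechanism, seed):
    """Simulated cohort; the missingness indicator refers to a lab value."""
    rng = random.Random(seed)
    covs, missing = [], []
    for _ in range(n):
        age = rng.gauss(65, 10)
        ecog = rng.gauss(1, 0.5)  # treated as continuous for illustration
        if mechanism == "MCAR":
            p = 0.3               # same probability for every patient
        else:                     # MAR: depends on the observed covariate age
            p = 0.6 if age > 65 else 0.1
        covs.append((age, ecog))
        missing.append(rng.random() < p)
    return covs, missing

t2_mcar, p_mcar = permutation_test(*simulate(400, "MCAR", seed=1))
t2_mar, p_mar = permutation_test(*simulate(400, "MAR", seed=2))
print(f"MCAR data: T2={t2_mcar:.2f}, p={p_mcar:.3f}")
print(f"MAR data:  T2={t2_mar:.2f}, p={p_mar:.3f}")
```

A small p-value counts as evidence against MCAR because covariate means differ between patients with and without the lab value. A large p-value does not establish MCAR, and the test cannot distinguish MAR from MNAR, which is why the workflow pairs it with further diagnostics and sensitivity analyses.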
© 2023 Flatiron Health and The Authors. CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals LLC on behalf of American Society for Clinical Pharmacology and Therapeutics.
Conflict of interest statement
A.S., P.Y., C.J., M.S., and S.C. report employment at Flatiron Health Inc., which is an independent subsidiary of the Roche Group, and stock ownership in Roche. J.W. reports employment at Hoffmann‐La Roche and stock ownership in Roche. M.T. reports employment at Genentech, a member of the Roche Group, and stock ownership in Roche.
Similar articles
- A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clin Epidemiol. 2024 May 21;16:329-343. doi: 10.2147/CLEP.S436131. eCollection 2024. PMID: 38798915 Free PMC article.
- Approaches for missing covariate data in logistic regression with MNAR sensitivity analyses. Biom J. 2020 Jul;62(4):1025-1037. doi: 10.1002/bimj.201900117. Epub 2020 Jan 20. PMID: 31957905
- Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study. BMC Med Res Methodol. 2010 Jan 19;10:7. doi: 10.1186/1471-2288-10-7. PMID: 20085642 Free PMC article.
- Data Missing Not at Random in Mobile Health Research: Assessment of the Problem and a Case for Sensitivity Analyses. J Med Internet Res. 2021 Jun 15;23(6):e26749. doi: 10.2196/26749. PMID: 34128810 Free PMC article.
- Identifying the types of missingness in quality of life data from clinical trials. Stat Med. 1998 Mar 15-Apr 15;17(5-7):739-56. doi: 10.1002/(sici)1097-0258(19980315/15)17:5/7<739::aid-sim818>3.0.co;2-m. PMID: 9549820 Review.
Cited by
- smdi: an R package to perform structural missing data investigations on partially observed confounders in real-world evidence studies. JAMIA Open. 2024 Jan 31;7(1):ooae008. doi: 10.1093/jamiaopen/ooae008. eCollection 2024 Apr. PMID: 38304248 Free PMC article.
- Exploring the Feasibility of a Bracketing Approach Utilizing Modeling for Development of Long-Acting Injectables for Regulatory Approval-A Case Study Using Levonorgestrel. Pharmaceuticals (Basel). 2024 Dec 6;17(12):1640. doi: 10.3390/ph17121640. PMID: 39770482 Free PMC article.
- A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clin Epidemiol. 2024 May 21;16:329-343. doi: 10.2147/CLEP.S436131. eCollection 2024. PMID: 38798915 Free PMC article.
- Comparative study of imputation strategies to improve the sarcopenia prediction task. Digit Health. 2025 Jan 17;11:20552076241301960. doi: 10.1177/20552076241301960. eCollection 2025 Jan-Dec. PMID: 39839962 Free PMC article.