Prediction Model Performance With Different Imputation Strategies: A Simulation Study Using a North American ICU Registry

Jonathan Steif¹, Rollin Brant¹,², Rama Syamala Sreepada²,³, Nicholas West², Srinivas Murthy²,⁴, Matthias Görges²,³

Affiliations

¹ Department of Statistics, University of British Columbia, Vancouver, BC, Canada.
² Research Institute, BC Children's Hospital, Vancouver, BC, Canada.
³ Department of Anesthesiology, Pharmacology & Therapeutics, University of British Columbia, Vancouver, BC, Canada.
⁴ Department of Pediatrics, Division of Critical Care, University of British Columbia, Vancouver, BC, Canada.

PMID: 34560774
PMCID: PMC8719509
DOI: 10.1097/PCC.0000000000002835

Pediatr Crit Care Med. 2022 Jan 1;23(1):e29-e44.

Abstract

Objectives: To evaluate the performance of pragmatic imputation approaches when estimating model coefficients using datasets with varying degrees of data missingness.

Design: Performance in predicting observed mortality in a registry dataset was evaluated using simulations of two simple logistic regression models with age-specific criteria for abnormal vital signs (mentation, systolic blood pressure, respiratory rate, WBC count, heart rate, and temperature). Starting with a dataset with complete information, increasing degrees of biased missingness of WBC and mentation were introduced, depending on the values of temperature and systolic blood pressure, respectively. Missing data approaches evaluated included analysis of complete cases only, assuming missing data are normal, and multiple imputation by chained equations. Percent bias and root mean square error, in relation to parameter estimates obtained from the original data, were evaluated as performance indicators.
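The design above can be illustrated with a minimal sketch, which is not the authors' code. It assumes each record is a dictionary of boolean "abnormal" flags, that biased missingness is introduced by weighted sampling (rows with an abnormal driver variable are favored by a hypothetical `weight` factor), and that percent bias is defined as the relative difference between the mean estimate and the reference coefficient. The abstract gives neither formula, so both definitions are assumptions. All function and field names are illustrative, and multiple imputation by chained equations itself is omitted.

```python
import math
import random


def percent_bias(estimates, reference):
    """Percent bias of the mean of repeated estimates vs. a reference coefficient (assumed definition)."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - reference) / reference


def rmse(estimates, reference):
    """Root mean square error of repeated estimates vs. a reference coefficient."""
    return math.sqrt(sum((e - reference) ** 2 for e in estimates) / len(estimates))


def introduce_biased_missingness(rows, target, driver, frac_missing, weight, rng):
    """Return copies of rows with `target` set to None in a fraction of them.

    Rows whose `driver` flag is abnormal (True) get sampling weight `weight`;
    all other rows get weight 1. Weighted sampling without replacement uses
    random keys u ** (1 / w), keeping the rows with the largest keys.
    """
    n_missing = round(frac_missing * len(rows))
    keys = []
    for i, row in enumerate(rows):
        w = weight if row[driver] else 1.0
        keys.append((rng.random() ** (1.0 / w), i))
    chosen = {i for _, i in sorted(keys, reverse=True)[:n_missing]}
    return [
        dict(row, **{target: None}) if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]


def complete_cases(rows, target):
    """Complete case analysis: discard rows where `target` is missing."""
    return [dict(r) for r in rows if r[target] is not None]


def missing_as_normal(rows, target):
    """Treat a missing `target` as normal, i.e. abnormal flag = False."""
    return [dict(r, **{target: False}) if r[target] is None else dict(r) for r in rows]
```

In a full simulation, a logistic regression would be refit on each processed dataset, and `percent_bias` and `rmse` would be computed per coefficient against the estimates from the original complete data.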

Setting: Data were obtained from the Virtual Pediatric Systems, LLC, database (Los Angeles, CA), which provides clinical markers and outcomes in prospectively collected records from 117 PICUs in the United States and Canada.

Patients: Children admitted to a participating PICU in 2017, for whom all required data were available.

Interventions: None.

Measurements and main results: Simulations demonstrated that multiple imputation by chained equations is an effective strategy and that even a naive implementation of it significantly outperforms traditional approaches: the root mean square error for model coefficients was lower using multiple imputation by chained equations in 90 of 99 simulations (91%) compared with discarding cases with missing data, and in 97 of 99 simulations (98%) compared with models assuming missing values are in the normal range. Assuming missing data to be abnormal was inferior to all other approaches.

Conclusions: Analyses of large observational studies are likely to encounter missing data, which are likely not missing at random. Researchers should always consider multiple imputation by chained equations (or similar imputation approaches), even when encountering only small proportions of missing data in their work.

Conflict of interest statement

Dr. Steif received funding from the University of British Columbia (UBC). Dr. Brant received support for article research from the British Columbia Children’s Hospital Research Institute. Dr. Sreepada’s institution received funding from Mitacs Postdoctoral Fellowship. Dr. Murthy holds a Research Chair from Health Research Foundation and Innovative Medicines Canada. Dr. Görges’ institution received funding from the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research, the Juvenile Diabetes Research Foundation, and the British Columbia Children’s Hospital Research Institute; he received funding from the Michael Smith Foundation for Health Research; he disclosed he is party to a licensing agreement between UBC and NeuroWave Systems for unrelated work, that he was a director for the Society for Technology in Anesthesia, and a principal investigator and coinvestigator for Canada’s Digital Technology Supercluster projects. Mr. West has disclosed that he does not have any potential conflicts of interest.

Figures

**Figure 1.**
Proportion of missing values for Glasgow Coma Score, used to determine abnormal mentation, and WBC count by unit size. The proportion of abnormal values for each unit is indicated using the color gradient from *purple* (all normal) to *yellow* (all abnormal).

**Figure 2.**
Effect of varying degrees of abnormal temperature (Temp) sample weighting on the coefficients of the pediatric systemic inflammatory response syndrome (pSIRS) model when 50% of WBC count values are missing, for three approaches: multiple imputation by chained equations (MICE), missing as normal, and complete case analysis (missing discarded). A, Percentage bias of the coefficients. B, Root mean square error (RMSE) of the coefficients. The model coefficients included the intercept, as well as abnormal heart rate (HR), WBC, respiratory rate (RR), and Temp.

**Figure 3.**
Effect of varying degrees of abnormal systolic blood pressure (SBP) sample weighting on the coefficients of the quick Sequential [Sepsis-Related] Organ Failure Assessment (qSOFA) model when 50% of mentation values are missing, for three approaches: multiple imputation by chained equations (MICE), missing as normal, and complete case analysis (missing discarded). A, Percentage bias of the coefficients. B, Root mean square error (RMSE) of the coefficients. The model coefficients included the intercept, as well as abnormal mentation, respiratory rate (RR), and SBP.
