Predicting hospital admissions, ICU utilization, and prolonged length of stay among febrile pediatric emergency department patients using incomplete and imbalanced electronic health record (EHR) data strategies

Affiliations

¹ Computer Technology Associates, Cardiff, CA, United States.
² Department of Pediatrics, Children's National Hospital, Washington, DC, United States.
³ Department of Pediatrics, Children's National Hospital, Washington, DC, United States; Brown University, Providence, RI, United States.
⁴ Center for Genetic Medicine Research, Children's National Research Institute, Washington, DC, United States.
⁵ Department of Pediatrics, Children's National Hospital, Washington, DC, United States; George Washington University School of Medicine and Health Sciences, Washington, DC, United States.
⁶ Virginia Tech Carilion School of Medicine, Roanoke, VA, United States.
⁷ Department of Pediatrics, Children's National Hospital, Washington, DC, United States; Center for Genetic Medicine Research, Children's National Research Institute, Washington, DC, United States; George Washington University School of Medicine and Health Sciences, Washington, DC, United States. Electronic address: ikoutrouli@childrensnational.org.

PMID: 40203463
DOI: 10.1016/j.ijmedinf.2025.105905

Tom Velez et al. Int J Med Inform. 2025 Aug:200:105905. doi: 10.1016/j.ijmedinf.2025.105905. Epub 2025 Apr 4.

Abstract

Objective: To determine the effect of commonly used approaches to handling missing and/or imbalanced electronic health record (EHR) data on the performance of predictive models targeting the risk of admission, intensive care unit (ICU) use, or prolonged length of stay (PLOS) among febrile pediatric patients presenting to the emergency department (ED).

Materials and methods: Historical ED EHR data were used to train a series of XGBoost (XGB) and logistic regression (LR) classifiers. Data handling strategies included missing-data approaches (multiple imputation (MI), median imputation, and complete case (CC) analysis) and imbalanced-data corrections (minority oversampling and stratified subgroup analysis). Model performance was evaluated using discrimination metrics (area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC)) and calibration metrics (Brier score, Z-scores, p-values).
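
As an illustration of the strategies and metrics named above, the following is a minimal sketch in plain Python, not the study's actual pipeline. The function names, the toy feature rows (temperature, heart rate, and a lab value), and the toy predicted probabilities are all hypothetical. AUPRC is approximated here by average precision, which is one common estimator and may differ from the one the authors used.

```python
from statistics import median

def median_impute(rows):
    """Replace each missing value (None) with the observed median of its column."""
    n_cols = len(rows[0])
    medians = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def complete_cases(rows, labels):
    """Complete case analysis: keep only rows with no missing values."""
    kept = [(r, y) for r, y in zip(rows, labels) if all(v is not None for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]

def auc(y_true, y_prob):
    """ROC AUC: probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_prob):
    """Average precision: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits

def brier(y_true, y_prob):
    """Brier score: mean squared error of predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical toy data: [temperature_C, heart_rate, lab_value], None = missing.
rows = [[39.1, 150, None], [38.2, None, 1.1], [40.0, 170, 3.2], [38.5, 120, 1.0]]
labels = [1, 0, 1, 0]

imputed = median_impute(rows)                    # all 4 rows retained
cc_rows, cc_labels = complete_cases(rows, labels)  # only 2 rows retained

# Hypothetical predicted probabilities from some fitted classifier.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1]
print(auc(y_true, y_prob), average_precision(y_true, y_prob), brier(y_true, y_prob))
```

On the toy rows, median imputation keeps every patient while complete case analysis discards half of them, which is the trade-off between the two strategies in miniature.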

Results: Among the study population, 34% were admitted, 2% utilized the ICU, and 7% had a PLOS. Significant data missingness was observed and determined to be missing not at random (MNAR). In predicting admissions using data recorded within the first two hours of presentation, LR trained on the full cohort with median imputation was comparable to MI, yielding well-calibrated admission models with an AUC/AUPRC of 0.82/0.73, while CC analysis yielded an AUC/AUPRC of 0.76/0.78. XGB trained on unimputed data produced a well-calibrated admission classifier with an AUC/AUPRC of 0.85/0.78. In contrast, imbalanced-data correction techniques, including the synthetic minority oversampling technique (SMOTE), risk stratification, or the use of XGB, did not significantly improve the poor AUPRC and calibration performance of LR models predicting ICU use and PLOS.
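
To make the imbalance problem concrete, the sketch below rebalances a toy training set at roughly the 2% prevalence reported for ICU use. It uses plain random duplication of minority cases, a simpler stand-in for SMOTE (which synthesizes new minority points by interpolating between neighbors); the function name, seed, and toy data are hypothetical and do not reflect the authors' implementation.

```python
import random

def oversample_minority(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Simplified random oversampling, not SMOTE. Intended for training data only;
    evaluation data should keep the original class balance.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx:
        raise ValueError("no minority examples to oversample")
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Hypothetical toy training set: 50 patients, 1 positive (2% prevalence).
train_rows = [[float(i)] for i in range(50)]
train_labels = [1] + [0] * 49

bal_rows, bal_labels = oversample_minority(train_rows, train_labels)
prevalence_before = sum(train_labels) / len(train_labels)
prevalence_after = sum(bal_labels) / len(bal_labels)
print(prevalence_before, prevalence_after)
```

Rebalancing changes the class ratio the model sees during training but adds no new information about the rare class, and a model fitted to balanced data will generally output probabilities that overstate risk at the true prevalence unless recalibrated. For reference, a classifier with no discriminative ability has an expected AUPRC close to the outcome prevalence, so the bar for a 2% outcome starts near 0.02.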

Conclusion: Both XGB and LR with median imputation demonstrated robust performance in predicting admissions in the presence of missing data. However, deriving clinically useful models for rare outcomes, such as ICU use or PLOS, remains a challenge due to poor precision/recall and calibration performance. Further research is needed to improve the prediction of rare outcomes in this population.

Keywords: Febrile; Imbalanced data; Imputation; Machine learning; Pediatric emergency medicine.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Grants and funding

R41 AI167224/AI/NIAID NIH HHS/United States
