J Clin Med. 2025 Mar 24;14(7):2213. doi: 10.3390/jcm14072213.

Preserving Informative Presence: How Missing Data and Imputation Strategies Affect the Performance of an AI-Based Early Warning Score


Taeyong Sim et al. J Clin Med. 2025.

Abstract

Background/Objectives: Data availability can affect the performance of AI-based early warning scores (EWSs). This study evaluated how the extent of missing data and imputation strategies influence the predictive performance of the VitalCare-Major Adverse Event Score (VC-MAES), an AI-based EWS that uses last observation carried forward and normal-value imputation for missing values, in forecasting clinical deterioration events (unplanned ICU transfers, cardiac arrests, or death) up to 6 h in advance.

Methods: We analyzed real-world data from 6039 patient encounters at Keimyung University Dongsan Hospital, Republic of Korea. Performance was evaluated under three scenarios: (1) using only vital signs and age, treating all other variables as missing; (2) reintroducing the full set of real-world clinical variables; and (3) imputing missing values either by drawing from a distribution within one standard deviation of the observed mean or by Multiple Imputation by Chained Equations (MICE).

Results: VC-MAES achieved an area under the receiver operating characteristic curve (AUROC) of 0.896 using only vital signs and age, outperforming traditional EWSs, including the National Early Warning Score (0.797) and the Modified Early Warning Score (0.722). Reintroducing the full set of clinical variables improved the AUROC to 0.918, whereas mean-based imputation and MICE decreased performance to 0.885 and 0.827, respectively.

Conclusions: VC-MAES demonstrates robust predictive performance with limited inputs, outperforming traditional EWSs. Incorporating actual clinical data significantly improved accuracy. In contrast, mean-based or MICE imputation yielded poorer results than the default normal-value imputation, potentially because these methods disregard the "informative presence" embedded in missing-data patterns. These findings underscore the importance of understanding missingness patterns and of employing imputation strategies that consider the decision-making context behind data availability, so as to enhance model reliability.
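The imputation strategies compared above can be illustrated with a minimal sketch (assumptions only, not the authors' implementation): last observation carried forward followed by normal-value filling, mean-based sampling within one standard deviation of the observed mean, and MICE-style imputation via scikit-learn's IterativeImputer. The column names and the "normal" reference values below are hypothetical.

    # Minimal sketch (assumptions, not the study code) of the missing-data
    # handling strategies described in the abstract.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer  # chained-equation (MICE-style) imputer

    rng = np.random.default_rng(0)

    # Hypothetical encounter table: vitals mostly present, labs often missing
    # because they were never ordered (a source of "informative presence").
    df = pd.DataFrame({
        "heart_rate": rng.normal(80, 12, 500),
        "systolic_bp": rng.normal(120, 15, 500),
        "lactate": np.where(rng.random(500) < 0.6, np.nan, rng.normal(1.5, 0.8, 500)),
        "creatinine": np.where(rng.random(500) < 0.4, np.nan, rng.normal(1.0, 0.3, 500)),
    })

    # Default VC-MAES-style handling as described in the abstract: carry the last
    # observation forward, then fill what remains with assumed normal values.
    # In practice LOCF is applied within each patient's own time series.
    normal_values = {"lactate": 1.0, "creatinine": 0.9}  # hypothetical reference values
    locf_normal = df.ffill().fillna(value=normal_values)

    # Scenario 3a: replace each missing value with a draw from within one
    # standard deviation of that variable's observed mean.
    mean_sd = df.copy()
    for col in ("lactate", "creatinine"):
        mask = mean_sd[col].isna()
        mu, sd = df[col].mean(), df[col].std()
        mean_sd.loc[mask, col] = rng.uniform(mu - sd, mu + sd, mask.sum())

    # Scenario 3b: MICE-style multiple imputation via IterativeImputer.
    mice = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                        columns=df.columns)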

Keywords: artificial intelligence; early warning score; imputation; modified early warning score; national early warning score.


Conflict of interest statement

Authors Taeyong Sim, Sangchul Hahn, Kwang-Joon Kim, Eun-Young Cho, Yeeun Jeong, Ji-hyun Kim and Ki-Byung Lee were employed by the company AITRICS. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Receiver Operating Characteristic Curves (a) and Precision–Recall Curves (b) comparing predictive performance among models using vital signs only, vital signs + laboratory data, the Modified Early Warning Score (MEWS), and the National Early Warning Score (NEWS).
Figure 2
Receiver Operating Characteristic Curves (a) and Precision–Recall Curves (b) comparing predictive performance between the model using vital signs + laboratory data and the model with forced laboratory imputation.
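For orientation, the curves in Figures 1 and 2 are standard ROC and precision–recall analyses. A brief sketch of how such curves and their summary metrics are computed with scikit-learn follows, using synthetic labels and scores rather than the study data.

    # Sketch of ROC and precision-recall curve computation (synthetic data only).
    import numpy as np
    from sklearn.metrics import (average_precision_score, precision_recall_curve,
                                 roc_auc_score, roc_curve)

    rng = np.random.default_rng(1)
    y_true = rng.integers(0, 2, 1000)                         # deterioration event: 1 = yes
    y_score = np.clip(0.3 * y_true + rng.random(1000), 0, 1)  # hypothetical model scores

    fpr, tpr, _ = roc_curve(y_true, y_score)                        # panel (a)
    precision, recall, _ = precision_recall_curve(y_true, y_score)  # panel (b)
    print(f"AUROC = {roc_auc_score(y_true, y_score):.3f}")
    print(f"AUPRC = {average_precision_score(y_true, y_score):.3f}")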
