AMIA Jt Summits Transl Sci Proc. 2024 May 31:2024:145-154.
eCollection 2024.

Aiming for Relevance

Bar Eini-Porat et al. AMIA Jt Summits Transl Sci Proc.

Abstract

Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
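The idea of penalizing errors more when they matter clinically can be illustrated with a minimal sketch. This is not the authors' metric: the paper derives its utilities from empirical curves elicited from ICU clinicians, whereas the weight below is a hypothetical linear stand-in. The function names, the 60-100 bpm heart rate range, and the `scale` parameter are all assumptions made for illustration.

```python
def range_weight(y, low=60.0, high=100.0, scale=10.0):
    """Hypothetical weight: 1 inside [low, high], growing linearly outside.

    The bounds and scale are illustrative assumptions, not values from the paper.
    """
    # Distance of the true value outside the normal range (0 when inside).
    dist = max(low - y, 0.0) + max(y - high, 0.0)
    return 1.0 + dist / scale


def range_weighted_mse(y_true, y_pred, low=60.0, high=100.0, scale=10.0):
    """Mean squared error, up-weighted where the true value is abnormal."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += range_weight(t, low, high, scale) * (t - p) ** 2
    return total / len(y_true)


# The same 5 bpm error costs more when the true heart rate is abnormal.
print(range_weighted_mse([80, 80], [75, 85]))      # 25.0
print(range_weighted_mse([130, 130], [125, 135]))  # 100.0
```

Plain RMSE would score both cases identically. A weighted cost of this kind separates them, which is the behavior the abstract attributes to clinically aligned metrics.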

Figures

Figure 1.
Normal range utility: heart rate example. The values between the red lines are the clinically normal range. The blue curve is the calculated normal range utility curve.
Figure 2.
Illustration of trend terms. Blue points represent the true vital sign measurements, and pink points are predictions. The pink arrow represents the predicted trend Ŷ_t, the light blue arrow represents the actual trend Y_{tn:t+m}, and the blue arrow represents the expected trend Y_{tn:t'}.
Figure 3.
Simulation data: The top panel shows true values yt (black) against XGBoost regressor step predictions (blue) for the simulated signal s3. The middle and bottom panels display RMSE and Utd cost, respectively. The Utd cost is only sensitive to the sharp mismatch at the “drop” event.
Figure 4.
Simulation data: The top row presents RMSE over different types of events per model. The bottom row shows the corresponding utility cost for the same models.
Figure 5.
LSTM model trained with RMSE loss for 6000 epochs (blue) vs. LSTM model trained with the Utd component for 5000 epochs (orange). With less training, the utility-oriented model is more accurate on sudden events.
Figure 6.
Simulation data: Performance of LSTMs trained with utility loss, in terms of RMSE over events of different types, as well as the corresponding utility metrics. The baseline LSTM is marked as ‘LSTM’, the model trained for the clinical range as ‘Range_LSTM’, the model trained for overall trend as ‘LSTM_Trend’, and the model trained for trend deviations as ‘LSTM_Dev’.
Figure 7.
Blood pressure prediction task on the eICU and MIMIC subsets. Performance is shown according to RMSE and each of the three utility costs. The top row depicts these measures per model for the eICU dataset, and the bottom row depicts the same measures for the MIMIC dataset.
Figure 8.
Heart rate prediction over the eICU subset. Baseline model trained with RMSE loss (blue) vs. LSTM model trained additionally with the Utd component (orange). The utility-optimized model captures more events. In the bottom row, the blue and orange bars overlap.
Figure 9.
Difference in RMSE over annotated events between the vanilla LSTM model and the Mixed model trained with utility costs. The task is heart rate prediction on the eICU dataset.

References

    1. Colopy G. W., Roberts S. J., Clifton D. A. Bayesian optimization of personalized models for patient vital-sign monitoring. IEEE Journal of Biomedical and Health Informatics. 2017;22(2):301–310.
    2. Mincholé A., Camps J., Lyon A., Rodríguez B. Machine learning in the electrocardiogram. Journal of Electrocardiology. 2019;57:S61–S64.
    3. Colopy G. W., Roberts S. J., Clifton D. A. Gaussian processes for personalized interpretable volatility metrics in the step-down ward. IEEE Journal of Biomedical and Health Informatics. 2019;23(3):949–959.
    4. Eini-Porat B., Amir O., Eytan D., Shalit U. Tell me something interesting: Clinical utility of machine learning prediction models in the ICU. Journal of Biomedical Informatics. 2022;132:104107.
    5. Rim B., Sung N. J., Min S., Hong M. Deep learning in physiological signal data: A survey. Sensors. 2020;20(4):969.
