Sci Rep. 2025 Jul 29;15(1):27552.
doi: 10.1038/s41598-025-12264-4.

Developing and validating machine learning models to predict next-day extubation

Samuel W Fenske et al. Sci Rep. 2025.

Abstract

Criteria to identify patients who are ready to be liberated from mechanical ventilation (MV) are imprecise, often resulting in prolonged MV or reintubation, both of which are associated with adverse outcomes. Daily protocol-driven assessment of the need for MV leads to earlier extubation but requires dedicated personnel. We sought to determine whether machine learning (ML) applied to the electronic health record could predict next-day extubation. We examined 37 clinical features aggregated from 12 AM to 8 AM on each patient-ICU-day from a single-center prospective cohort study of patients in our quaternary care medical ICU who received MV. We also tested our models on an external test set from a community hospital ICU in our health care system. We used three data encoding/imputation strategies and built XGBoost, LightGBM, logistic regression, LSTM, and RNN models to predict next-day extubation. We compared model predictions and actual events to examine how model-driven care might have differed from actual care. Our internal cohort included 448 patients and 3,095 ICU days, and our external test cohort had 333 patients and 2,835 ICU days. The best model (LSTM) predicted next-day extubation with an area under the receiver operating characteristic curve (AUROC) of 0.870 (95% CI 0.834-0.902) on the internal test cohort and 0.870 (95% CI 0.848-0.885) on the external test cohort. Across multiple model types, measures previously demonstrated to be important in determining readiness for extubation were found to be most informative, including plateau pressure and Richmond Agitation Sedation Scale (RASS) score. Our model often predicted patients to be stable for extubation in the days preceding their actual extubation, with 63.8% of predicted extubations occurring within three days of true extubation. Our findings suggest that an ML model may serve as a useful clinical decision support tool rather than a complete replacement for clinical judgement. However, any ML-based model should be compared with protocol-based practice in a prospective, randomized controlled trial to determine whether it improves outcomes while maintaining safety and cost-effectiveness.
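
The AUROC values and 95% confidence intervals reported above can be illustrated with a small sketch. The abstract does not say how the intervals were computed, so the percentile bootstrap below is an assumption, as are the function names and the toy labels and scores. Resampling individual ICU-days, as done here, also ignores the correlation between days from the same patient.

```python
import random

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U identity).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = extubated next day, 0 = still intubated (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_ci(toy_labels, toy_scores)
```

On real data, a patient-level bootstrap (resampling whole patients rather than single days) would likely give more honest intervals.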

Keywords: Critical care; Deep learning; Machine learning; Mechanical ventilation; Respiratory failure.

Conflict of interest statement

Declarations. Competing interests: BDS holds US patent 10,905,706, “Compositions and methods to accelerate resolution of acute lung inflammation,” and serves on the scientific advisory board of Zoe Biosciences, in which he holds stock options. Other authors have no conflicts within the area of this work.

Figures

Fig. 1
(A) Data split for model development: schematic describing the data split used in our clinical ML study. The dataset was divided into three distinct subsets: the training set, validation set, and test set. The training set was used to train the ML models, the validation set for tuning and model selection, and the test set to assess the final model’s performance and generalization to unseen data. (B) Data processing pipeline; full details are available in Supplemental Methods. Individual days are labeled as intubated or extubated, with 12am–8am aggregated features used to predict next-day intubation/extubation status. Data are cleaned, as detailed in Methods and Supplemental Methods, before ML models are trained and evaluated.
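
The labeling step in panel (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the use of the mean as the aggregate, the half-open 00:00 to 08:00 window, and the names `aggregate_morning_window` and `label_next_day` are all assumptions.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean

def aggregate_morning_window(observations, start_hour=0, end_hour=8):
    """Average each feature over [start_hour, end_hour) for every patient-day.

    `observations` is an iterable of (patient_id, timestamp, feature, value).
    Using the mean is an assumption; the source only says "aggregated".
    """
    buckets = defaultdict(list)
    for patient_id, ts, feature, value in observations:
        if start_hour <= ts.hour < end_hour:
            buckets[(patient_id, ts.date(), feature)].append(value)
    rows = defaultdict(dict)
    for (patient_id, day, feature), values in buckets.items():
        rows[(patient_id, day)][feature] = mean(values)
    return dict(rows)

def label_next_day(rows, status_by_day):
    """Pair each patient-day with the following day's status (1 = extubated)."""
    labeled = []
    for (patient_id, day), features in sorted(rows.items()):
        next_status = status_by_day.get((patient_id, day + timedelta(days=1)))
        if next_status is None:
            continue  # next-day status unknown, so no label can be assigned
        labeled.append((patient_id, day, features, int(next_status == "extubated")))
    return labeled

# Hypothetical observations for one patient.
obs = [
    ("p1", datetime(2024, 1, 1, 2, 0), "plateau_pressure", 24.0),
    ("p1", datetime(2024, 1, 1, 6, 0), "plateau_pressure", 20.0),
    ("p1", datetime(2024, 1, 1, 14, 0), "plateau_pressure", 30.0),  # outside window
]
status = {("p1", date(2024, 1, 2)): "extubated"}
examples = label_next_day(aggregate_morning_window(obs), status)
```

The afternoon reading is dropped because it falls outside the window, so the single resulting example carries the mean of the two morning values and a positive next-day label.
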
Fig. 2
(A) Performance metrics of different ML models. Receiver operating characteristic (ROC) and precision-recall (PRC) curves, with their respective areas under the curve, for extreme gradient boosting (XGBoost), recurrent neural network (RNN), and long short-term memory (LSTM) models evaluated on the same test set. Each model is shown with its best-performing imputation strategy among those tested (including raw and binning, detailed in Methods). Curves displayed are for a single pass through the test set. Full metrics and confidence intervals for the top optimized models shown are in Supplemental Table 4. (B) Model performance on external test cohort. We applied our top-performing binned LSTM model (based on AUROC) to a patient cohort from a different hospital system as an external test. ROC and PRC curves show similar performance to the SCRIPT test set in (A). Curves displayed are for a single pass through the test set. Full metrics and confidence intervals for the top optimized models shown are in Supplemental Table 4.
Fig. 3
Feature importance plot for LSTM model. Ablation-based feature importance for an LSTM model predicting next-day extubation shows the significance of each feature. Each feature is a row, and the x axis represents how important that feature is. Importance is calculated over 37 iterations (one per feature), masking a single feature each time and measuring the decline in AUC on the test set without that feature available.
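
The ablation procedure in this caption can be sketched in a few lines. The caption does not say how a feature is masked, so replacing it with a constant is an assumption here, and the toy model, data, and function names are hypothetical stand-ins for the LSTM and its 37 features.

```python
def auroc(labels, scores):
    """AUROC as the fraction of correctly ranked (positive, negative) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ablation_importance(predict, X, y, feature_names, mask_value=0.0):
    """Drop in AUROC when each feature in turn is replaced by `mask_value`.

    One iteration per feature; a larger drop means a more informative feature.
    """
    baseline = auroc(y, [predict(row) for row in X])
    drops = {}
    for j, name in enumerate(feature_names):
        masked = [row[:j] + [mask_value] + row[j + 1:] for row in X]
        drops[name] = baseline - auroc(y, [predict(row) for row in masked])
    return drops

def toy_model(row):
    """Hypothetical scorer that uses only the first feature."""
    return row[0]

feature_names = ["informative", "noise"]
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
drops = ablation_importance(toy_model, X, y, feature_names)
```

Masking the informative feature collapses all scores to a tie and the AUROC falls from 1.0 to 0.5, while masking the unused feature changes nothing.
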
Fig. 4
Examining discrepancies between model prediction and time of extubation. (A) For each intubation sequence preceding extubation in the SCRIPT test cohort, we examined the first instance predicting next-day extubation (Supplemental Methods). Many model predictions, if not exactly one day before extubation as intended (29.3%), were within two days (50%) or three days (63.8%). (B) The same analysis as (A) performed on the external test cohort, where we report 37.3% of first next-day extubation predictions occurring within one day of successful extubation, 62.1% within two days, and 74.5% within three days.
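
The cumulative percentages in this caption come from comparing the first positive prediction in each intubation sequence with the day of extubation. A minimal sketch of that bookkeeping follows. The day indexing, the exclusion of sequences with no positive prediction, and the function names are assumptions, and the toy sequences are not study data.

```python
def first_prediction_gap(predictions, extubation_index):
    """Days from the first positive next-day prediction to extubation.

    `predictions[d]` is 1 if the model, on day d, predicted extubation on day d + 1.
    A gap of 1 means the first positive prediction came exactly one day before
    extubation. Returns None if no positive prediction precedes extubation.
    """
    for day, flag in enumerate(predictions[:extubation_index]):
        if flag:
            return extubation_index - day
    return None

def cumulative_within(gaps, max_days=3):
    """Fraction of sequences whose gap is at most k days, for k = 1..max_days."""
    known = [g for g in gaps if g is not None]  # assumption: drop sequences with no prediction
    if not known:
        return {}
    return {k: sum(g <= k for g in known) / len(known) for k in range(1, max_days + 1)}

# Toy sequences: (daily predictions while intubated, index of the extubation day).
sequences = [
    ([0, 0, 1, 1], 4),
    ([0, 0, 0, 1], 4),
    ([1, 0, 0, 0, 0, 0], 6),
    ([0, 1, 0, 0], 4),
]
gaps = [first_prediction_gap(preds, ext) for preds, ext in sequences]
fractions = cumulative_within(gaps)
```

In this toy example one of four sequences is flagged exactly one day ahead, two within two days, and three within three days.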
