Nat Commun. 2025 Jul 23;16(1):6787. doi: 10.1038/s41467-025-61418-5.

Open-source computational pipeline flags instances of acute respiratory distress syndrome in mechanically ventilated adult patients


Félix L Morales et al. Nat Commun. 2025.

Abstract

Physicians in critical care settings face information overload and decision fatigue, contributing to under-recognition of acute respiratory distress syndrome, which affects over 10% of intensive care patients and carries a mortality rate above 40%. We present a reproducible computational pipeline to retrospectively and automatically identify this condition in mechanically ventilated adults. The pipeline operationalizes the Berlin Definition by detecting bilateral infiltrates from radiology reports and a pneumonia diagnosis from attending physician notes, using interpretable classifiers trained on labeled data. Here we show that our integrated pipeline achieves high performance (93.5% sensitivity and a 17.4% false positive rate) when applied to a held-out, publicly available dataset from an external hospital. This sensitivity substantially exceeds the 22.6% documentation rate observed in the same cohort. These results demonstrate that our automated adjudication pipeline can accurately identify an under-diagnosed condition in critical care and may support timely recognition and intervention through integration with electronic health records.
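The core text-classification step described above can be sketched, under assumptions, as a TF-IDF bag-of-words model feeding an XGBoost classifier. The snippet below is a minimal illustration of that general approach, not the authors' released pipeline; the file name and column names (radiology_reports.csv, report_text, bilateral_infiltrates) are hypothetical placeholders.

```python
# Minimal sketch: TF-IDF features + XGBoost for flagging bilateral
# infiltrates in free-text radiology reports. Illustrative only; the
# data schema below is assumed, not taken from the paper.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

reports = pd.read_csv("radiology_reports.csv")  # hypothetical file
X_text, y = reports["report_text"], reports["bilateral_infiltrates"]

X_train, X_test, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, stratify=y, random_state=0
)

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_train_vec, y_train)

# Output probabilities rather than hard labels, so any operating point
# (e.g., the 50% threshold used in the figures) can be applied downstream.
probs = clf.predict_proba(X_test_vec)[:, 1]
flagged = probs >= 0.5
```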


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Machine learning (ML) models achieve high performance in adjudicating the presence of bilateral infiltrates from chest imaging reports.
Error bars and bands show 95% confidence intervals for estimates of the mean obtained using bootstrapping (n = 10). a Receiver operating characteristic (ROC) curve for the eXtreme Gradient Boosting (XGBoost) model trained on chest imaging reports from the development set. b Bootstrapped mean area under the ROC curve (AUROC) shows that all four ML approaches achieve values greater than or equal to 0.85. c Feature importances for the four different ML approaches considered. Features in bold are ranked highly in importance across all four approaches.
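Bootstrap confidence intervals of the kind reported here can be reproduced in spirit with a resampling loop like the one below (a generic sketch, not the authors' code; n_boot plays the role of the caption's n):

```python
# Sketch: 95% bootstrap confidence interval for AUROC.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc(y_true, y_prob, n_boot=10, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    aurocs = []
    while len(aurocs) < n_boot:
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # resample must contain both classes
            continue
        aurocs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    lo, hi = np.percentile(aurocs, [2.5, 97.5])
    return float(np.mean(aurocs)), (float(lo), float(hi))
```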
Fig. 2
Fig. 2. Calibration of output probabilities in the implemented ML models.
Error bands show 95% confidence intervals for estimates of the mean obtained using bootstrapping (n = 10). We show calibration curves for models trained on the development set: a decision tree, b logistic regression, c random forest, and d eXtreme Gradient Boosting (XGBoost). A perfectly calibrated model would have a 1:1 relationship between the fraction of positive labels and the mean predicted probabilities (i.e., it would overlay the diagonal line). The Durbin-Watson statistic, DW, probes for correlations in the residuals; a DW close to 2 rules out such correlations, implying good linear behavior.
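A calibration curve and the accompanying Durbin-Watson diagnostic can be computed with standard tooling, as in this sketch (assumed to approximate, not reproduce, the paper's analysis):

```python
# Sketch: calibration curve plus Durbin-Watson statistic on the
# calibration residuals (deviations from the 1:1 diagonal).
from sklearn.calibration import calibration_curve
from statsmodels.stats.stattools import durbin_watson

def calibration_with_dw(y_true, y_prob, n_bins=10):
    frac_pos, mean_prob = calibration_curve(y_true, y_prob, n_bins=n_bins)
    residuals = frac_pos - mean_prob  # zero everywhere for perfect calibration
    dw = durbin_watson(residuals)     # values near 2 suggest uncorrelated residuals
    return frac_pos, mean_prob, dw
```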
Fig. 3
Fig. 3. Evaluation of the Bilateral Infiltrates (BI) model on chest imaging reports from MIMIC (2001-12).
Error bars and bands in a, b, and d show 95% confidence intervals for estimates of the mean obtained using bootstrapping (n = 100). a Receiver operating characteristic (ROC) curve for the BI model tested on 975 bootstrapped chest imaging reports from MIMIC (2001-12). b Calibration of probabilities output by the BI model when applied to 975 bootstrapped chest imaging reports from MIMIC (2001-12). c Confusion matrix comparing the critical care physician's adjudications of MIMIC (2001-12) chest imaging reports (ground truth) against BI model adjudications done at a 50% probability threshold; the numbers add up to 975. d Comparison of the BI model's output probabilities (a measure of its confidence that a report is consistent with bilateral infiltrates) across three agreement scenarios between the critical care physician and the internal medicine physician when adjudicating MIMIC (2001-12) chest imaging reports.
Fig. 4
Fig. 4. eXtreme Gradient Boosting (XGBoost) model performance in adjudicating, from attending physician notes, the presence of risk factors amenable to ML techniques.
Error bars and bands show 95% confidence intervals for estimates of the mean obtained using bootstrapping (n = 100). a Receiver operating characteristic (ROC) curve showing the cross-validated performance of the XGBoost model trained to adjudicate pneumonia on Hospital A (2013) attending physician notes. b Swarm plot showing cross-validated areas under the ROC curve (AUROCs) of XGBoost models trained to adjudicate all the attempted risk factors, plus congestive heart failure (CHF), on Hospital A (2013) attending notes. c Training set feature importances from the XGBoost models trained to adjudicate pneumonia and sepsis. d Training set calibration curves for the XGBoost models trained to adjudicate pneumonia and sepsis.
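Cross-validated AUROCs like those summarized in panel b are conventionally obtained with stratified k-fold cross-validation; the sketch below uses synthetic stand-in data, since the clinical notes are not reproducible here:

```python
# Sketch: cross-validated AUROC for an XGBoost classifier.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 50))         # synthetic stand-in features
y = rng.integers(0, 2, size=500)  # synthetic stand-in labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
aurocs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"AUROC: {aurocs.mean():.3f} +/- {aurocs.std():.3f}")
```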
Fig. 5
Fig. 5. Evaluation of the Pneumonia Model on attending physician notes from MIMIC (2001-12).
Error bands show 95% confidence intervals for estimates of the mean obtained using bootstrapping (n = 100). a Receiver operating characteristic (ROC) curve for the Pneumonia Model tested on 790 bootstrapped attending physician notes from MIMIC (2001-12) that were regex-captured for pneumonia. b Shapley additive explanations (SHAP) values for the top 15 words in terms of their impact on the Pneumonia Model's output probabilities. c Calibration of probabilities output by the Pneumonia Model when applied to 790 bootstrapped attending physician notes from MIMIC (2001-12). d Confusion matrix comparing the critical care physician's pneumonia labels for MIMIC (2001-12) attending physician notes (ground truth) against Pneumonia Model adjudications done at a 50% probability threshold; the numbers add up to 790.
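Word-level SHAP rankings of the kind shown in panel b can be produced with the shap library; the sketch below reuses the hypothetical clf, vectorizer, and X_test_vec names from the sketch after the abstract:

```python
# Sketch: rank words by mean absolute SHAP value for an XGBoost
# text classifier (clf, vectorizer, X_test_vec are the hypothetical
# names from the earlier sketch, not the authors' objects).
import numpy as np
import shap

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_test_vec)  # shape: (n_notes, n_words)

mean_abs = np.abs(shap_values).mean(axis=0)
words = np.array(vectorizer.get_feature_names_out())
for word, score in sorted(zip(words, mean_abs), key=lambda t: -t[1])[:15]:
    print(f"{word}: {score:.4f}")
```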
Fig. 6
Fig. 6. Risk factor adjudication performance of regular expressions on MIMIC (2001-12).
All of the regular expressions developed capture 100% of the attending physician notes from MIMIC (2001-12) labeled as yes for each risk factor (i.e., 100% sensitivity). In terms of precision, burns, pancreatitis, sepsis, and aspiration exceeded 80%.
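Regex-based risk-factor flagging of this kind can be sketched as follows; the patterns are illustrative placeholders, not the validated expressions developed in the paper:

```python
# Sketch: flag ARDS risk factors in a note with regular expressions.
import re

RISK_FACTOR_PATTERNS = {
    "sepsis": re.compile(r"\bsepsis\b|\bseptic shock\b", re.IGNORECASE),
    "pancreatitis": re.compile(r"\bpancreatitis\b", re.IGNORECASE),
    "aspiration": re.compile(r"\baspirat(?:ion|ed)\b", re.IGNORECASE),
    "burns": re.compile(r"\bburns?\b", re.IGNORECASE),
}

def flag_risk_factors(note_text: str) -> set[str]:
    """Return the risk factors whose pattern matches the note."""
    return {name for name, pattern in RISK_FACTOR_PATTERNS.items()
            if pattern.search(note_text)}
```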
Fig. 7
Fig. 7. Computational pipeline for adjudication of Hospital A (2013) cohort yields a small fraction of false negatives and an acceptable fraction of false positives.
a Flowchart of ARDS adjudication by the computational pipeline (blue) vs. the physician (black). b Confusion matrix comparing physician adjudication from a previous publication against the computational pipeline's adjudication.
Fig. 8
Fig. 8. Computational pipeline for adjudication of MIMIC (2001-12) cohort yields a small fraction of false negatives and a manageable fraction of false positives.
a Flowchart of ARDS adjudication by the computational pipeline (blue) vs. the physician (black). b Confusion matrices comparing physician adjudication (ground truth) against the computational adjudication pipeline (top panel), and physician adjudication (ground truth) against adjudication by a less experienced physician (bottom panel).
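The headline metrics in the abstract (93.5% sensitivity, 17.4% false positive rate) follow directly from a 2x2 confusion matrix like those in panel b; a generic sketch:

```python
# Sketch: sensitivity and false positive rate from a confusion matrix.
from sklearn.metrics import confusion_matrix

def sensitivity_and_fpr(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # true positive rate
    fpr = fp / (fp + tn)          # fraction of true negatives flagged
    return sensitivity, fpr
```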


