Ann Oper Res. 2022 Sep 29:1-29.
doi: 10.1007/s10479-022-04984-x. Online ahead of print.

Using machine learning in prediction of ICU admission, mortality, and length of stay in the early stage of admission of COVID-19 patients

Sara Saadatmand et al. Ann Oper Res.

Abstract

The recent COVID-19 pandemic has affected health systems across the world. Intensive Care Units (ICUs) in particular have played a pivotal role in the treatment of critically ill patients. At the same time, however, the increasing number of admissions due to the high prevalence of the virus has caused several problems for ICU wards, such as overburdening of staff and shortages of medical resources. These issues might have affected the quality of the healthcare services provided, directly impacting patients' survival. The objective of this research is to leverage Machine Learning (ML) on hospital data in order to support hospital managers and practitioners in the treatment of COVID-19 patients. This is accomplished by providing more detailed inference about a patient's likelihood of ICU admission and mortality and, in case of hospitalization, the length of stay (LOS). To this end, the three outcome variables are predicted in three separate models by five different ML algorithms: eXtreme Gradient Boosting (XGB), K-Nearest Neighbor (KNN), Random Forest (RF), bagged-CART (b-CART), and LogitBoost (LB). With the exception of KNN, the studied models show good predictive capabilities when evaluated on relevant performance scores, such as the area under the curve. Implementing an ensemble stacking approach (with either a Neural Net or a General Linear Model as the meta-model) on top of the aforementioned ML algorithms boosts performance further. Ultimately, for the prediction of admission to the ICU, ensemble stacking via a Neural Net achieved the best result, with an accuracy of over 95%. For mortality in the ICU, the vanilla XGB performed slightly better (1% difference with the meta-model). For predicting long lengths of stay, both ensemble stacking approaches yield comparable results. Besides its direct implications for managing COVID-19 patients, the approach presented serves as an example of how data can be employed in future pandemics or crises.
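
The abstract reports the area under the (ROC) curve as one of its evaluation scores but does not say how it was computed. As an illustration only, the following minimal Python sketch computes the AUC through the rank-sum (Mann-Whitney U) identity: the AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a positive case, 0 for a negative case.
    scores: predicted scores, higher meaning "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Sort case indices by score, then give tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U statistic of the positive class, normalized by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data (not from the paper): 1 = admitted to ICU, 0 = not admitted.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)  # 3 of the 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why it is a common way to compare classifiers such as the five base learners and the stacked meta-models described above.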

Keywords: COVID-19 pandemic; Ensemble modeling; ML in health systems; Supervised learning.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Architecture of the framework of this study
Fig. 2
The selection of cases of ICU admitted COVID-19 patients
Fig. 3
Percentage of missing values of the dataset. Abbreviations: LDH (lactate dehydrogenase), T.B. (total bilirubin), ESR (erythrocyte sedimentation rate), AST (aspartate aminotransferase), ALT (alanine transaminase), L.disease (chronic lung disease), Nd.disease (chronic neurological disorder), K.disease (chronic kidney disease), S.cough (sputum cough), A.pain (abdominal pain), H.disease (heart disease), INR (international normalized ratio), High.bp (high blood pressure), PT (prothrombin time), O2.s (O2 saturation), WBC (white blood cell count), R.rate (respiratory rate), Diastolic (diastolic pressure), Temp (temperature), H.rate (heart rate), Systolic (systolic pressure), ARI (acute respiratory infection), NCD (non-communicable diseases)
Fig. 4
Kernel Density Estimation of initial and imputed data for some of the variables. The red curves denote the imputed data distribution and the blue curves demonstrate the distribution of initial data. Abbreviations: Temp (temperature), H.rate (heart rate), R.rate (respiratory rate), Systolic (systolic pressure), Diastolic (diastolic pressure), O2.s (O2 saturation), Fever.H (history of fever), PT (prothrombin time), INR (international normalized ratio), ALT (alanine transaminase), LDH (lactate dehydrogenase), ESR (erythrocyte sedimentation rate)
Fig. 5
The Boruta algorithm feature selection for ICU admission. Green boxes denote the confirmed features, yellow boxes represent the tentative variables, blue boxes illustrate the minimum, average, and maximum of shadow variables, and red boxes show the irrelevant features
Fig. 6
The Boruta algorithm feature selection for ICU mortality. Green boxes denote the confirmed features, yellow boxes represent the tentative variables, blue boxes illustrate the minimum, average, and maximum of shadow variables, and red boxes show the irrelevant features
Fig. 7
The Boruta algorithm feature selection for ICU LOS. Green boxes denote the confirmed features, yellow boxes represent the tentative variables, blue boxes illustrate the minimum, average, and maximum of shadow variables, and red boxes show the irrelevant features
Fig. 8
The ROC curves of five ML algorithms for ICU admission. Abbreviations: XGB (extreme gradient boosting), KNN (k-nearest neighbor), RF (random forest), b-CART (bagged CART), LB (LogitBoost)
Fig. 9
The ROC curves of five ML algorithms for ICU mortality. Abbreviations: XGB (extreme gradient boosting), KNN (k-nearest neighbor), RF (random forest), b-CART (bagged CART), LB (LogitBoost)
Fig. 10
The ROC curves of five ML algorithms for ICU LOS. Abbreviations: XGB (extreme gradient boosting), KNN (k-nearest neighbor), RF (random forest), b-CART (bagged CART), LB (LogitBoost)
Fig. 11
The overall concept of the ensemble method
