BMC Med Inform Decis Mak. 2023 Apr 6;23(1):59.
doi: 10.1186/s12911-023-02159-7.

Explainable prediction of daily hospitalizations for cerebrovascular disease using stacked ensemble learning

Xiaoya Lu et al. BMC Med Inform Decis Mak. 2023.

Abstract

Background: Given the prevalence of cerebrovascular disease (CD) and the growing strain on healthcare resources, forecasting the healthcare demand of patients with CD has significant implications for optimizing the allocation of medical resources.

Methods: In this study, we proposed a stacking ensemble model composed of four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) to predict the daily number of hospital admissions (HAs) for CD from historical HAs data, air quality data, and meteorological data collected in Chengdu, China, from 2015 to 2018. To address the label imbalance problem, a re-weighting method based on label distribution smoothing (LDS) was integrated into the meta learner. We trained the model on data from 2015 to 2017 and evaluated its predictive ability on data from 2018 using four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of the stacking model.
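The re-weighting idea can be illustrated with a minimal sketch of label distribution smoothing. It assumes integer daily counts as labels, a Gaussian kernel, and inverse-density weights rescaled to average 1. The function name `lds_weights`, the kernel size, the value of sigma, and the example labels are all hypothetical; the abstract does not state the settings the authors used.

```python
import math
from collections import Counter


def lds_weights(labels, kernel_size=5, sigma=2.0):
    """Inverse-density sample weights from a Gaussian-smoothed label histogram.

    Assumes integer labels and an odd kernel_size (illustrative defaults).
    """
    lo, hi = min(labels), max(labels)
    counts = Counter(labels)
    # Empirical label density over every integer bin between min and max.
    empirical = [counts.get(v, 0) for v in range(lo, hi + 1)]

    # Symmetric Gaussian kernel, normalised to sum to 1.
    half = kernel_size // 2
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]

    # Effective density: convolve the histogram with the kernel (zero padding at the edges).
    effective = []
    for i in range(len(empirical)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(empirical):
                s += w * empirical[idx]
        effective.append(s)

    # Weight = 1 / effective density, rescaled so the weights average to 1.
    raw = [1.0 / effective[y - lo] for y in labels]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


# Hypothetical daily admission counts: 20 is a rare, isolated label.
labels = [10, 10, 10, 11, 11, 12, 20]
weights = lds_weights(labels)
```

In this toy example the rare label receives the largest weight. Such weights would typically be passed to the meta learner as per-sample weights during fitting, where the chosen estimator supports them.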

Results: Our proposed model outperformed all the base learners and a long short-term memory (LSTM) network on two datasets (CD and stroke). In particular, compared with the best results obtained by individual models on the CD dataset, the stacking model reduced the MAE, RMSE, and MAPE by 13.9%, 12.7%, and 5.8%, respectively, and improved the R² by 6.8%. The model explanation indicated that environmental features further improved model performance and that high temperature and high concentrations of gaseous air pollutants may be strongly associated with an increased risk of CD.
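For reference, the four evaluation metrics and the relative changes quoted above can be computed as in the following sketch. It uses the standard textbook definitions, which are assumed (not confirmed by the abstract) to match the authors' implementation. The function names and sample numbers are hypothetical and are not taken from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero observations)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline` (negative means a decrease)."""
    return 100.0 * (new - baseline) / baseline


# Arbitrary illustrative values, not data from the paper.
observed = [100, 120, 80, 90]
predicted = [110, 115, 85, 90]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "R2": r2(observed, predicted),
}
```

A reported "decrease of 13.9% in MAE" corresponds to `relative_change(best_single_model_mae, stacking_mae)` returning about -13.9, where both arguments are placeholders for values from the paper's results tables.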

Conclusions: Our stacking model, which accounts for environmental exposure, is effective in predicting daily HAs for CD and has practical value for early warning and healthcare resource allocation.

Keywords: Cerebrovascular disease; Environmental exposure; Hospital admissions; SHAP value; Stacking ensemble model.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Schematic diagram of stacking model development
Fig. 2
Schematic diagram of label distribution smoothing
Fig. 3
Comparison of, and residuals between, the observed HAs and the predictions of the stacking model with LDS on the CD and stroke datasets
Fig. 4
Heatmap plot of SHAP values of all features across all samples in the CD training set. The width of the black bar on the right-hand side shows the global importance of each feature. (a) Calendar features and HAs features; (b) environmental features
Fig. 5
Waterfall plots of SHAP values for four selected samples, i.e., the samples on August 7, 14, 21, and 28, 2018. The new baselines and the final predictions are marked at the bottom and top of each plot, respectively. The SHAP value of each feature is listed on its bar
Fig. 6
The left side shows empirical label distribution plots, and the right side shows comparison plots of the error before and after using LDS on the two testing datasets: (a) CD and (b) stroke
Fig. 7
SHAP dependence plots showing the effect of TEM lag5 and RH lag1 on the predictions of HAs
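
The additive structure behind the SHAP waterfall plots (baseline plus per-feature contributions equals the prediction) can be checked on a toy example. The sketch below computes exact Shapley values by enumerating feature subsets, with "absent" features replaced by a single background sample. This is a simplification for illustration only: the SHAP library uses its own estimators and background data, and `toy_model`, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x`; absent features take `background` values."""
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` set to their observed values.
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical admissions model with one interaction term (not the paper's model)."""
    return 50 + 2 * z[0] + 0.5 * z[1] + 0.1 * z[0] * z[2]


x = [30, 60, 40]           # sample to explain (made-up feature values)
background = [20, 70, 30]  # reference sample that defines the baseline
phi = shapley_values(toy_model, x, background)
baseline = toy_model(background)
prediction = toy_model(x)
```

Here `baseline + sum(phi)` reproduces `prediction`, which is the property a waterfall plot visualizes feature by feature.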

