BMC Infect Dis. 2024 Feb 26;24(1):265.
doi: 10.1186/s12879-024-09138-x.

Predicting the incidence of infectious diarrhea with symptom surveillance data using a stacking-based ensembled model

Pengyu Wang et al. BMC Infect Dis.

Abstract

Background: Infectious diarrhea remains a major public health problem worldwide. This study used a stacking ensemble to develop a predictive model for the incidence of infectious diarrhea, with the aim of improving prediction performance over individual models.

Methods: Using surveillance data on infectious diarrhea cases, related symptoms, and meteorological factors in Guangzhou from 2016 to 2021, we developed four base prediction models with artificial neural networks (ANN), long short-term memory networks (LSTM), support vector regression (SVR), and extreme gradient boosting regression trees (XGBoost), and then combined them by stacking to obtain the final prediction model. All models were evaluated with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).
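The stacking step can be illustrated with a minimal, standard-library Python sketch. It assumes the meta-model is a linear regression (intercept plus one weight per base model) fitted to the base models' one-step-ahead forecasts over a later, time-ordered segment of the series; the abstract does not name the meta-learner, so this choice is an assumption. The base forecasters `last_value`, `moving_average`, and `linear_trend` are hypothetical placeholders for the ANN, LSTM, SVR, and XGBoost models, and the weekly series is synthetic.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-ins for the ANN, LSTM, SVR and XGBoost base models:
# each maps the history of weekly counts to a one-step-ahead forecast.
Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    return history[-1]


def moving_average(history: Sequence[float], window: int = 4) -> float:
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_trend(history: Sequence[float]) -> float:
    # Extrapolate the most recent week-to-week change.
    return 2 * history[-1] - history[-2]


def base_predictions(models: List[Forecaster], series: Sequence[float], start: int) -> List[List[float]]:
    """Level-0 features: one row of base-model forecasts for each week t >= start."""
    return [[m(series[:t]) for m in models] for t in range(start, len(series))]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_model(rows: List[List[float]], targets: Sequence[float], ridge: float = 1e-6) -> List[float]:
    """Level-1 model: intercept plus one weight per base model, fit by lightly ridged least squares."""
    X = [[1.0] + row for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def stacked_forecast(models: List[Forecaster], weights: List[float], history: Sequence[float]) -> float:
    features = [1.0] + [m(history) for m in models]
    return sum(w * f for w, f in zip(weights, features))


# Synthetic weekly counts for illustration only (not the Guangzhou surveillance data).
series = [200 + 50 * math.sin(2 * math.pi * t / 52) + 15 * ((7 * t) % 5) for t in range(80)]
models: List[Forecaster] = [last_value, moving_average, linear_trend]

# Time-ordered split: fit the meta-model on weeks 4-59, then forecast weeks 60-79.
split = 60
train_rows = base_predictions(models, series[:split], start=4)
weights = fit_meta_model(train_rows, series[4:split])
test_preds = [stacked_forecast(models, weights, series[:t]) for t in range(split, len(series))]
```

Because a least-squares linear meta-model can always reproduce any single base model (weight 1 on that model, 0 elsewhere), its in-sample error on the meta-training weeks cannot exceed that of the best base model, apart from the tiny ridge term added for numerical stability. This is one intuition for why stacking eases model selection; out-of-sample improvement is not guaranteed and has to be checked on held-out weeks.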

Results: Base models that used symptom surveillance data together with the weekly number of infectious diarrhea cases achieved lower RMSE, MAE, and MAPE than models that used meteorological data together with the weekly number of cases. Among the four base models, the LSTM performed best, with an RMSE of 84.85, an MAE of 57.50, and a MAPE of 15.92%. The stacking ensemble model outperformed all four base models, with an RMSE of 75.82, an MAE of 55.93, and a MAPE of 15.70%.
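The three evaluation metrics have standard definitions, sketched below in Python. The function names and the small observed/predicted series are hypothetical and serve only to show the arithmetic; MAPE is reported in percent and is undefined when an observed value is zero.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum |y_t - yhat_t|."""
    return sum(abs(y - p) for y, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error: sqrt((1/n) * sum (y_t - yhat_t)^2)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))


def mape(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error: (100/n) * sum |(y_t - yhat_t) / y_t|."""
    if any(y == 0 for y in obs):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(obs, pred)) / len(obs)


observed = [120.0, 150.0, 200.0, 180.0]   # hypothetical weekly case counts
predicted = [110.0, 160.0, 170.0, 190.0]  # hypothetical model forecasts
print(f"MAE={mae(observed, predicted):.2f}, RMSE={rmse(observed, predicted):.2f}, "
      f"MAPE={mape(observed, predicted):.2f}%")
# prints: MAE=15.00, RMSE=17.32, MAPE=8.89%
```

For any set of errors RMSE is at least as large as MAE, with equality only when all absolute errors are equal, because squaring weights large errors more heavily. The reported pairs (84.85 vs. 57.50 for the LSTM, 75.82 vs. 55.93 for the stacking model) are consistent with this.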

Conclusions: Incorporating symptom surveillance data could improve the accuracy of infectious diarrhea prediction models, and symptom surveillance data were more effective than meteorological data in enhancing model performance. Combining multiple prediction models by stacking could alleviate the difficulty of selecting the optimal model and yield a model that performs better than the base models.

Keywords: Ensemble learning; Infectious diarrhea; Prediction model; Stacking; Symptom surveillance.

Conflict of interest statement

The authors have no relevant financial or non-financial interests to disclose.

Figures

Fig. 1
Meta-model training in the stacking framework
Fig. 2
Autocorrelation plot of the weekly number of infectious diarrhea cases
Fig. 3
Cross-correlation coefficients between the weekly number of infectious diarrhea cases and (A) the weekly number of gastroenterology outpatient clinic visits and (B) the weekly gastroenterology outpatient clinic visit rate
Fig. 4
Cross-correlation coefficients between the weekly number of infectious diarrhea cases and (A) weekly mean air temperature, (B) weekly mean minimum air temperature, (C) weekly mean maximum air temperature, (D) weekly mean atmospheric pressure, (E) weekly mean relative humidity, and (F) weekly mean precipitation
Fig. 5
Time-series plots of predicted versus observed values on the testing set
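Figs. 2 to 4 rest on the sample autocorrelation of the weekly case series and on lagged cross-correlations between cases and each candidate predictor, which are commonly used to choose lagged model inputs. The standard-library sketch below is illustrative only: the function names are hypothetical, computing the cross-correlation as a Pearson correlation over the overlapping pairs is an assumed convention (the abstract does not state the estimator or the lags used), and the series are synthetic, constructed so that cases trail clinic visits by two weeks.

```python
import math
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k = sum_{t>=k} (x_t - m)(x_{t-k} - m) / sum_t (x_t - m)^2."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)


def lagged_ccf(cases: Sequence[float], predictor: Sequence[float], max_lag: int) -> List[float]:
    """Correlation between cases at week t and the predictor at week t - k, for k = 0..max_lag."""
    return [pearson(cases[k:], predictor[:len(predictor) - k]) for k in range(max_lag + 1)]


# Synthetic series: clinic visits follow a 26-week cycle; cases trail visits by 2 weeks.
def visits_at(t: int) -> float:
    return 100 + 30 * math.sin(2 * math.pi * t / 26)


visits = [visits_at(t) for t in range(104)]
cases = [0.5 * visits_at(t - 2) + 10 for t in range(104)]

ccf = lagged_ccf(cases, visits, max_lag=6)
best_lag = max(range(len(ccf)), key=lambda k: ccf[k])
print(best_lag)  # prints 2
```

A predictor whose correlation with cases peaks at a positive lag, as visits do here at two weeks, carries information that is available before the cases are reported, which is what makes it useful as a model input.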