BMC Infect Dis. 2020 Mar 14;20(1):222.
doi: 10.1186/s12879-020-4930-2.

Forecasting incidence of infectious diarrhea using random forest in Jiangsu Province, China

Xinyu Fang et al. BMC Infect Dis.

Abstract

Background: Infectious diarrhea imposes a considerable global disease burden, so accurate prediction of infectious diarrhea epidemics is crucial for public health authorities. This study aimed to develop an optimal random forest (RF) model that incorporates meteorological factors to predict the incidence of infectious diarrhea in Jiangsu Province, China.

Methods: An RF model was developed and compared with classical autoregressive integrated moving average models without and with exogenous variables (ARIMA and ARIMAX, together ARIMA/X). Morbidity and meteorological data from 2012 to 2016 were used to construct the models, and the data from 2017 were used for testing.
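The chronological split described in the Methods can be illustrated with a minimal sketch. It assumes weekly records stored as (year, week, incidence) tuples; that layout, the function name `chronological_split`, and the synthetic values are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Hypothetical record layout: (year, week of year, weekly incidence).
Record = Tuple[int, int, float]


def chronological_split(
    records: List[Record], last_train_year: int = 2016
) -> Tuple[List[Record], List[Record]]:
    """Split records by calendar year without shuffling.

    Records up to and including last_train_year go to the training set;
    later records go to the test set, so the test period lies strictly
    after the training period.
    """
    train = [r for r in records if r[0] <= last_train_year]
    test = [r for r in records if r[0] > last_train_year]
    return train, test


# Synthetic placeholder series (NOT the study's data): 52 weeks per year.
records = [
    (year, week, 1.0 + 0.01 * week)
    for year in range(2012, 2018)
    for week in range(1, 53)
]
train, test = chronological_split(records)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters for any forecasting comparison of this kind.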

Results: The RF model used atmospheric pressure, precipitation, relative humidity, and their lagged terms, together with morbidity at lags of 1-4 weeks and a time variable, as predictors. As benchmarks, a univariate ARIMA(1,0,1)(1,0,0)₅₂ model (AIC = -575.92, BIC = -558.14) and a multivariable ARIMAX(1,0,1)(1,0,0)₅₂ model with precipitation at lags of 0-1 weeks (AIC = -578.58, BIC = -578.13) were developed. The RF model outperformed the ARIMA/X models, with a mean absolute percentage error (MAPE) of approximately 20%. The ARIMAX model performed comparably to the ARIMA model, with a MAPE of approximately 30%.
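As a rough illustration of the two ideas in the Results, the sketch below builds a lagged predictor matrix and computes MAPE. It is not the authors' implementation: the function names, the single generic weather series, and its lag set (0 and 1 weeks) are assumptions chosen for brevity; only the 1-4 week morbidity lags and the time variable come from the abstract.

```python
from typing import List, Sequence, Tuple


def make_lagged_rows(
    morbidity: Sequence[float],
    weather: Sequence[float],
    morbidity_lags: Sequence[int] = (1, 2, 3, 4),
    weather_lags: Sequence[int] = (0, 1),  # placeholder lag set (assumption)
) -> Tuple[List[List[float]], List[float]]:
    """Build one feature row per week t from lagged values.

    Each row holds morbidity[t - k] for every morbidity lag k, then
    weather[t - k] for every weather lag k, then the time index t.
    The target for that row is morbidity[t].
    """
    max_lag = max(max(morbidity_lags), max(weather_lags))
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(max_lag, len(morbidity)):
        row = [morbidity[t - k] for k in morbidity_lags]
        row += [weather[t - k] for k in weather_lags]
        row.append(float(t))  # simple time variable
        features.append(row)
        targets.append(morbidity[t])
    return features, targets


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Observed values must be non-zero, since each error is divided by
    the corresponding observation.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(terms) / len(terms)


# Synthetic placeholder values (NOT the study's data).
morbidity = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]
weather = [0.0, 5.0, 2.0, 0.0, 1.0, 3.0]
X, y = make_lagged_rows(morbidity, weather)
error = mape([100.0, 200.0], [80.0, 220.0])  # errors of 20% and 10%
```

The rows in `X` could be passed to any regression learner; the first four weeks are dropped because their 4-week lag is unavailable.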

Conclusions: The RF model fitted the dynamics of the infectious diarrhea epidemic well and delivered good prediction accuracy. It combined the synchronous and lagged effects of meteorological factors with the autocorrelation and seasonality of the morbidity. The RF model can be used to predict the epidemic level and has high potential for practical implementation.

Keywords: Forecasting; Infectious diarrhea; Random forest.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Weekly observed cases of infectious diarrhea in Jiangsu Province, 2012–2017. Note: From top to bottom, the lines represent the actual observations and the trend, seasonal, and random components.

Fig. 2. Variable importance in the random forest regression model for infectious diarrhea.

Fig. 3. Observed infectious diarrhea incidences and values predicted by different models. Note: The left side of the vertical line indicates the model fitting stage, and the right side indicates the prospective stage.
