BMC Med Res Methodol. 2021 Feb 14;21(1):34.

doi: 10.1186/s12874-021-01226-9.

Ensemble bootstrap methodology for forecasting dynamic growth processes using differential equations: application to epidemic outbreaks

Gerardo Chowell¹,², Ruiyan Luo³

Affiliations

¹ Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA. gchowell@gsu.edu.
² Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA. gchowell@gsu.edu.
³ Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA.

PMID: 33583405
PMCID: PMC7882252
DOI: 10.1186/s12874-021-01226-9

Gerardo Chowell et al. BMC Med Res Methodol. 2021.

Abstract

Background: Ensemble modeling aims to boost the forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple-yet-powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations with applications to infectious disease spread.

Methods: We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, we demonstrate the performance of the method using a diversity of epidemic datasets, including scenario outbreak data from the Ebola Forecasting Challenge and real-world outbreak data for influenza, plague, Zika, and COVID-19.

Results: The ensemble method that randomly selects a model from the set of individual models at each time point of the epidemic trajectory frequently outcompeted both the individual models and an alternative ensemble method based on a weighted combination of the individual models. It yielded broader and more realistic uncertainty bounds for the trajectory envelope, achieving both a better coverage rate of the 95% prediction interval and improved mean interval scores across a diversity of epidemic datasets.

Conclusion: Our new methodology for ensemble forecasting outcompetes the component models and an alternative ensemble model that differs in how the variance is evaluated for the generation of the prediction intervals of the forecasts.

Keywords: Differential equations; Generalized logistic growth model; Gompertz model; Interval score; Model ensemble; Parameter estimation; Uncertainty quantification; Phenomenological growth; Parametric bootstrapping; Richards model.
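The two ensemble schemes contrasted in the Results (and described in the Fig. 1 caption) can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the authors' implementation: the fitted mean incidence curves are made-up numbers, the weights are taken proportional to 1/MSE (one possible choice; the source only says the weights are based on the MSE or criteria such as the AIC), a Poisson error structure is assumed for both schemes, and all function names are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small means used here (roughly lam < 700); a library
    sampler would be preferable for large incidence counts.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def weights_from_mse(mse_values):
    """Assumed weighting: w_i proportional to 1 / MSE_i, normalized to sum to 1."""
    inverse = [1.0 / m for m in mse_values]
    total = sum(inverse)
    return [v / total for v in inverse]


def method1_sample(rng, curves, weights):
    """Method 1: Poisson draws around the weighted-mean curve f_ens(t_j)."""
    sample = []
    for j in range(len(curves[0])):
        f_ens = sum(w * curve[j] for w, curve in zip(weights, curves))
        sample.append(poisson_draw(rng, f_ens))
    return sample


def method2_sample(rng, curves, weights):
    """Method 2: at each t_j pick model i with probability w_i,
    then draw from Poisson(f_i(t_j)) (Poisson error is an assumption)."""
    sample = []
    for j in range(len(curves[0])):
        i = rng.choices(range(len(curves)), weights=weights, k=1)[0]
        sample.append(poisson_draw(rng, curves[i][j]))
    return sample


if __name__ == "__main__":
    rng = random.Random(2021)
    # Hypothetical fitted mean incidence curves from three models, five time points.
    curves = [
        [5.0, 9.0, 15.0, 22.0, 28.0],
        [5.0, 10.0, 19.0, 30.0, 41.0],
        [6.0, 12.0, 24.0, 40.0, 60.0],
    ]
    weights = weights_from_mse([4.0, 2.0, 4.0])  # hypothetical calibration MSEs
    print("weights:", weights)
    print("Method 1 bootstrap sample:", method1_sample(rng, curves, weights))
    print("Method 2 bootstrap sample:", method2_sample(rng, curves, weights))
```

Under these assumptions both schemes have the same expected value at each time point, but the mixture in the second scheme adds the between-model variance to the Poisson variance. That is consistent with the broader uncertainty bounds the abstract reports for the random-selection method.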

Conflict of interest statement

None.

Figures

**Fig. 1**
Schematic diagrams illustrate the construction of the bootstrap samples using Ensemble Method 1 (a) and Ensemble Method 2 (b). Suppose we have $I$ models under consideration. Given the training data, let $\hat{\Theta}_{i}$ denote the set of estimated parameters and $f_{i}(t, \hat{\Theta}_{i})$ denote the estimated mean incidence curve for the $i$-th model. Based on the quality of the model fit, measured by the MSE or criteria such as the AIC, we compute the weight $w_{i}$ for the $i$-th model, $i = 1, \ldots, I$, where $\sum_{i=1}^{I} w_{i} = 1$. For Method 1, we generate a random variable $y_{i}$ from a Poisson distribution with mean $f_{ens}(t_{j}) = \sum_{i=1}^{I} w_{i} f_{i}(t_{j}, \hat{\Theta}_{i})$ to generate a bootstrap sample. In contrast, to generate the bootstrap samples based on Method 2, we assume that at each time point the epidemic follows the $i$-th model with probability $w_{i}$.

**Fig. 2**
Synthetic datasets for testing and demonstrating the functionality of the ensemble approaches. We simulated incidence curves from the 2-parameter Gompertz model (the “true model”) with added Poisson noise (blue circles). We set parameters r = 0.4, b = 0.1086 and K = 10,000. The initial condition was set at C(0) = 1. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts.

**Fig. 3**
Synthetic datasets derived from a stochastic homogeneous-mixing SEIR transmission model with a population size of 100,000 and a time-dependent transmission rate, chosen so that the resulting incidence curves are not well captured by any of the individual models considered in the ensemble model (GLM, RIC, GOM). These simulations have a constant reproduction number of 2.0 from day 0 to day 20; the reproduction number then declines from 2.0 to 1.0 on epidemic day 30 and finally drops from 1.0 to 0.5 on epidemic day 40. The simulations start with 5 infected individuals. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts.

**Fig. 4**
Epidemic trajectories for eight real epidemics, namely Zika in Antioquia, Colombia; the 1918 influenza pandemic in San Francisco; the 2009 A/H1N1 influenza pandemic in Manitoba, Canada; Severe Acute Respiratory Syndrome (SARS) in Singapore; plague in Madagascar; and COVID-19 epidemics in the provinces of Guangdong, Anhui, and Hunan. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts.

**Fig. 5**
Representative sequential 20-day ahead forecasts (top to bottom panels) obtained from individual models (GLM, RIC, GOM) and two ensemble methods applied to synthetic data derived from the GOM model. Blue circles correspond to the data points. The mean fit (solid line) and 95% prediction interval (dashed lines) are also shown. The gray shaded areas help highlight differences in the 95% prediction intervals associated with the ensemble methods. The vertical line separates the calibration period (left) from the forecasting period (right)

**Fig. 6**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts from the synthetic data derived from the Gompertz model. As expected, we found that the “true model” (GOM) outperformed all other models based on four performance metrics although it achieved a similar coverage rate of the 95% PI to that of the Ensemble Method 2, which was close to 0.95. While the performance of the ensemble methods was not different in terms of the MAE and MSE, Ensemble Method 2 achieved significantly better coverage rate of the 95% PI and lower MIS compared to the Ensemble Method 1

**Fig. 7**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 2009 A/H1N1 influenza pandemic in Manitoba, Canada. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS, although its predictions deviated somewhat from the actual future values and individual models often attained lower MAE or MSE.

**Fig. 8**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 1918 influenza pandemic in San Francisco. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 9**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the plague epidemic in Madagascar. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 10**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the SARS outbreak in Singapore. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 11**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Guangdong. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 12**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Henan. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 13**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Hunan. The Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS

**Fig. 14**
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 2016 Zika epidemic in Antioquia, Colombia. The GLM yields the best forecasting performance in terms of the coverage rate and the MIS, but its advantage over Ensemble Method 2 is small.
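The four performance metrics named in the captions above (MAE, MSE, coverage rate of the 95% PI, and MIS) can be computed as in the following sketch. This excerpt does not spell out the formulas, so the code assumes the standard definitions, in particular the usual interval score: interval width plus a penalty of 2/α times the distance by which an observation falls outside the interval, with α = 0.05 for a 95% PI. The function name and the example numbers are hypothetical.

```python
def forecast_metrics(observed, point, lower, upper, alpha=0.05):
    """Return MAE, MSE, PI coverage rate, and mean interval score (MIS).

    observed: realized values over the forecast horizon
    point:    point forecasts (e.g. the mean of the forecast trajectories)
    lower, upper: bounds of the (1 - alpha) prediction interval
    """
    n = len(observed)
    mae = sum(abs(p - y) for p, y in zip(point, observed)) / n
    mse = sum((p - y) ** 2 for p, y in zip(point, observed)) / n
    # Fraction of observations that fall inside the prediction interval.
    coverage = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi) / n

    scores = []
    for y, lo, hi in zip(observed, lower, upper):
        score = hi - lo  # reward narrow intervals
        if y < lo:
            score += (2.0 / alpha) * (lo - y)  # penalty for missing below
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)  # penalty for missing above
        scores.append(score)
    mis = sum(scores) / n

    return {"MAE": mae, "MSE": mse, "coverage": coverage, "MIS": mis}


if __name__ == "__main__":
    # Hypothetical 3-day-ahead forecast; the second observation falls below its interval.
    result = forecast_metrics(
        observed=[10, 20, 30],
        point=[12, 18, 30],
        lower=[8, 21, 25],
        upper=[14, 26, 35],
    )
    print(result)
```

A lower MIS is better: it rewards narrow intervals but penalizes heavily any observation that falls outside them, which is why a method can have a higher MAE or MSE and still achieve a better MIS and coverage.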

Cited by

Prediction intervals of the COVID-19 cases by HAR models with growth rates and vaccination rates in top eight affected countries: Bootstrap improvement.
Hwang E. Chaos Solitons Fractals. 2022 Feb;155:111789. doi: 10.1016/j.chaos.2021.111789. Epub 2022 Jan 3. PMID: 35002103. Free PMC article.
Machine learning techniques to predict different levels of hospital care of CoVid-19.
Hernández-Pereira E, Fontenla-Romero O, Bolón-Canedo V, Cancela-Barizo B, Guijarro-Berdiñas B, Alonso-Betanzos A. Appl Intell (Dordr). 2022;52(6):6413-6431. doi: 10.1007/s10489-021-02743-2. Epub 2021 Sep 10. PMID: 34764619. Free PMC article.
An ensemble n-sub-epidemic modeling framework for short-term forecasting epidemic trajectories: Application to the COVID-19 pandemic in the USA.
Chowell G, Dahal S, Tariq A, Roosa K, Hyman JM, Luo R. PLoS Comput Biol. 2022 Oct 6;18(10):e1010602. doi: 10.1371/journal.pcbi.1010602. eCollection 2022 Oct. PMID: 36201534. Free PMC article.
SubEpiPredict: A tutorial-based primer and toolbox for fitting and forecasting growth trajectories using the ensemble n-sub-epidemic modeling framework.
Chowell G, Dahal S, Bleichrodt A, Tariq A, Hyman JM, Luo R. Infect Dis Model. 2024 Feb 9;9(2):411-436. doi: 10.1016/j.idm.2024.02.001. eCollection 2024 Jun. PMID: 38385022. Free PMC article.
Controlling Multiple COVID-19 Epidemic Waves: An Insight from a Multi-scale Model Linking the Behaviour Change Dynamics to the Disease Transmission Dynamics.
Tang B, Zhou W, Wang X, Wu H, Xiao Y. Bull Math Biol. 2022 Aug 25;84(10):106. doi: 10.1007/s11538-022-01061-z. PMID: 36008498. Free PMC article.
