BMC Med Res Methodol. 2021 Feb 14;21(1):34.
doi: 10.1186/s12874-021-01226-9.

Ensemble bootstrap methodology for forecasting dynamic growth processes using differential equations: application to epidemic outbreaks



Gerardo Chowell et al. BMC Med Res Methodol.

Abstract

Background: Ensemble modeling aims to boost the forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple-yet-powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations with applications to infectious disease spread.

Methods: We propose and assess the performance of two ensemble modeling schemes that use different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, we demonstrate the performance of the method using a diversity of epidemic datasets, including scenario outbreak data from the Ebola Forecasting Challenge and real-world epidemic outbreaks of influenza, plague, Zika, and COVID-19.
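The three growth models named above can each be written as a one-dimensional differential equation for the cumulative case count C(t). The sketch below is a minimal Python illustration that assumes commonly used parameterizations: generalized-logistic dC/dt = r C^p (1 − C/K), Richards dC/dt = r C (1 − (C/K)^a), and a two-parameter Gompertz form dC/dt = r C e^(−bt). These forms, the function names, and the fixed-step RK4 integrator are assumptions for illustration, not the paper's code.

```python
import math

# Hypothetical parameterizations of the three growth models (assumed forms).
# Each function returns dC/dt at time t for cumulative cases c.

def glm_rhs(t, c, r, p, K):
    """Generalized-logistic growth model: dC/dt = r * C**p * (1 - C/K)."""
    return r * c ** p * (1.0 - c / K)


def richards_rhs(t, c, r, a, K):
    """Richards model: dC/dt = r * C * (1 - (C/K)**a)."""
    return r * c * (1.0 - (c / K) ** a)


def gompertz_rhs(t, c, r, b):
    """Two-parameter Gompertz model: dC/dt = r * C * exp(-b * t)."""
    return r * c * math.exp(-b * t)


def integrate(rhs, c0, days, params, steps_per_day=100):
    """Classical fourth-order Runge-Kutta; returns cumulative cases C(0), ..., C(days)."""
    h = 1.0 / steps_per_day
    t, c = 0.0, float(c0)
    cumulative = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = rhs(t, c, *params)
            k2 = rhs(t + h / 2, c + h / 2 * k1, *params)
            k3 = rhs(t + h / 2, c + h / 2 * k2, *params)
            k4 = rhs(t + h, c + h * k3, *params)
            c += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        cumulative.append(c)  # record C at the end of each whole day
    return cumulative


def incidence(cumulative):
    """Daily incidence: day-to-day differences of the cumulative curve."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
```

For instance, `integrate(richards_rhs, 1.0, 60, (0.4, 1.0, 10_000))` reproduces the logistic curve (the Richards model with a = 1), and `incidence(...)` converts it into the daily counts that the forecasts target.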

Results: We found that the ensemble method that randomly selects a model from the set of individual models at each time point of the epidemic trajectory frequently outcompeted both the individual models and an alternative ensemble method based on a weighted combination of the individual models. It also yielded broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets.

Conclusion: Our new ensemble forecasting methodology outcompetes the component models and an alternative ensemble model; the two ensemble approaches differ in how the variance is evaluated when generating the prediction intervals of the forecasts.

Keywords: Differential equations; Generalized logistic growth model; Gompertz model; Interval score; Model ensemble; Parameter estimation; Uncertainty quantification; Phenomenological growth; Parametric bootstrapping; Richards model.


Conflict of interest statement

None.

Figures

Fig. 1
Schematic diagrams illustrating the construction of the bootstrap samples using Ensemble Method 1 (a) and Ensemble Method 2 (b). Suppose we have $I$ models under consideration. Given the training data, let $\hat{\Theta}_i$ denote the set of estimated parameters and $f_i(t, \hat{\Theta}_i)$ denote the estimated mean incidence curve for the $i$-th model. Based on the quality of the model fit, measured by the MSE or by criteria such as AIC, we compute the weight $w_i$ for the $i$-th model, $i = 1, \dots, I$, where $\sum_{i=1}^{I} w_i = 1$. For Method 1, we generate a random variable $y_{t_j}$ from a Poisson distribution with mean $f_{\text{ens}}(t_j) = \sum_{i=1}^{I} w_i f_i(t_j, \hat{\Theta}_i)$ to generate a bootstrap sample. In contrast, to generate the bootstrap samples based on Method 2, we assume that at each time point the epidemic follows the $i$-th model with probability $w_i$
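The two sampling schemes described in the Fig. 1 caption can be sketched in a few lines of NumPy. This sketch assumes the fitted mean incidence curves f_i(t_j, Θ̂_i) are already available as a matrix with one row per model and one column per time point, and it uses Akaike weights as one possible choice of w_i (the caption also allows MSE-based weights). All function names are hypothetical. In a typical parametric bootstrap each sampled trajectory would then be refit to obtain parameter and forecast uncertainty; that step is omitted here.

```python
import numpy as np


def akaike_weights(aic):
    """Akaike weights: w_i proportional to exp(-(AIC_i - min AIC) / 2), summing to 1."""
    aic = np.asarray(aic, dtype=float)
    rel = np.exp(-(aic - aic.min()) / 2.0)
    return rel / rel.sum()


def method1_samples(fits, weights, n_boot, rng):
    """Ensemble Method 1: Poisson draws around the weighted ensemble mean.

    fits[i, j] holds the fitted mean incidence f_i(t_j, Theta_hat_i) of model i.
    """
    fits = np.asarray(fits, dtype=float)
    f_ens = np.asarray(weights, dtype=float) @ fits  # f_ens(t_j) = sum_i w_i f_i(t_j)
    return rng.poisson(f_ens, size=(n_boot, fits.shape[1]))


def method2_samples(fits, weights, n_boot, rng):
    """Ensemble Method 2: at each time point pick model i with probability w_i,
    then draw a Poisson count around that model's fitted mean."""
    fits = np.asarray(fits, dtype=float)
    n_models, n_times = fits.shape
    chosen = rng.choice(n_models, size=(n_boot, n_times), p=weights)
    means = fits[chosen, np.arange(n_times)]  # means[b, j] = fits[chosen[b, j], j]
    return rng.poisson(means)


# Toy example: two hypothetical fitted curves over three time points.
rng = np.random.default_rng(1)
fits = np.array([[10.0, 20.0, 30.0],
                 [12.0, 18.0, 40.0]])
w = akaike_weights([100.0, 102.0])
s1 = method1_samples(fits, w, 20_000, rng)
s2 = method2_samples(fits, w, 20_000, rng)
```

Both schemes have the same expected value at each time point, but Method 2 samples from a mixture of the component models, so by the law of total variance its per-time-point variance adds the between-model spread of the fitted means to the Poisson variance. This is consistent with the broader prediction intervals the abstract reports for the model-selection ensemble.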
Fig. 2
Synthetic datasets for testing and demonstrating the functionality of the ensemble approaches. We simulated incidence curves from the 2-parameter Gompertz model (the “true model”) with added Poisson noise (blue circles). We set parameters r = 0.4, b = 0.1086 and K = 10,000. The initial condition was set at C(0) = 1. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts
Fig. 3
Synthetic datasets derived from a stochastic homogeneous-mixing SEIR transmission model with a population size of 100,000 and a time-dependent transmission rate, chosen so that the resulting incidence curves are not well captured by any of the individual models in the ensemble (GLM, RIC, GOM). In these simulations the reproduction number is held constant at 2.0 from day 0 to day 20, then declines from 2.0 to 1.0 on epidemic day 30, and finally drops from 1.0 to 0.5 on epidemic day 40. The simulations start with 5 infected individuals. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts
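A stochastic homogeneous-mixing SEIR model with a time-varying reproduction number can be sketched as a discrete-time chain-binomial process. In this minimal sketch, the population size (100,000), the 5 initial infectious individuals, and the R(t) breakpoints at days 20, 30, and 40 come from the Fig. 3 caption. The linear interpolation between breakpoints, the daily time step, the hypothetical mean latent (2 days) and infectious (4 days) periods, and the use of E → I transitions as the reported incidence are all assumptions, not the paper's settings.

```python
import numpy as np


def reproduction_number(t):
    """R(t): 2.0 up to day 20, then 1.0 at day 30 and 0.5 from day 40 onward.

    Linear interpolation between the stated days is an assumption.
    """
    return float(np.interp(t, [0.0, 20.0, 30.0, 40.0], [2.0, 2.0, 1.0, 0.5]))


def simulate_seir(days=100, n=100_000, i0=5, latent_days=2.0, infectious_days=4.0, seed=0):
    """Discrete-time chain-binomial SEIR under homogeneous mixing.

    Returns daily counts of individuals entering the infectious class, used here
    as the observed incidence. latent_days and infectious_days are hypothetical.
    """
    rng = np.random.default_rng(seed)
    s, e, i = n - i0, 0, i0
    gamma = 1.0 / infectious_days
    p_ei = 1.0 - np.exp(-1.0 / latent_days)  # daily probability E -> I
    p_ir = 1.0 - np.exp(-gamma)              # daily probability I -> R
    daily = []
    for t in range(days):
        # Continuous-time relation R = beta / gamma; approximate for daily steps.
        beta = reproduction_number(t) * gamma
        p_se = 1.0 - np.exp(-beta * i / n)  # daily infection probability per susceptible
        new_e = rng.binomial(s, p_se)
        new_i = rng.binomial(e, p_ei)
        new_r = rng.binomial(i, p_ir)
        s, e, i = s - new_e, e + new_e - new_i, i + new_i - new_r
        daily.append(int(new_i))
    return daily


curve = simulate_seir()
```

Binomial draws keep every compartment non-negative, and varying `seed` produces the ensemble of stochastic realizations from which synthetic datasets of this kind could be drawn.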
Fig. 4
Epidemic trajectories for eight real epidemics, namely Zika in Antioquia, Colombia; the 1918 influenza pandemic in San Francisco; the 2009 A/H1N1 influenza pandemic in Manitoba, Canada; Severe Acute Respiratory Syndrome (SARS) in Singapore; plague in Madagascar; and COVID-19 epidemics in the provinces of Guangdong, Anhui, and Hunan. The dashed vertical lines indicate the start and end days of the daily 20-day ahead forecasts
Fig. 5
Representative sequential 20-day ahead forecasts (top to bottom panels) obtained from individual models (GLM, RIC, GOM) and two ensemble methods applied to synthetic data derived from the GOM model. Blue circles correspond to the data points. The mean fit (solid line) and 95% prediction interval (dashed lines) are also shown. The gray shaded areas help highlight differences in the 95% prediction intervals associated with the ensemble methods. The vertical line separates the calibration period (left) from the forecasting period (right)
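The sequential forecasting design (calibrate on the data up to a forecast origin, forecast the next 20 days, and repeat for each day between the dashed lines) can be expressed as a generator of index windows. This sketch assumes an expanding calibration window that always starts at day 0; the function name and its parameters are hypothetical.

```python
def sequential_forecast_windows(n_obs, first_origin, last_origin, horizon=20):
    """Yield (calibration, forecast) index ranges for each daily forecast origin.

    Calibration uses all observations before the origin (an expanding window,
    assumed here); the forecast covers up to `horizon` subsequent observed days.
    """
    for origin in range(first_origin, last_origin + 1):
        calibration = range(0, origin)
        forecast = range(origin, min(origin + horizon, n_obs))
        yield calibration, forecast


# Hypothetical setting: 100 observed days, forecast origins on days 30..80.
windows = list(sequential_forecast_windows(n_obs=100, first_origin=30, last_origin=80))
```

Each calibration range would be used to fit the individual models and build the ensembles, and the matching forecast range holds the observations against which the 1 to 20 day-ahead predictions are scored.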
Fig. 6
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts from the synthetic data derived from the Gompertz model. As expected, the “true model” (GOM) outperformed all other models on four performance metrics, although its coverage rate of the 95% PI, close to 0.95, was similar to that of Ensemble Method 2. While the two ensemble methods performed similarly in terms of MAE and MSE, Ensemble Method 2 achieved a significantly better coverage rate of the 95% PI and a lower MIS than Ensemble Method 1
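The four performance metrics used in these figures can be computed as follows. MAE, MSE, and the coverage rate are standard; for a central (1 − α) prediction interval [L, U] and an observation y, the interval score is IS = (U − L) + (2/α)(L − y)·1{y < L} + (2/α)(y − U)·1{y > U} (Gneiting and Raftery, 2007), and the MIS averages it over the forecast points. With α = 0.05 this scores the 95% PI. The function names and toy numbers are illustrative assumptions, not the paper's code or data.

```python
def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def coverage(obs, lower, upper):
    """Fraction of observations falling inside their prediction interval."""
    return sum(lo <= o <= hi for o, lo, hi in zip(obs, lower, upper)) / len(obs)


def interval_score(y, lower, upper, alpha=0.05):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower  # width penalty
    if y < lower:
        score += (2.0 / alpha) * (lower - y)  # penalty for falling below the interval
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)  # penalty for falling above the interval
    return score


def mean_interval_score(obs, lower, upper, alpha=0.05):
    """MIS: interval score averaged over the forecast points."""
    return sum(interval_score(y, lo, hi, alpha) for y, lo, hi in zip(obs, lower, upper)) / len(obs)


# Toy forecast with three points; the third observation falls below its interval.
obs = [10, 20, 30]
pred = [12, 18, 33]
lower = [8, 15, 31]
upper = [14, 22, 35]
```

Here the first two intervals contain the observations and contribute only their widths (6 and 7), while the third misses by 1 and adds 40 × 1 to its width of 4, giving an MIS of 19 and a coverage rate of 2/3.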
Fig. 7
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 2009 A/H1N1 influenza pandemic in Manitoba, Canada. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS, although its point predictions deviated somewhat from the observed future values and individual models often attained a lower MAE or MSE
Fig. 8
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 1918 influenza pandemic in San Francisco. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 9
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the plague epidemic in Madagascar. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 10
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the SARS outbreak in Singapore. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 11
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Guangdong. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 12
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Henan. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 13
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the COVID-19 epidemic in Hunan. Ensemble Method 2 outperformed all other models based on the coverage rate of the 95% PI and the MIS
Fig. 14
Mean performance of the individual and ensemble models in 1–20 day ahead forecasts for the 2016 Zika epidemic in Antioquia, Colombia. The GLM yields the best forecasting performance in terms of coverage rate and MIS, but its advantage over Ensemble Method 2 is small
