Comparative Study
J Med Internet Res. 2021 Mar 23;23(3):e24925.
doi: 10.2196/24925.

Short-Range Forecasting of COVID-19 During Early Onset at County, Health District, and State Geographic Levels Using Seven Methods: Comparative Forecasting Study

Christopher J Lynch et al. J Med Internet Res.

Abstract

Background: Forecasting methods rely on trends and averages of prior observations to forecast COVID-19 case counts. COVID-19 forecasts have received much media attention, and numerous platforms have been created to inform the public. However, forecasting effectiveness varies by geographic scope and is affected by changing assumptions in behaviors and preventative measures in response to the pandemic. Due to time requirements for developing a COVID-19 vaccine, evidence is needed to inform short-term forecasting method selection at county, health district, and state levels.

Objective: COVID-19 forecasts keep the public informed and contribute to public policy. As such, proper understanding of forecasting purposes and outcomes is needed to advance knowledge of health statistics for policy makers and the public. Using publicly available real-time data provided online, we aimed to evaluate the performance of seven forecasting methods utilized to forecast cumulative COVID-19 case counts. Forecasts were evaluated based on how well they forecast 1, 3, and 7 days forward when utilizing 1-, 3-, 7-, or all prior-day cumulative case counts during early virus onset. This study provides an objective evaluation of the forecasting methods to identify forecasting model assumptions that contribute to lower error in forecasting COVID-19 cumulative case growth. This information benefits professionals, decision makers, and the public relying on the data provided by short-term case count estimates at varied geographic levels.

Methods: We created 1-, 3-, and 7-day forecasts at the county, health district, and state levels using (1) a naïve approach, (2) Holt-Winters (HW) exponential smoothing, (3) a growth rate approach, (4) a moving average (MA) approach, (5) an autoregressive (AR) approach, (6) an autoregressive moving average (ARMA) approach, and (7) an autoregressive integrated moving average (ARIMA) approach. Forecasts relied on Virginia's 3464 historical county-level cumulative case counts from March 7 to April 22, 2020, as reported by The New York Times. Statistically significant results were identified using 95% CIs of median absolute error (MdAE) and median absolute percentage error (MdAPE) metrics of the resulting 216,698 forecasts.
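To make the look-back and forecast-length terminology concrete, the sketch below shows one simple reading of three of the seven methods (naïve, MA, and growth rate) together with the MdAE and MdAPE metrics. The abstract does not give the authors' exact formulas, so the function names, the use of a plain mean over the look-back window for MA, and the averaged day-over-day ratio for the growth rate approach are all assumptions made for illustration, not the study's implementation.

```python
from statistics import median


def naive_forecast(history, horizon):
    """Assumed naive rule: carry the last cumulative count forward unchanged."""
    return history[-1]


def moving_average_forecast(history, horizon, look_back):
    """Assumed MA rule: forecast the mean of the last `look_back` counts."""
    window = history[-look_back:]
    return sum(window) / len(window)


def growth_rate_forecast(history, horizon, look_back):
    """Assumed growth rule: compound the mean daily growth ratio over `horizon` days."""
    window = history[-(look_back + 1):]
    ratios = [b / a for a, b in zip(window, window[1:]) if a > 0]
    rate = sum(ratios) / len(ratios) if ratios else 1.0
    return history[-1] * rate ** horizon


def mdae(actual, predicted):
    """Median absolute error, in units of cumulative cases."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def mdape(actual, predicted):
    """Median absolute percentage error; days with zero actual cases are skipped."""
    return median(abs(a - p) / a * 100 for a, p in zip(actual, predicted) if a != 0)


# Hypothetical cumulative case counts for one county (not data from the study).
history = [10, 12, 15, 19, 24]
next_day_naive = naive_forecast(history, horizon=1)
next_day_ma = moving_average_forecast(history, horizon=1, look_back=3)
next_day_growth = growth_rate_forecast(history, horizon=1, look_back=3)
```

Under these assumptions, a 3-day look-back MA forecast uses only the last three counts regardless of forecast length, whereas the growth rate forecast compounds with the horizon, which is why the two can diverge quickly at 7 days.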

Results: The next-day MA forecast with a 3-day look-back length obtained the lowest MdAE (median 0.67, 95% CI 0.49-0.84, P<.001) and differed statistically significantly from between 39 (66%) and 53 (90%) of the 59 alternatives at each geographic level at a significance level of .01. For short-range forecasting, methods assuming stationary means of prior days' counts outperformed methods assuming weakly stationary or nonstationary means. MdAPE results revealed statistically significant differences across geographic levels.

Conclusions: For short-range COVID-19 cumulative case count forecasting at the county, health district, and state levels during early onset, the following were found: (1) the MA method was effective for forecasting 1-, 3-, and 7-day cumulative case counts; (2) exponential growth was not the best representation of case growth during early virus onset when the public was aware of the virus; and (3) geographic resolution was a factor in the selection of forecasting methods.

Keywords: COVID-19; coronavirus disease 2019; emerging outbreak; forecasting; infectious disease; modeling and simulation; modeling disease outbreaks; public health.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Experimental design and data overview at the county, health district, and state levels. The generation and aggregation of county-level forecasts are shown on the left path (red), health district–level forecasts on the middle path (blue), and state-level forecasts on the right path (green). The information on the right provides additional detail on each stage in the experimental design. AR: autoregressive; ARIMA: autoregressive integrated moving average; ARMA: autoregressive moving average; HW: Holt-Winters; MA: moving average; MdAE: median absolute error; MdAPE: median absolute percentage error; VA: Virginia.
Figure 2
County-level forecasts’ aggregated median MdAE values and 95% CI. CI ranges are calculated using box plot notch ranges around the median. Statistically significant differences at a P value of .01 are identified by nonoverlapping CI ranges of forecasting methods at each combination of forecast length and look-back length. Units are in terms of COVID-19 cumulative case counts. Y-axis scales differ on each row based on the scale of the contained data. Due to differing assumptions, five of the seven forecasting methods are present for each look-back length as indicated on the x-axis. AR: autoregressive; ARIMA: autoregressive integrated moving average; ARMA: autoregressive moving average; HW: Holt-Winters; MA: moving average; MdAE: median absolute error.
Figure 3
Health district–level forecasts’ aggregated median MdAE values and 95% CI. CI ranges are calculated using box plot notch ranges around the median. Statistically significant differences at a P value of .01 are identified by nonoverlapping CI ranges of forecasting methods at each combination of forecast length and look-back length. Units are in terms of COVID-19 cumulative case counts. Y-axis scales differ on each row based on the scale of the contained data. Due to differing assumptions, five of the seven forecasting methods are present for each look-back length as indicated on the x-axis. AR: autoregressive; ARIMA: autoregressive integrated moving average; ARMA: autoregressive moving average; HW: Holt-Winters; MA: moving average; MdAE: median absolute error.
Figure 4
State-level forecasts’ aggregated median MdAE values and 95% CI. CI ranges are calculated using box plot notch ranges around the median. Statistically significant differences at a P value of .01 are identified by nonoverlapping CI ranges of forecasting methods at each combination of forecast length and look-back length. Units are in terms of COVID-19 cumulative case counts. Y-axis scales differ on each row based on the scale of the contained data. Due to differing assumptions, five of the seven forecasting methods are present for each look-back length as indicated on the x-axis. AR: autoregressive; ARIMA: autoregressive integrated moving average; ARMA: autoregressive moving average; HW: Holt-Winters; MA: moving average; MdAE: median absolute error.
Figure 5
Aggregated median MdAPE values and 95% CI ranges at the county, health district (HD), and state levels differentiated by forecasting method. Comparing CI ranges for a forecast method across each geographic level reveals statistically significant differences in median values for the forecasting method due to geographic scale. Nonoverlapping CI ranges indicate statistically significant differences at a P value of .01. MdAPE provides a comparison within each forecast method separately, not a comparison across different methods. AR: autoregressive; ARIMA: autoregressive integrated moving average; ARMA: autoregressive moving average; HW: Holt-Winters; MA: moving average; MdAPE: median absolute percentage error.
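The captions above describe CIs taken from box plot notch ranges and significance judged by nonoverlapping ranges. A minimal sketch of that idea follows, assuming the conventional notch formula (median ± 1.57 × IQR / √n); the captions do not state which software or quartile convention the authors used, so the constant, the inclusive quartile method, and the function names here are assumptions.

```python
from math import sqrt
from statistics import median, quantiles


def notch_interval(values):
    """Approximate CI around the median using the conventional notch formula."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / sqrt(len(values))
    center = median(values)
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical absolute-error samples for two forecasting methods.
errors_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
errors_b = [21, 22, 23, 24, 25, 26, 27, 28, 29]
differs = not intervals_overlap(notch_interval(errors_a), notch_interval(errors_b))
```

With this convention, two methods would be flagged as differing only when their notch intervals do not overlap at all, which is a conservative visual test rather than a formal hypothesis test.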

Publication types