medRxiv [Preprint]. 2020 Nov 19:2020.07.13.20151233. doi: 10.1101/2020.07.13.20151233.

Predictive performance of international COVID-19 mortality forecasting models

Joseph Friedman et al. medRxiv.

Abstract

Forecasts and alternative scenarios of COVID-19 mortality have been critical inputs into a range of policies, and decision-makers need information about their predictive performance. We identified n=386 public COVID-19 forecasting models and included n=8 that were global in scope and provided public, date-versioned forecasts. For each, we examined the median absolute percent error (MAPE) compared to subsequently observed mortality trends, stratified by weeks of extrapolation, world region, and month of model estimation. Models were also assessed for their ability to predict the timing of peak daily mortality. The MAPE among models released in July rose from 1.8% at one week of extrapolation to 24.6% at twelve weeks. The MAPE at six weeks was highest in Sub-Saharan Africa (34.8%) and lowest in high-income countries (6.3%). At the global level, several models had a MAPE of about 10% at six weeks, showing surprisingly good performance despite the complexities of modelling human behavioural responses and government interventions. The framework and publicly available codebase presented here ( https://github.com/pyliu47/covidcompare ) can be used routinely to compare predictions and evaluate predictive performance in an ongoing fashion.
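
As a minimal sketch of the headline metric (the covidcompare codebase itself may be organized differently), the MAPE at each forecasting horizon can be computed from a long-format table of forecasts; the column names predicted, observed, and weeks_out are hypothetical:

```python
import pandas as pd

def mape_by_horizon(df: pd.DataFrame) -> pd.Series:
    """Median absolute percent error (MAPE) by weeks of extrapolation.

    Assumes one row per (location, model version, target date), with
    hypothetical columns 'predicted' and 'observed' cumulative deaths and
    'weeks_out', the weeks between model release and the target date.
    """
    # Signed percent error of each forecast relative to observed deaths.
    pct_error = 100 * (df["predicted"] - df["observed"]) / df["observed"]
    # Median of the absolute errors, stratified by extrapolation length.
    return pct_error.abs().groupby(df["weeks_out"]).median()
```

Extending the same grouping with region or month-of-estimation columns yields the stratified results reported above.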

Conflict of interest statement

Competing Interests

The authors declare they have no competing interests as defined by Nature Research that might be perceived to influence the results and/or discussion reported in this manuscript.

Figures

Figure 1. Cumulative Mortality Forecasts and Prediction Errors by Model - Example for United States
The most recent version of each model is shown in the top-left panel. The middle row shows all iterations of each model as separate lines, with the intensity of color indicating model date (darker lines are more recent). The vertical dashed lines indicate the first and last model release date for each model. The bottom row shows all errors calculated at weekly intervals. The top-right panel summarizes all observed errors, using median error and median absolute error, by weeks of forecasting and month of model estimation. Errors incorporate an intercept shift to account for differences in each model’s input data. This figure is an example, for the United States, of the country-specific plots made for all locations examined in this study; graphs for all geographies can be found in the supplement. Note that some models use a different input data source than the other modelling groups, causing apparently discordant past trends in the top-left panel. We plot raw estimates in the top-left panel; however, we implement an intercept shift to account for this issue in the calculation of errors.
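The intercept shift described above can be sketched as follows: before errors are computed, each forecast is shifted by the gap between observed and predicted cumulative deaths on the model's release date, so that level differences arising from different input data sources do not count as error. This is an illustration under assumed column names and an assumed anchoring rule, not the authors' exact code:

```python
import pandas as pd

def apply_intercept_shift(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical long-format columns: 'model', 'model_date', 'location',
    # 'date', 'predicted', 'observed' (cumulative deaths).
    def shift_one(group: pd.DataFrame) -> pd.DataFrame:
        release = group["model_date"].iloc[0]
        # Gap between observed and predicted deaths on the release date.
        anchor = group.loc[group["date"] == release]
        offset = (anchor["observed"] - anchor["predicted"]).iloc[0]
        return group.assign(predicted_shifted=group["predicted"] + offset)

    return (df.groupby(["model", "model_date", "location"], group_keys=False)
              .apply(shift_one))
```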
Figure 2. Illustration of Analytical Framework
This figure illustrates the analytical framework presented in the main text. Part A shows the “most current” approach, which is used to select the data shown in Figure 3. Part B shows the “month stratified” approach used for Figures 4 and 5. The y-axis shows the number of weeks of extrapolation for each scenario, while the x-axis shows a range of model dates (the date on which a model was released). The thick band in each plot highlights the 4-week window of model dates used for each extrapolation week value. The thin line shows the period over which each set of models is extrapolating before errors are calculated. In the top panel, the most recent four weeks of model dates are used for each extrapolation length; therefore, for 1-week errors, models from October were used, whereas for 12-week errors, models from July and August were used. In the bottom panel, models from July are used in all cases. The strategy highlighted in the top panel provides the most recent evidence possible for each extrapolation length, while the strategy in the bottom panel allows for a more reliable assessment of how errors grow with increased extrapolation time.
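A minimal sketch of the “most current” selection rule in Part A, assuming forecasts carry a release-date column (all names hypothetical):

```python
import pandas as pd

def most_current_window(model_dates: pd.Series, horizon_weeks: int,
                        last_observed: pd.Timestamp) -> pd.Series:
    """Pick release dates for the 'most current' approach (Part A).

    For a given extrapolation length, keep the most recent 4-week window
    of model dates that still leaves `horizon_weeks` of observed data to
    score against. The 'month stratified' approach (Part B) instead fixes
    the window to a single calendar month for every horizon.
    """
    window_end = last_observed - pd.Timedelta(weeks=horizon_weeks)
    window_start = window_end - pd.Timedelta(weeks=4)
    return model_dates[(model_dates > window_start) & (model_dates <= window_end)]
```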
Figure 3. Most Current - Cumulative Mortality Accuracy - Median Absolute Percent Error
Median absolute percent error values, a measure of accuracy, were calculated across all observed errors at weekly intervals, for each model, by weeks of forecasting and geographic region. Values that represent fewer than five locations are masked due to small sample size. Models were included in the global average when they included at least five locations in each region. Pooled summary statistics reflect values calculated across all errors from all models, in order to comment on aggregate trends by time or geography. Results are shown here for the most recent four-week window allowing for the calculation of errors at each point of extrapolation (see Figure 2 and Methods). Results from other months are shown in the supplement.
Figure 4. Month Stratified July Models - Cumulative Mortality Bias - Median Percent Error
Median percent error values, a measure of bias, were calculated across all observed errors at weekly intervals, for each model, by weeks of forecasting and geographic region. Values that represent fewer than five locations are masked due to small sample size. Models were included in the global average when they included at least five locations in each region. Pooled summary statistics reflect values calculated across all errors from all models, in order to comment on aggregate trends by time or geography. Results are shown here for models released in July, and results from other months are shown in the appendix.
Figure 5. Month Stratified July Models - Cumulative Mortality Accuracy - Median Absolute Percent Error
Median absolute percent error values, a measure of accuracy, were calculated across all observed errors at weekly intervals, for each model, by weeks of forecasting and geographic region. Values that represent fewer than five locations are masked due to small sample size. Models were included in the global average when they included at least five locations in each region. Pooled summary statistics reflect values calculated across all errors from all models, in order to comment on aggregate trends by time or geography. Results are shown here for models released in July, and results from other months are shown in the supplement.
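The bias and accuracy summaries in Figures 4 and 5 differ only in whether percent errors are signed or absolute before the median is taken. A sketch of both, including the small-sample masking described above (column names hypothetical, not the authors' code):

```python
import numpy as np
import pandas as pd

def summarize_errors(df: pd.DataFrame, min_locations: int = 5) -> pd.DataFrame:
    """Median percent error (bias) and median absolute percent error
    (accuracy) by model, region, and weeks of forecasting.

    Hypothetical columns: 'model', 'region', 'weeks_out', 'location',
    'percent_error'. Cells covering fewer than `min_locations` distinct
    locations are masked, mirroring Figures 3-5.
    """
    grouped = df.groupby(["model", "region", "weeks_out"])
    out = grouped["percent_error"].agg(
        bias="median",                            # signed: Figure 4
        accuracy=lambda e: e.abs().median(),      # absolute: Figures 3 and 5
    )
    # Mask cells with too few distinct locations for a stable median.
    out[grouped["location"].nunique() < min_locations] = np.nan
    return out.reset_index()
```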
Figure 6. Observed vs Predicted Peak in Daily Deaths - Example for Massachusetts
Observed daily deaths, smoothed using a loess smoother, are shown as black-outlined dots (top). The observed peak in daily deaths is shown with a vertical black line (bottom). Each model version that was released at least one week prior to the observed peak is plotted (top), and its estimated peak is shown with a point (top and bottom). Estimated peaks are shown in the bottom panel with respect to their predicted peak date (x-axis) and model date (y-axis). Values are shown for Massachusetts; similar graphs for all other locations are available in the appendix. Massachusetts was chosen as the example location because the United States (the example for Figure 1) peaked earlier, allowing only two models to provide peak timing errors, whereas Massachusetts peaked later, allowing four models and making for a more illustrative example.
Figure 7. Peak Timing Accuracy - Median Absolute Error in Days
Median absolute error in days is shown by model and number of weeks of forecasting. Models that are not available for at least 40 peak timing predictions are not shown. Errors only reflect models released at least 7 days before the observed peak in daily mortality. One week of forecasting refers to errors occurring 7 to 13 days in advance of the observed peak, two weeks refers to those occurring 14 to 20 days prior, and so on, up to six weeks, which refers to 42 to 48 days prior. Errors are pooled across month of estimation, as we found little evidence of change in peak timing performance by month (see appendix).
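Peak timing error reduces to the difference between the argmax dates of the smoothed observed series and of each forecast. A minimal sketch follows; the paper uses a loess smoother, approximated here by a centered rolling mean, and all names are hypothetical:

```python
import pandas as pd

def peak_timing_error_days(observed_daily: pd.Series,
                           predicted_daily: pd.Series) -> int:
    """Absolute error, in days, between observed and predicted peaks in
    daily deaths. Both series are indexed by date. Intended only for
    model versions released at least 7 days before the observed peak,
    matching the inclusion rule above.
    """
    # Smooth observed daily deaths before locating the peak (stand-in
    # for the loess smoother used in the paper).
    smoothed = observed_daily.rolling(window=7, center=True).mean()
    observed_peak = smoothed.idxmax()
    predicted_peak = predicted_daily.idxmax()
    return abs((predicted_peak - observed_peak).days)
```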

