BMC Infect Dis. 2022 Nov 10;22(1):833.
doi: 10.1186/s12879-022-07794-5.

Chimeric forecasting: combining probabilistic predictions from computational models and human judgment

Thomas McAndrew et al. BMC Infect Dis. 2022.

Abstract

Forecasts of the trajectory of an infectious agent can help guide public health decision making. A traditional approach to forecasting fits a computational model to structured data and generates a predictive distribution. However, human judgment has access to the same data as computational models plus experience, intuition, and subjective data. We propose a chimeric ensemble, a combination of computational and human judgment forecasts, as a novel approach to predicting the trajectory of an infectious agent. Each month from January 2021 to June 2021 we asked two generalist crowds, using the same criteria as the COVID-19 Forecast Hub, to submit a predictive distribution over incident cases and deaths at the US national level either two or three weeks into the future. We combined these human judgment forecasts with forecasts from computational models submitted to the COVID-19 Forecast Hub into a chimeric ensemble. We find that a chimeric ensemble, compared to an ensemble including only computational models, improves predictions of incident cases and shows similar performance for predictions of incident deaths. A chimeric ensemble is a flexible, supportive public health tool and shows promising results for predictions of the spread of an infectious agent.
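The abstract does not say how the human judgment and computational forecasts were combined. As a minimal sketch only, assuming every forecast is reported at the same quantile levels and that the combination is an equally weighted average of each quantile across models, a chimeric ensemble could look like the following. The function name, model names, and numbers are hypothetical and are not taken from the paper.

```python
def equal_weight_ensemble(forecasts):
    """Average each quantile across models with equal weights.

    forecasts: dict mapping a model name to a list of quantile values.
    All lists are assumed to use the same quantile levels in the same order.
    """
    models = list(forecasts)
    if not models:
        raise ValueError("at least one forecast is required")
    n_levels = len(forecasts[models[0]])
    if any(len(forecasts[m]) != n_levels for m in models):
        raise ValueError("all forecasts must share the same quantile levels")
    # Element-wise mean over models, one value per quantile level.
    return [
        sum(forecasts[m][i] for m in models) / len(models)
        for i in range(n_levels)
    ]


# Hypothetical example: one computational model and one human judgment crowd,
# each reporting three quantiles of weekly incident cases.
chimeric = equal_weight_ensemble({
    "computational_model": [100.0, 200.0, 300.0],
    "human_judgment_crowd": [200.0, 300.0, 500.0],
})
```

Averaging quantile by quantile keeps the ensemble in the same quantile format as its inputs, which is one reason this kind of combination is common for quantile forecasts.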

Conflict of interest statement

Authors declare no competing interests.

Figures

Fig. 1
(A) Timeline of the six surveys that collected human judgment predictions from January to June 2021, showing when surveys opened and closed (blue dashed lines), when computational predictions submitted to the COVID-19 Forecast Hub were due (black dashed line), human judgment predictions excluded from the formal analysis (dark blue), the week each forecast targeted (red dashed line), and the reported number of weekly incident COVID-19 cases at the US national level (black solid line). (B) Forecasts of weekly incident cases submitted to the COVID-19 Forecast Hub (orange) were formatted as seven quantiles, and we formatted human judgment predictions from Metaculus (blue) and Good Judgment Open (red) the same way. (C) Forecasts of weekly incident deaths submitted to the COVID-19 Forecast Hub were formatted as twenty-three quantiles, and we formatted human judgment predictions the same way. We collected more than 3000 original and revised human judgment predictions of incident cases and deaths, covering the spread of SARS-CoV-2 and the burden of COVID-19 in the US.
Fig. 2
(A) Forecasts of weekly incident cases at the national level by an ensemble of computational models (blue) and an ensemble of human judgment (red). The dot represents the median forecast, and the shaded bars represent the 25th to 75th and the 2.5th to 97.5th percentile prediction intervals. (B) Mean and 95% confidence interval of the weighted interval score (WIS) for forecasts of incident cases made by individual computational and human judgment models. (C) Forecasts of weekly incident deaths from computational models and human judgment. (D) Mean and 95% confidence intervals of the WIS for individual predictions of incident deaths. Though individual human judgment forecasts tend to perform worse than computational models, a human judgment ensemble performed similarly to an ensemble of computational models for predictions of both cases and deaths over a 6-month period.
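The captions rely on the weighted interval score (WIS) without defining it. The sketch below follows the commonly used definition of the WIS for a forecast given as a median plus central prediction intervals; it is an illustration and not the authors' code. The function names and the example numbers are hypothetical. Lower scores are better.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval for outcome y."""
    width = upper - lower
    # Penalties apply only when the observation falls outside the interval.
    below = (2.0 / alpha) * max(lower - y, 0.0)
    above = (2.0 / alpha) * max(y - upper, 0.0)
    return width + below + above


def weighted_interval_score(median, intervals, y):
    """WIS for a median and K central intervals.

    intervals: dict mapping alpha to a (lower, upper) pair, for example
    0.5 for the 25th to 75th percentile interval and 0.05 for the
    2.5th to 97.5th percentile interval.
    """
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast: median 100, 50% interval (90, 110), 95% interval (70, 130).
example_intervals = {0.5: (90.0, 110.0), 0.05: (70.0, 130.0)}
score_inside = weighted_interval_score(100.0, example_intervals, y=100.0)
score_outside = weighted_interval_score(100.0, example_intervals, y=140.0)
```

In this example the score is much larger when the observed value (140) falls outside both intervals than when it equals the median, because each missed interval adds a penalty scaled by 2/alpha.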
Fig. 3
Submitted and missing forecasts for (A) computational models, (B) human judgment forecasts submitted before the COVID-19 Forecast Hub deadline, and (C) human judgment forecasts submitted by the survey deadline. Forecasts that were submitted are shown in blue and forecasts not submitted (missing) are shown in yellow. Each row represents a single model, and columns are grouped into six pairs, one per survey from January 2021 to June 2021. Within each pair, the left column (with the tick mark) corresponds to submissions of incident cases and the right column to submissions of incident deaths. The high proportion of missing forecasts made by human judgment models presents a methodological challenge when building a chimeric ensemble.
Fig. 4
Mean difference in WIS for incident cases (A) and deaths (B) at the US national level between a chimeric ensemble and a computational ensemble, paired across six surveys from January 2021 to June 2021, for two strategies to impute missing values (“spotty memory” and “defer to the crowd”) and, within each strategy, five techniques to impute missing forecasts. A chimeric ensemble (a combination of computational and human judgment models) improves WIS scores when the target is cases but weakens or maintains similar WIS scores when the target is deaths. For predictions of cases, mean WIS differs negligibly between the “defer to the crowd” and “spotty memory” imputation strategies; for predictions of incident deaths, “defer to the crowd” appears to improve predictions compared to “spotty memory”. Bayesian Ridge Regression (BR) and Median imputation (MI) are promising techniques to impute missing forecasts for incident cases.
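The caption names median imputation (MI) as one technique for filling in missing forecasts but does not spell out how it was applied. As a rough sketch, assuming a missing forecast value is replaced by the median of the values that other models did submit for the same survey and quantile, median imputation could be written as follows. The function name and numbers are hypothetical, and the “spotty memory” and “defer to the crowd” strategies are not modeled here.

```python
from statistics import median


def median_impute(values):
    """Replace missing entries (None) with the median of the observed entries.

    values: one forecast value per model for a single survey and quantile,
    with None marking a model that did not submit.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every forecast is missing")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical example: the second model did not submit a forecast.
completed = median_impute([100.0, None, 300.0, 200.0])
```

Median imputation is simple and robust to a single extreme forecast, at the cost of making every imputed model look like the typical submitter.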
Fig. 5
Median, 25th and 75th percentiles, and interquartile ranges for the difference in WIS between a performance-based ensemble (PB) and an equally weighted ensemble (EW), paired by survey, for three ensembles: an ensemble that includes only computational models (blue), an ensemble that includes only human judgment (red), and a chimeric ensemble that includes both computational and human judgment models (gold). A “spotty memory” strategy was used along with five imputation techniques for training. Ensemble predictions are stratified by (A) incident cases and (B) deaths. For the majority of imputation techniques used for predictions of incident cases, training a performance-based ensemble shows similar results for a chimeric, computational, and human judgment ensemble. For deaths, performance-based training improves predictions of a computational ensemble, shows little improvement to a chimeric ensemble, and weakens predictions of a human judgment ensemble.
Fig. 6
Median, 25th and 75th percentiles, and interquartile ranges for the difference in WIS between a performance-based ensemble (PB) and an equally weighted ensemble (EW), paired by survey, for three ensembles: an ensemble that includes only computational models (blue), an ensemble that includes only human judgment (red), and a chimeric ensemble that includes both computational and human judgment models (gold). A “defer to the crowd” strategy was used along with five imputation techniques for training. Ensemble predictions are stratified by (A) incident cases and (B) deaths. For the majority of imputation techniques used for predictions of incident cases, training a performance-based ensemble improves the WIS score of a human judgment ensemble and weakens the performance of a computational and a chimeric ensemble. For deaths, performance-based training improves predictions of a chimeric and a human judgment ensemble, but for some imputation techniques weakens predictions of a computational ensemble. An algorithm that assigns different weights based on past performance, coupled with a “defer to the crowd” imputation strategy, may improve predictive performance of a chimeric ensemble.
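The captions contrast a performance-based ensemble with an equally weighted one, but the weighting algorithm is not described on this page. The sketch below shows one simple possibility and should not be read as the authors' method: it assumes weights proportional to the inverse of each model's past mean WIS, normalized to sum to one, and then takes a weighted average of each quantile. All names and numbers are hypothetical.

```python
def inverse_wis_weights(past_mean_wis):
    """Weights proportional to 1 / past mean WIS (an assumed scheme).

    past_mean_wis: dict mapping model name to a strictly positive mean WIS.
    """
    if any(score <= 0 for score in past_mean_wis.values()):
        raise ValueError("mean WIS must be strictly positive")
    inverse = {model: 1.0 / score for model, score in past_mean_wis.items()}
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}


def weighted_quantile_average(forecasts, weights):
    """Weighted average of each quantile across models.

    forecasts: dict mapping model name to a list of quantile values that
    share the same quantile levels; weights: dict with the same keys.
    """
    models = list(forecasts)
    n_levels = len(forecasts[models[0]])
    return [
        sum(weights[m] * forecasts[m][i] for m in models)
        for i in range(n_levels)
    ]


# Hypothetical example: model "a" scored better (lower WIS) on past surveys.
weights = inverse_wis_weights({"a": 10.0, "b": 30.0})
ensemble = weighted_quantile_average(
    {"a": [100.0, 200.0], "b": [200.0, 400.0]}, weights
)
```

With these inputs model "a" receives three times the weight of model "b", so the ensemble quantiles sit closer to the forecasts from "a".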
Fig. 7
WIS scores for predictions of (A) incident cases and (B) incident deaths for a performance-weighted computational ensemble (blue circle), human judgment ensemble (red square), and chimeric ensemble (yellow triangle) over all imputation techniques for a “defer to the crowd” imputation strategy. The mean WIS and 95% confidence interval over all imputation techniques are plotted. For incident cases, the predictive performance of a chimeric ensemble is similar to or better than that of a computational ensemble, despite poorer performance from human judgment alone. For incident deaths, though a computational ensemble tends to perform better, a chimeric ensemble outperforms it on two surveys and again is able to leverage human judgment to make improved forecasts.
