PLoS Comput Biol. 2025 Jun 23;21(6):e1013203.

doi: 10.1371/journal.pcbi.1013203. eCollection 2025 Jun.

Synthetic method of analogues for emerging infectious disease forecasting

Alexander C Murph¹, G Casey Gibson¹, Elizabeth B Amona², Lauren J Beesley¹, Lauren A Castro³, Sara Y Del Valle³, Dave Osthus¹

Affiliations

¹ Statistical Sciences, Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.
² Department of Statistical Sciences & Operations Research, Virginia Commonwealth University, Richmond, Virginia, United States of America.
³ Information Systems & Modeling, Analytics, Intelligence, & Technology Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

PMID: 40549797
PMCID: PMC12303386
DOI: 10.1371/journal.pcbi.1013203

Alexander C Murph et al. PLoS Comput Biol. 2025.

Abstract

The Method of Analogues (MOA) has gained popularity in the past decade for infectious disease forecasting due to its non-parametric nature. In MOA, the local behavior observed in a time series is matched to the local behaviors of several historical time series. The known values that directly follow the historical time series that best match the observed time series are used to calculate a forecast. This non-parametric approach leverages historical trends to produce forecasts without extensive parameterization, making it highly adaptable. However, MOA is limited in scenarios where historical data are sparse. This limitation was particularly evident during the early stages of the COVID-19 pandemic, when the emerging global epidemic had little to no historical data. In this work, we propose a new method inspired by MOA, called the Synthetic Method of Analogues (sMOA). sMOA replaces historical disease data with a library of synthetic data that describe a broad range of possible disease trends. This model circumvents the need to estimate explicit parameter values by instead matching segments of ongoing time series data to a comprehensive library of synthetically generated segments of time series data. We demonstrate that sMOA has competitive performance with state-of-the-art infectious disease forecasting models, outperforming 78% of models from the COVID-19 Forecasting Hub in terms of averaged Mean Absolute Error and 76% of models from the COVID-19 Forecasting Hub in terms of averaged Weighted Interval Score. Additionally, we introduce a novel uncertainty quantification methodology designed for the onset of emerging epidemics. Developing versatile approaches that do not rely on historical data and can maintain high accuracy in the face of novel pandemics is critical for enhancing public health decision-making and strengthening preparedness for future outbreaks.

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Diagram of sMOA.**
Recall $k$ is the time series segment length and $h$ is the largest forecast horizon. (a) Three fully observed synthetic time series $\tilde{y}_{i}^{\mathcal{L}}$ in the library. (b) Synthetic time series segments $y_{i}^{\mathcal{L}}$ of length $k + h$. The first $k$ time points in black; the last $h$ time points in red. (c) Fully observed time series $\tilde{y}^{\mathcal{O}}$. (d) Time series segment $y^{\mathcal{O}}$ of length $k$ (i.e., the last $k$ observations from the time series in (c)). (e) Compute the distance $d_{i} = d(y^{\mathcal{O}}, y_{i, 1:k}^{\mathcal{L}})$ between the observed time series segment $y^{\mathcal{O}}$ and the first $k$ observations of each synthetic time series segment $y_{i}^{\mathcal{L}}$ in the library (i.e., the black points). (f) The point forecast is an aggregation (e.g., average) of the last $h$ observations of the synthetic time series (i.e., the red points) with the smallest distances $d_{i}$.
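
The steps in panels (a) through (f) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy curve generator `synthetic_curve`, the Euclidean distance, the plain average over `n_neighbors` segments, and all parameter ranges are hypothetical choices made here for the example. This record does not specify how the paper builds its library, measures distance, or aggregates neighbors.

```python
import math
import random


def synthetic_curve(peak, rate, midpoint, length):
    """Toy bell-shaped incidence curve (scaled logistic derivative); an assumption."""
    out = []
    for t in range(length):
        s = 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))
        out.append(peak * 4.0 * s * (1.0 - s))
    return out


def build_segments(curves, k, h):
    """Panel (b): slice every library curve into overlapping segments of length k + h."""
    segments = []
    for curve in curves:
        for start in range(len(curve) - (k + h) + 1):
            segments.append(curve[start:start + k + h])
    return segments


def euclidean(a, b):
    """Distance d between two equal-length segments (Euclidean is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smoa_point_forecast(observed, segments, k, h, n_neighbors=5):
    """Panels (d)-(f): match the last k observations, then average the next h values."""
    y_obs = observed[-k:]                                   # panel (d)
    ranked = sorted(segments, key=lambda seg: euclidean(y_obs, seg[:k]))  # panel (e)
    nearest = ranked[:n_neighbors]
    # Panel (f): average the last h points of the closest segments.
    return [sum(seg[k + j] for seg in nearest) / len(nearest) for j in range(h)]


random.seed(0)
curves = [
    synthetic_curve(
        peak=random.uniform(50, 500),
        rate=random.uniform(0.1, 0.6),
        midpoint=random.uniform(10, 30),
        length=40,
    )
    for _ in range(200)
]
k, h = 4, 4
segments = build_segments(curves, k, h)
# A made-up "observed" series: the first 15 points of a curve not in the library.
observed = synthetic_curve(peak=200.0, rate=0.3, midpoint=20.0, length=40)[:15]
forecast = smoa_point_forecast(observed, segments, k, h)
print(forecast)
```

Because only the last `k` points of the observed series are used for matching, the sketch needs no parameter fitting; the quality of the forecast depends on how well the library covers the shapes the observed series can take.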

**Fig 2. A demonstration of sMOA forecasting during the early weeks of the COVID-19 epidemic.**
Black lines correspond to point forecasts; the orange lines correspond to the true observed value. The basic ensemble model of the ForecastHub (‘COVIDhub-4_week_ensemble’) and the basic persistence model (‘COVIDhub-baseline’) forecasts are provided for reference for the dates where forecasts were provided. The third model used for later comparisons, the ‘COVIDhub-trained_ensemble’, does not provide forecasts this early in the COVID-19 epidemic.

**Fig 3. Nominal vs. empirical coverage for sMOA over every state in the US and over four forecast horizons (1w, 2w, 3w, 4w), plotted using a black line.**
The dotted line indicates a perfect match between nominal and empirical coverages for reference. Over every forecast made for the data application to COVID-19, nominal and empirical coverages approximately match.
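
For readers unfamiliar with this kind of calibration plot, the quantity on the vertical axis can be computed as below. This is a minimal sketch: the function name `empirical_coverage` and the numbers in the example are made up for illustration and are not taken from the paper.

```python
def empirical_coverage(lowers, uppers, truths):
    """Fraction of observed values that fall inside their prediction intervals."""
    hits = sum(1 for lo, hi, y in zip(lowers, uppers, truths) if lo <= y <= hi)
    return hits / len(truths)


# Made-up 80% prediction intervals for five forecasts and the values later observed.
lowers = [10.0, 12.0, 15.0, 20.0, 18.0]
uppers = [20.0, 25.0, 30.0, 40.0, 35.0]
truths = [15.0, 26.0, 22.0, 31.0, 19.0]

nominal = 0.80
empirical = empirical_coverage(lowers, uppers, truths)
print(nominal, empirical)
```

Repeating this for each nominal level and plotting nominal against empirical coverage gives a curve that sits on the diagonal when the intervals are well calibrated.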

**Fig 4. Direct comparisons between models from the ForecastHub and sMOA, using mean MAE (left) and mean WIS (right).**
The error comparison between sMOA and a given model from the ForecastHub is only calculated for the dates for which forecasts from the given model were reported. That is, a given point represents the mean error metric for a model from the ForecastHub calculated over every date, state, and forecast horizon available for that model, plotted against the same mean metric calculated using sMOA on these same dates, states, and forecast horizons. Models beneath the diagonal black line were outperformed by sMOA. Four outlier models were removed for ease of visualization.
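
The two error metrics can be sketched as follows. This record does not spell out the formulas, so the sketch assumes the standard definitions: MAE as the mean absolute difference between point forecasts and observations, and WIS as the weighted combination of the absolute error of the median and the interval scores of K central prediction intervals. The function names and example numbers are hypothetical.

```python
def mean_absolute_error(points, truths):
    """Mean absolute difference between point forecasts and observed values."""
    return sum(abs(p - y) for p, y in zip(points, truths)) / len(truths)


def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval."""
    score = upper - lower                      # width penalises wide intervals
    if y < lower:
        score += (2.0 / alpha) * (lower - y)   # penalty for missing below
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)   # penalty for missing above
    return score


def weighted_interval_score(median, intervals, y):
    """WIS for one forecast; `intervals` maps alpha -> (lower, upper)."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Made-up forecast: median 100 with 50% and 90% central intervals, observed value 130.
intervals = {0.5: (90.0, 110.0), 0.1: (70.0, 140.0)}
print(weighted_interval_score(100.0, intervals, 130.0))
print(mean_absolute_error([100.0, 120.0], [130.0, 115.0]))
```

Lower values are better for both metrics. WIS rewards intervals that are narrow yet still contain the observation, so it reflects both sharpness and calibration.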

**Fig 5. The proportion of all models (black) and best-in-class models (red) sMOA outperforms in MAE (top) and WIS (bottom) if the validation window ranged from August 2020 through the x-axis date.**
sMOA outperforms the majority of all models and best-in-class models if the validation cutoff date is between October 2020 and March 2023. Directly before October 2020, there was a dip in incident case counts that sMOA failed to forecast accurately, which caused the initially lower performance.
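
The matched comparison behind Figs 4 and 5 can be illustrated with a short sketch. It assumes per-forecast errors are stored in dictionaries keyed by `(date, state, horizon)` with ISO-formatted date strings; that layout, the function names, and the numbers are assumptions made here and are not taken from the paper.

```python
def matched_means(smoa_errors, model_errors, cutoff=None):
    """Mean errors of sMOA and one hub model over the forecasts both provide."""
    keys = [
        key for key in model_errors
        if key in smoa_errors and (cutoff is None or key[0] <= cutoff)
    ]
    if not keys:
        return None  # no overlapping forecasts inside the validation window
    n = len(keys)
    smoa_mean = sum(smoa_errors[key] for key in keys) / n
    model_mean = sum(model_errors[key] for key in keys) / n
    return smoa_mean, model_mean


def proportion_outperformed(smoa_errors, hub_models, cutoff=None):
    """Share of hub models whose matched mean error is larger than sMOA's."""
    wins = 0
    compared = 0
    for model_errors in hub_models.values():
        means = matched_means(smoa_errors, model_errors, cutoff)
        if means is None:
            continue
        compared += 1
        if means[0] < means[1]:
            wins += 1
    return wins / compared if compared else float("nan")


# Made-up absolute errors keyed by (date, state, horizon in weeks).
smoa = {
    ("2020-08-03", "NM", 1): 10.0,
    ("2020-08-10", "NM", 1): 20.0,
    ("2020-08-17", "NM", 1): 30.0,
}
hub = {
    "model_a": {("2020-08-03", "NM", 1): 15.0, ("2020-08-10", "NM", 1): 25.0},
    "model_b": {("2020-08-17", "NM", 1): 5.0},
}
print(proportion_outperformed(smoa, hub))
print(proportion_outperformed(smoa, hub, cutoff="2020-08-10"))
```

Restricting each comparison to the forecasts a hub model actually submitted keeps models that reported only during easy or hard periods from being compared on different footing.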

Grants and funding

R01 GM130668/GM/NIGMS NIH HHS/United States
