J Math Biol. 2022 Sep 20;85(4):36.

doi: 10.1007/s00285-022-01804-5.

Limits of epidemic prediction using SIR models

Omar Melikechi¹, Alexander L Young², Tao Tang³, Trevor Bowman³, David Dunson³ ⁴, James Johndrow⁵

Affiliations

¹ Department of Mathematics, Duke University, Durham, NC, USA. omar.melikechi@duke.edu.
² Department of Statistics, Harvard University, Cambridge, MA, USA.
³ Department of Mathematics, Duke University, Durham, NC, USA.
⁴ Department of Statistics, Duke University, Durham, NC, USA.
⁵ Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA.

PMID: 36125562
PMCID: PMC9487859
DOI: 10.1007/s00285-022-01804-5

Omar Melikechi et al. J Math Biol. 2022.

Abstract

The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.

Keywords: Epidemic prediction; Hypothesis testing; Identifiability; Nonlinear dynamics; Parameter inference; SIR model.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
The trajectory of the SIR model used in the simulation (left). Plots of $\hat{β}$ vs $\hat{γ}$ from 1000 realizations from the sampling distribution of their MLE (right)

**Fig. 2**
Distance between perturbed and true trajectories for different parameter values and population sizes. In each graph the horizontal axis is the number of days since the start of the epidemic and the vertical axis is the distance $‖φ_{t}(x, θ_{ϵ}(ω)) - φ_{t}(x, θ)‖$ between a perturbed trajectory and the true trajectory at time t. The gray, green, and black curves correspond to 90 perturbed trajectories, one for each of 90 equally spaced angles $ω$ in $[0, 2π)$. The black curves correspond to the angles $π/4$ and $5π/4$. The green curves correspond to the remaining angles in the intervals $[π/4 - π/12, π/4 + π/12)$ and $[5π/4 - π/12, 5π/4 + π/12)$, i.e. in intervals of width $π/6$ centered at $π/4$ and $5π/4$, respectively. The gray curves correspond to those angles in $[0, 2π)$ outside these two intervals. Note the distances corresponding to angles close to $π/4$ and $5π/4$ (the green and black curves) are smaller than those distances corresponding to angles farther away from $π/4$ and $5π/4$ (the gray curves), which supports the claim that the inverse problem is least practically identifiable for parameter perturbations approximately along a line of slope 1. The approximate lower bound of Approximation 1 is in red. The peak time of the trajectory corresponding to $θ$ is indicated by the vertical blue line, and 80% of it by the vertical orange line. The first through fourth columns have population sizes $10^{4}, 10^{5}, 10^{6}$, and $10^{7}$, respectively, with only one initial infection in each case. The perturbation sizes for the first through fourth rows are $ϵ = 0.03, 0.03, 0.06$, and $0.1$, respectively. The SIR parameters for the first through fourth rows are $(β, γ) = (0.21, 0.14), (0.21, 0.07), (0.42, 0.07)$, and $(1.68, 0.14)$, which give respective $R_{0}$ values of 1.5, 3, 6, and 12. Note the approximate lower bound holds roughly up to 80% of the peak time in all cases despite the wide range of parameters.
Finally, we remark that the two seemingly "distinct" classes of gray curves in each plot correspond to different subsets of the 90 distinct angles. This, as well as the multimodality of certain curves (which becomes more apparent when our graphs are extended further beyond the peak time), is a consequence of the nonlinearity of the SIR model and is not directly relevant to our analysis
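The distance plotted in Fig. 2 can be reproduced in miniature. The following is a minimal sketch, not the authors' code, under several assumptions that the caption does not state: the state is the normalized pair (s, i), the perturbation is parametrized as $θ_{ϵ}(ω) = θ + ϵ(\cos ω, \sin ω)$, the norm is Euclidean, and the equations are integrated with a hand-written fixed-step RK4 scheme. The function names `sir_trajectory` and `perturbed_distance` are hypothetical.

```python
import numpy as np


def sir_rhs(state, beta, gamma):
    """Right-hand side of the normalized SIR equations for (s, i)."""
    s, i = state
    return np.array([-beta * s * i, beta * s * i - gamma * i])


def sir_trajectory(beta, gamma, i0, days, dt=0.1):
    """Integrate (s, i) with fixed-step RK4; returns an array of shape (steps + 1, 2)."""
    steps = int(round(days / dt))
    state = np.array([1.0 - i0, i0])
    out = np.empty((steps + 1, 2))
    out[0] = state
    for k in range(steps):
        k1 = sir_rhs(state, beta, gamma)
        k2 = sir_rhs(state + 0.5 * dt * k1, beta, gamma)
        k3 = sir_rhs(state + 0.5 * dt * k2, beta, gamma)
        k4 = sir_rhs(state + dt * k3, beta, gamma)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out[k + 1] = state
    return out


def perturbed_distance(theta, eps, omega, i0, days):
    """Euclidean distance over time between the perturbed and true (s, i) paths.

    Assumed parametrization: theta_eps(omega) = theta + eps * (cos omega, sin omega).
    """
    beta, gamma = theta
    true_path = sir_trajectory(beta, gamma, i0, days)
    pert_path = sir_trajectory(
        beta + eps * np.cos(omega), gamma + eps * np.sin(omega), i0, days
    )
    return np.linalg.norm(pert_path - true_path, axis=1)


# Second-row, fourth-column setting from the caption: N = 10^7, one initial infection.
theta = (0.21, 0.07)
i0 = 1.0 / 10**7
for omega in (np.pi / 4, 3 * np.pi / 4):
    dist = perturbed_distance(theta, 0.03, omega, i0, days=60)
    print(f"omega = {omega:.3f}: distance at day 60 = {dist[-1]:.3e}")
```

In this sketch the perturbation along the slope-1 direction ($ω = π/4$) gives a smaller day-60 distance than the perpendicular direction ($ω = 3π/4$), which is the qualitative pattern the caption describes.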

**Fig. 3**
Logarithm of distance between perturbed and true trajectories for different parameter values and population sizes. Everything is the same as in Fig. 2 except now we plot $\log ‖φ_{t}(x, θ_{ϵ}(ω)) - φ_{t}(x, θ)‖$ instead of $‖φ_{t}(x, θ_{ϵ}(ω)) - φ_{t}(x, θ)‖$. This gives a better view of the approximate lower bound early in the epidemic. Note the vertical axis is now a log scale

**Fig. 4**
Type II error as a function of perturbation size and noise level. The left panel shows the empirical and theoretical type II errors for the angles $ω = 0, π/4$, and $π$ as a function of perturbation size $ϵ$ with fixed noise level $σ = 0.3$. The right panel shows the empirical and theoretical type II errors for the same angles as a function of noise level $σ$ with fixed perturbation size $ϵ = 0.03$. In each case the SIR parameters are those from Sect. 2, namely $(β, γ) = (0.21, 0.07)$, $N = 10^{7}$, and initial condition $i_{0} = 1/N$. The time horizon T is 60 days into the epidemic, which in this case is 60 days prior to the peak time. The significance level is $α = 0.05$. Here, *theoretical* refers to the first (red) and second (black) approximations of type II error $E_{2}(ω)$ in Approximation 2, i.e. Eqs. (8) and (9), respectively. *Empirical* refers to the type II error obtained by performing 1000 simulations of the noisy SIR model (2) followed by a likelihood ratio test of the hypothesis in (7) for each set of parameters. More specifically, the red and black curves lying over the blue line are the type II error approximations (8) and (9) when $ω = π/4$, those lying over the purple line are when $ω = π$, and those lying over the green line are when $ω = 0$, with the blue, green, and purple curves corresponding to the empirically computed type II error rates when $ω = π/4, 0$, and $π$, respectively. In each case both theoretical results closely align with the empirical ones, with the first approximation being slightly better than the second as expected. Also as predicted, the empirical type II errors all approach $1 - α = 0.95$ both as perturbation size goes to 0 and as the noise level gets large, and this approach is most rapid when $ω = π/4$. In each case the noise model is Case 2, $σ_{t} = N σ i_{t}$.
For the simulated blue, green, and purple curves, we used a numerical integrator to obtain the $i_{t}$ values, while for the red and black curves we used the pre-peak approximation $i_{t} \approx e^{δ t} i_{0}$
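The pre-peak approximation used for the red and black curves can be motivated by linearizing the infection equation early in the epidemic. The following is a sketch under assumptions the caption does not state: the standard normalized SIR equations, $δ = β - γ$, and the perturbation parametrization $θ_{ϵ}(ω) = θ + ϵ(\cos ω, \sin ω)$.

$$
\frac{d i_{t}}{d t} = (β s_{t} - γ)\, i_{t} \approx (β - γ)\, i_{t} = δ\, i_{t} \quad \text{while } s_{t} \approx 1, \qquad \text{so} \qquad i_{t} \approx e^{δ t} i_{0}.
$$

Under the same assumptions, a perturbation at angle $ω$ changes $δ$ by $ϵ(\cos ω - \sin ω)$, which vanishes at $ω = π/4$ and $ω = 5π/4$. This is consistent with the caption's observation that type II error approaches $1 - α$ most rapidly when $ω = π/4$.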

**Fig. 5**
Consequences of type II error. The top panels show the total number of infections 10 days past the time of peak infection as a percentage of the total population. The bottom panels show the duration of the epidemic, which is defined to be the first day past the peak when fewer than 10 individuals are infectious. The left panels correspond to the parameter $θ_{0} = (0.21, 0.14)$ and the right panels to $θ_{0} = (0.21, 0.07)$. The red lines give the total percent infected or duration of the epidemic for the true parameter $θ_{0}$ in each of their respective plots, while the blue and green curves give these values for $θ_{ϵ}(π/4)$ and $θ_{ϵ}(5π/4)$ over a range of $ϵ$ values, respectively. In all cases $N = 10^{7}$
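The two summaries in Fig. 5 can be computed from a simulated trajectory. The following is a minimal sketch, not the authors' code, under assumptions the caption does not state: "total infections" is read as the fraction ever infected, $1 - s_{t}$; there is one initial infection, $i_{0} = 1/N$; the perturbation is $θ_{ϵ}(ω) = θ + ϵ(\cos ω, \sin ω)$; and integration uses a hand-written fixed-step RK4 scheme over a 1000-day horizon. The names `simulate_sir` and `epidemic_summaries` are hypothetical.

```python
import numpy as np


def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Fixed-step RK4 for the normalized SIR equations; returns (t, s, i) arrays."""

    def rhs(y):
        s, i = y
        return np.array([-beta * s * i, beta * s * i - gamma * i])

    n = int(round(days / dt))
    y = np.array([1.0 - i0, i0])
    path = np.empty((n + 1, 2))
    path[0] = y
    for k in range(n):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * dt * k1)
        k3 = rhs(y + 0.5 * dt * k2)
        k4 = rhs(y + dt * k3)
        y = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        path[k + 1] = y
    return np.arange(n + 1) * dt, path[:, 0], path[:, 1]


def epidemic_summaries(beta, gamma, N, days=1000, dt=0.1):
    """Return (peak day, percent ever infected 10 days past peak, duration in days)."""
    t, s, i = simulate_sir(beta, gamma, 1.0 / N, days, dt)
    peak = int(np.argmax(i))
    # Index 10 days past the peak, clipped to the simulated horizon.
    after = min(peak + int(round(10 / dt)), len(t) - 1)
    pct_infected = 100.0 * (1.0 - s[after])
    # First time past the peak with fewer than 10 infectious individuals.
    below = np.nonzero(N * i[peak:] < 10)[0]
    if below.size == 0:
        raise ValueError("infections never drop below 10 within the horizon")
    return t[peak], pct_infected, t[peak + below[0]]


# Right-panel parameter from the caption and its slope-1 perturbation with eps = 0.03.
N = 10**7
shift = 0.03 / np.sqrt(2.0)
for label, (beta, gamma) in {
    "theta_0": (0.21, 0.07),
    "theta_eps(pi/4)": (0.21 + shift, 0.07 + shift),
}.items():
    peak_day, pct, duration = epidemic_summaries(beta, gamma, N)
    print(f"{label}: peak day {peak_day:.1f}, {pct:.1f}% infected, duration {duration:.1f} days")
```

Comparing the two printed rows gives a rough sense of how much the summaries move under a slope-1 perturbation of this size.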

**Fig. 6**
Practical identifiability of $δ$. The left and center panels use the first type II error approximation in Approximation 2 to graph type II error as a function of perturbation size, $ϵ$, and noise level, $σ$, respectively. Each of the rainbow colored curves in both panels corresponds to one of 150 different values of $ω$ spread uniformly across $[0, 2π)$. The color chart in the right panel indicates the colors corresponding to different angles $ω$: the light red curves correspond to the $ω$ closest to $π/4$ and $5π/4$, the yellow are those a bit farther away, the blue still farther, and the purple are those farthest from $π/4$ and $5π/4$, i.e. closest to $3π/4$ and $7π/4$. Finally, the dark red curve in each of the two panels corresponds to $ω = π/4$ and $5π/4$, which have the same type II error. Note the rapid fall off in type II error as angles get farther from $π/4$ and $5π/4$, especially as a function of $ϵ$. This agrees with the empirical observation in Fig. 1 that MLE favors parameters lying along a line of slope 1. In particular, the inverse problem for $δ$ is practically identifiable

**Fig. 7**
Going from hypothesis test (12) to (7)

**Fig. 8**
New York City public testing results for COVID-19 from the first known case on February 29, 2020 to March 15, 2020. We have used maximum likelihood estimation to generate an SIR trajectory through the noisy data for each reporting rate

**Fig. 9**
Type II error rate as a function of $ϵ$ and $σ$ at significance level $α = 0.1$, reporting rate $p = 0.15$, and $T = 14$ days of new infection observations

**Fig. 10**
Error analysis corresponding to Figs. 2, 3. As in Figs. 2, 3, the horizontal axis in each graph is the number of days since the start of the epidemic, the vertical blue line is the peak time, and the orange line is 80% of the peak time. The vertical axis is the logarithm of the relative error, $\log |E_{t}^{ϵ}| - \log ‖φ_{t}^{ϵ} - φ_{t}‖$, at time t. There are 50 green curves in each plot which are the relative errors corresponding to 50 angles in the intervals $[π/4 - π/12, π/4 + π/12)$ and $[5π/4 - π/12, 5π/4 + π/12)$. The red line in each plot is the average linear approximation of the 50 green curves. Also as in Figs. 2, 3, the first through fourth columns have population sizes $10^{4}, 10^{5}, 10^{6}$, and $10^{7}$, respectively, with one initial infection in each case, and the first through fourth rows have parameters $(β, γ, ϵ) = (0.21, 0.14, 0.03), (0.21, 0.07, 0.03), (0.42, 0.07, 0.06)$, and $(1.68, 0.14, 0.1)$, which give $R_{0}$ values of 1.5, 3, 6, and 12. The main point is that for every combination of parameters the relative error is exponentially small until about 80% of the peak time, at which point it becomes $O(1)$ and subsequently blows up exponentially

Cited by

Harnessing artificial intelligence for enhanced public health surveillance: a narrative review.
Mendes VIS, Mendes BMF, Moura RP, Lourenço IM, Oliveira MFA, Ng KL, Pinto CS. Front Public Health. 2025 Jul 30;13:1601151. doi: 10.3389/fpubh.2025.1601151. eCollection 2025. PMID: 40809756. Free PMC article. Review.
The disutility of compartmental model forecasts during the COVID-19 pandemic.
Sudhakar T, Bhansali A, Walkington J, Puelz D. Front Epidemiol. 2024 Jun 20;4:1389617. doi: 10.3389/fepid.2024.1389617. eCollection 2024. PMID: 38966521. Free PMC article.
Quantum-Like Approaches Unveil the Intrinsic Limits of Predictability in Compartmental Models.
Rojas-Venegas JA, Gallarta-Sáenz P, Hurtado RG, Gómez-Gardeñes J, Soriano-Paños D. Entropy (Basel). 2024 Oct 21;26(10):888. doi: 10.3390/e26100888. PMID: 39451964. Free PMC article.
Assessing bias in susceptible-infected-recovered estimation from aggregated epidemic data.
Shen N, Bourouiba L. R Soc Open Sci. 2025 Jul 23;12(7):240526. doi: 10.1098/rsos.240526. eCollection 2025 Jul. PMID: 40708661. Free PMC article.
Accurately summarizing an outbreak using epidemiological models takes time.
Case BKM, Young JG, Hébert-Dufresne L. R Soc Open Sci. 2023 Sep 27;10(9):230634. doi: 10.1098/rsos.230634. eCollection 2023 Sep. PMID: 37771961. Free PMC article.
