J Math Biol. 2022 Sep 20;85(4):36.
doi: 10.1007/s00285-022-01804-5.

Limits of epidemic prediction using SIR models

Omar Melikechi et al. J Math Biol.

Abstract

The Susceptible-Infectious-Recovered (SIR) equations and their extensions comprise a commonly utilized set of models for understanding and predicting the course of an epidemic. In practice, it is of substantial interest to estimate the model parameters based on noisy observations early in the outbreak, well before the epidemic reaches its peak. This allows prediction of the subsequent course of the epidemic and design of appropriate interventions. However, accurately inferring SIR model parameters in such scenarios is problematic. This article provides novel, theoretical insight on this issue of practical identifiability of the SIR model. Our theory provides new understanding of the inferential limits of routinely used epidemic models and provides a valuable addition to current simulate-and-check methods. We illustrate some practical implications through application to a real-world epidemic data set.

Keywords: Epidemic prediction; Hypothesis testing; Identifiability; Nonlinear dynamics; Parameter inference; SIR model.
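
For readers who want to experiment with the model the abstract refers to, the following is a minimal sketch of the classical SIR equations in proportion form, ds/dt = -βsi, di/dt = βsi - γi, dr/dt = γi, integrated with a fixed-step fourth-order Runge-Kutta scheme. The function names (`sir_rhs`, `rk4_step`, `simulate_sir`, `peak_day`), the step size, and the choice of integrator are illustrative assumptions and are not taken from the article; the parameter values in the usage note are those quoted in the caption of Fig. 4.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR equations for proportions (s, i, r)."""
    s, i, r = state
    new_infections = beta * s * i
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, beta, gamma, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(x + scale * kx for x, kx in zip(state, k))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta, gamma, i0, days, steps_per_day=10):
    """Return the list of daily (s, i, r) states for days 0..days."""
    dt = 1.0 / steps_per_day  # assumed step size, not from the article
    state = (1.0 - i0, i0, 0.0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, beta, gamma, dt)
        daily.append(state)
    return daily


def peak_day(trajectory):
    """Index of the day on which the infectious proportion is largest."""
    return max(range(len(trajectory)), key=lambda t: trajectory[t][1])
```

For example, `peak_day(simulate_sir(0.21, 0.07, 1e-7, 200))` gives the day of peak infection for (β, γ) = (.21, .07) with one initial infection in a population of 10^7. Inference "early in the outbreak" means using only the observations well before that day.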

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The trajectory of the SIR model used in the simulation (left). Plot of $\hat{\beta}$ versus $\hat{\gamma}$ for 1000 realizations from the sampling distribution of their MLE (right)
Fig. 2
Distance between perturbed and true trajectories for different parameter values and population sizes. In each graph the horizontal axis is the number of days since the start of the epidemic and the vertical axis is the distance $\lVert\varphi_t(x,\theta_\epsilon(\omega))-\varphi_t(x,\theta)\rVert$ between a perturbed trajectory and the true trajectory at time $t$. The gray, green, and black curves correspond to 90 perturbed trajectories, one for each of 90 equally spaced angles $\omega$ in $[0,2\pi)$. The black curves correspond to the angles $\pi/4$ and $5\pi/4$. The green curves correspond to the remaining angles in the intervals $[\pi/4-\pi/12,\pi/4+\pi/12)$ and $[5\pi/4-\pi/12,5\pi/4+\pi/12)$, i.e. in intervals of width $\pi/6$ centered at $\pi/4$ and $5\pi/4$, respectively. The gray curves correspond to the angles in $[0,2\pi)$ outside these two intervals. Note that the distances corresponding to angles close to $\pi/4$ and $5\pi/4$ (the green and black curves) are smaller than those corresponding to angles farther away from $\pi/4$ and $5\pi/4$ (the gray curves), which supports the claim that the inverse problem is least practically identifiable for parameter perturbations approximately along a line of slope 1. The approximate lower bound of Approximation 1 is in red. The peak time of the trajectory corresponding to $\theta$ is indicated by the vertical blue line, and 80% of it by the vertical orange line. The first through fourth columns have population sizes $10^4$, $10^5$, $10^6$, and $10^7$, respectively, with only one initial infection in each case. The perturbation sizes for the first through fourth rows are $\epsilon=.03$, $.03$, $.06$, and $.1$, respectively. The SIR parameters for the first through fourth rows are $(\beta,\gamma)=(.21,.14)$, $(.21,.07)$, $(.42,.07)$, and $(1.68,.14)$, which give respective $R_0$ values of 1.5, 3, 6, and 12. Note that the approximate lower bound holds roughly up to 80% of the peak time in all cases despite the wide range of parameters. Finally, we remark that the two seemingly "distinct" classes of gray curves in each plot correspond to different subsets of the 90 distinct angles. This, as well as the multimodality of certain curves (which becomes more apparent when our graphs are extended further beyond the peak time), is a consequence of the nonlinearity of the SIR model and is not directly relevant to our analysis
Fig. 3
Logarithm of distance between perturbed and true trajectories for different parameter values and population sizes. Everything is the same as in Fig. 2 except now we plot $\log\lVert\varphi_t(x,\theta_\epsilon(\omega))-\varphi_t(x,\theta)\rVert$ instead of $\lVert\varphi_t(x,\theta_\epsilon(\omega))-\varphi_t(x,\theta)\rVert$. This gives a better view of the approximate lower bound early in the epidemic. Note the vertical axis is now a log scale
Fig. 4
Type II error as a function of perturbation size and noise level. The left panel shows the empirical and theoretical type II errors for the angles $\omega=0$, $\pi/4$, and $\pi$ as a function of perturbation size $\epsilon$ with fixed noise level $\sigma=0.3$. The right panel shows the empirical and theoretical type II errors for the same angles as a function of noise level $\sigma$ with fixed perturbation size $\epsilon=.03$. In each case the SIR parameters are those from Sect. 2, namely $(\beta,\gamma)=(.21,.07)$, $N=10^7$, and initial condition $i_0=1/N$. The time horizon $T$ is 60 days into the epidemic, which in this case is 60 days prior to the peak time. The significance level is $\alpha=.05$. Here, theoretical refers to the first (red) and second (black) approximations of type II error $E_2(\omega)$ in Approximation 2, i.e. Eqs. (8) and (9), respectively. Empirical refers to the type II error obtained by performing 1000 simulations of the noisy SIR model (2) followed by a likelihood ratio test of the hypothesis in (7) for each set of parameters. More specifically, the red and black curves lying over the blue line are the type II error approximations (8) and (9) when $\omega=\pi/4$, those lying over the purple line are when $\omega=\pi$, and those lying over the green line are when $\omega=0$, with the blue, green, and purple curves corresponding to the empirically computed type II error rates when $\omega=\pi/4$, $0$, and $\pi$, respectively. In each case both theoretical results closely align with the empirical ones, with the first approximation being slightly better than the second as expected. Also as predicted, the empirical type II errors all approach $1-\alpha=.95$ both as perturbation size goes to 0 and as the noise level gets large, and this approach is most rapid when $\omega=\pi/4$. In each case the noise model is Case 2, $\sigma_t=N\sigma i_t$. For the simulated blue, green, and purple curves, we used a numerical integrator to obtain the $i_t$ values, while for the red and black curves we used the pre-peak approximation $i_t\approx e^{\delta t}i_0$
Fig. 5
Consequences of type II error. The top panels show the total number of infections 10 days past the time of peak infection as a percentage of the total population. The bottom panels show the duration of the epidemic, which is defined to be the first day past the peak when fewer than 10 individuals are infectious. The left panels correspond to the parameter $\theta_0=(.21,.14)$ and the right panels to $\theta_0=(.21,.07)$. The red lines give the total percent infected or duration of the epidemic for the true parameter $\theta_0$ in each of their respective plots, while the blue and green curves give these values for $\theta_\epsilon(\pi/4)$ and $\theta_\epsilon(5\pi/4)$, respectively, over a range of $\epsilon$ values. In all cases $N=10^7$
Fig. 6
Practical identifiability of $\delta$. The left and center panels use the first type II error approximation in Approximation 2 to graph type II error as a function of perturbation size, $\epsilon$, and noise level, $\sigma$, respectively. Each of the rainbow-colored curves in both panels corresponds to one of 150 different values of $\omega$ spread uniformly across $[0,2\pi)$. The color chart in the right panel indicates the colors corresponding to different angles $\omega$: the light red curves correspond to the $\omega$ closest to $\pi/4$ and $5\pi/4$, the yellow are those a bit farther away, the blue still farther, and the purple are those farthest from $\pi/4$ and $5\pi/4$, i.e. closest to $3\pi/4$ and $7\pi/4$. Finally, the dark red curve in each of the two panels corresponds to $\omega=\pi/4$ and $5\pi/4$, which have the same type II error. Note the rapid fall-off in type II error as angles get farther from $\pi/4$ and $5\pi/4$, especially as a function of $\epsilon$. This agrees with the empirical observation in Fig. 1 that MLE favors parameters lying along a line of slope 1. In particular, the inverse problem for $\delta$ is practically identifiable
Fig. 7
Going from hypothesis test (12) to (7)
Fig. 8
New York City public testing results for COVID-19 from the first known case on February 29, 2020 to March 15, 2020. We have used maximum likelihood estimation to generate an SIR trajectory through the noisy data for each reporting rate
Fig. 9
Type II error rate as a function of ϵ and σ at significance level α=0.1, reporting rate p=0.15, and T=14 days of new infection observations
Fig. 10
Error analysis corresponding to Figs. 2, 3. As in Figs. 2, 3, the horizontal axis in each graph is the number of days since the start of the epidemic, the vertical blue line is the peak time, and the orange line is 80% of the peak time. The vertical axis is the logarithm of the relative error, $\log|E_t^\epsilon|-\log\lVert\varphi_t^\epsilon-\varphi_t\rVert$, at time $t$. There are 50 green curves in each plot, which are the relative errors corresponding to 50 angles in the intervals $[\pi/4-\pi/12,\pi/4+\pi/12)$ and $[5\pi/4-\pi/12,5\pi/4+\pi/12)$. The red line in each plot is the average linear approximation of the 50 green curves. Also as in Figs. 2, 3, the first through fourth columns have population sizes $10^4$, $10^5$, $10^6$, and $10^7$, respectively, with one initial infection in each case, and the first through fourth rows have parameters $(\beta,\gamma,\epsilon)=(.21,.14,.03)$, $(.21,.07,.03)$, $(.42,.07,.06)$, and $(1.68,.14,.1)$, which give $R_0$ values of 1.5, 3, 6, and 12. The main point is that for every combination of parameters the relative error is exponentially small until about 80% of the peak time, at which point it becomes $O(1)$ and subsequently blows up exponentially
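
The comparison described in the captions of Figs. 2 and 3 can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code. It assumes the perturbed parameter is θ_ϵ(ω) = θ + ϵ(cos ω, sin ω) and that distance is the Euclidean norm between (s, i, r) proportions at a given day. Neither definition is spelled out on this page, and the helper names `simulate_sir`, `perturb`, and `trajectory_distance` are hypothetical.

```python
import math


def simulate_sir(beta, gamma, i0, days, steps_per_day=10):
    """Daily (s, i, r) proportions from a fixed-step RK4 integration."""
    def rhs(s, i, r):
        return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

    dt = 1.0 / steps_per_day  # assumed step size
    state = (1.0 - i0, i0, 0.0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = rhs(*state)
            k2 = rhs(*(x + dt / 2 * k for x, k in zip(state, k1)))
            k3 = rhs(*(x + dt / 2 * k for x, k in zip(state, k2)))
            k4 = rhs(*(x + dt * k for x, k in zip(state, k3)))
            state = tuple(
                x + dt / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


def perturb(theta, eps, omega):
    """Assumed perturbation: move (beta, gamma) by eps in direction omega."""
    beta, gamma = theta
    return (beta + eps * math.cos(omega), gamma + eps * math.sin(omega))


def trajectory_distance(theta, eps, omega, i0, day):
    """Euclidean distance (assumed metric) between true and perturbed states."""
    true_state = simulate_sir(theta[0], theta[1], i0, day)[day]
    beta_p, gamma_p = perturb(theta, eps, omega)
    pert_state = simulate_sir(beta_p, gamma_p, i0, day)[day]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(true_state, pert_state)))
```

Calling `trajectory_distance((0.21, 0.07), 0.03, omega, 1e-7, 60)` for ω = π/4 and for ω = 0 lets one compare the two directions at day 60. Under these assumptions, a perturbation at ω = π/4 or 5π/4 leaves β - γ unchanged, which is one way to see why those directions are the hardest to distinguish before the peak.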

