PLoS Comput Biol. 2021 Sep 8;17(9):e1009334.
doi: 10.1371/journal.pcbi.1009334. eCollection 2021 Sep.

An integrated framework for building trustworthy data-driven epidemiological models: Application to the COVID-19 outbreak in New York City

Sheng Zhang et al. PLoS Comput Biol.

Abstract

Epidemiological models can describe the dynamic evolution of a pandemic, but they rest on many assumptions and parameters that have to be adjusted over the course of the pandemic. Often, however, the available data are not sufficient to identify the model parameters and hence to infer the unobserved dynamics. Here, we develop a general framework for building a trustworthy data-driven epidemiological model, consisting of a workflow that integrates data acquisition and event timeline, model development, identifiability analysis, sensitivity analysis, model calibration, model robustness analysis, and projection with uncertainties in different scenarios. In particular, we apply this framework to propose a modified susceptible-exposed-infectious-recovered (SEIR) model, which adds new compartments and models vaccination, in order to project the transmission dynamics of COVID-19 in New York City (NYC). We find that we can uniquely estimate the model parameters and accurately project the daily new infection cases, hospitalizations, and deaths, in agreement with the available data from the NYC government's website. In addition, we employ the calibrated data-driven model to study the effects of vaccination and of the timing of reopening indoor dining in NYC.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. A general framework for building a trustworthy data-driven epidemiological model—An overview of the main contribution.
In this work, we propose a general framework for building a trustworthy data-driven epidemiological model, which constructs a workflow to integrate data acquisition and event timeline, model development, identifiability analysis, sensitivity analysis, model calibration, model robustness analysis, and projection with uncertainties and scenarios. First, we introduce a modified SEIR model that accommodates the pandemic data in New York City. Second, we study structural identifiability, practical identifiability, and sensitivity to examine the relationship between the model's data and parameters. We then calibrate the identifiable model parameters using simulated annealing and MCMC simulation. Next, we check model robustness to study how the model behaves under random perturbations. In addition, we demonstrate the model's projection capabilities with uncertainties. Finally, reopening scenarios are investigated as a reference for policymakers.
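The caption names simulated annealing as one of the calibration steps. The sketch below illustrates the general idea only and is not the authors' implementation: it assumes a toy exponential-growth model with a single rate parameter, synthetic observations, and a hypothetical geometric cooling schedule (`t0`, `cooling`, and the proposal width are all illustrative choices).

```python
import math
import random


def simulate(rate, days, initial=10.0):
    """Toy model (assumption): daily cases grow exponentially at `rate`."""
    return [initial * math.exp(rate * t) for t in range(days)]


def loss(rate, observed):
    """Sum of squared errors between the toy model and the observations."""
    predicted = simulate(rate, len(observed))
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def anneal(observed, start=0.5, steps=5000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over one parameter with geometric cooling."""
    rng = random.Random(seed)
    current, current_loss = start, loss(start, observed)
    best, best_loss = current, current_loss
    temperature = t0
    for _ in range(steps):
        # Propose a small Gaussian move around the current value.
        candidate = current + rng.gauss(0.0, 0.02)
        candidate_loss = loss(candidate, observed)
        delta = candidate_loss - current_loss
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
        temperature *= cooling
    return best


# Synthetic data generated with a known rate, then recovered by annealing.
observed = simulate(0.1, 30)
estimated_rate = anneal(observed)
```

In a workflow like the one described, an optimizer of this kind would typically supply a starting point that an MCMC sampler then refines into a distribution over parameters; that second stage is not shown here.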
Fig 2. COVID-19 epidemic in New York City: Data and event timeline.
(a) Daily confirmed cases (February 29, 2020–February 4, 2021). A person is classified as a confirmed COVID-19 case when they test positive in a molecular test (PCR). We split the data into seven time periods based on the interventions implemented. The starting times of the interventions are shown at the top of each subfigure. (b) Daily hospitalized population (February 29, 2020–February 4, 2021). (c) Daily deceased population (February 29, 2020–February 4, 2021). A deceased individual is classified as a disease-related death if they had a positive PCR test for the virus within the last 60 days. (d) Daily vaccinated population (December 14, 2020–February 4, 2021).
Fig 3. Transition diagram between epidemiological classes.
We modify the classic SEIR model to include presymptomatic (P), asymptomatic (A), hospitalized (H), isolated (Q), and deceased (D) individuals. The given data are the inflows of symptomatic (I), hospitalized (H), and deceased (D) individuals. The parameters to estimate are (β, p, q). See Table 1 for the notations and the initial values. See Table 2 for the parameters. See Eq (1) for the corresponding ODE system.
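For orientation, the sketch below integrates the classic four-compartment SEIR system that the caption takes as its starting point. It does not reproduce Eq (1) or the additional P, A, H, Q, and D compartments; the forward-Euler scheme, the parameter names `sigma` and `gamma`, and all numerical values are illustrative assumptions.

```python
def seir_step(state, beta, sigma, gamma, dt):
    """Advance the classic SEIR model by one forward-Euler step."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n   # flow S -> E
    new_infectious = sigma * e       # flow E -> I
    new_recovered = gamma * i        # flow I -> R
    return (
        s - dt * new_exposed,
        e + dt * (new_exposed - new_infectious),
        i + dt * (new_infectious - new_recovered),
        r + dt * new_recovered,
    )


def simulate_seir(state, beta, sigma, gamma, days, dt=0.1):
    """Return the daily trajectory as a list of (S, E, I, R) tuples."""
    steps_per_day = round(1 / dt)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = seir_step(state, beta, sigma, gamma, dt)
        trajectory.append(state)
    return trajectory


# Illustrative values only (not taken from the paper).
trajectory = simulate_seir(
    (9990.0, 0.0, 10.0, 0.0), beta=0.5, sigma=0.2, gamma=0.1, days=120
)
```

Because every outflow of one compartment is the inflow of another, the total population stays constant, which is a convenient sanity check when further compartments are added.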
Fig 4. The procedure of choosing parameters to fit.
(a) The procedure for determining which parameters to fit. We fix dE, dP, dI, dA, dH, dQ because they are biologically determined, and then fix ϵ, δ based on the result of the correlation matrix analysis. (b) The correlation matrix of five parameters. Each colored off-diagonal cell represents the correlation between two parameters. Green indicates (almost) no statistical correlation, while yellow and purple indicate positive and negative correlation, respectively.
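A correlation matrix like the one in panel (b) can be computed from a set of parameter samples. The sketch below assumes the samples are already available (for example, draws from a sampler) and uses the Pearson coefficient; how the authors obtained their matrix is not stated in this caption, and the sample values and parameter names here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """samples: dict mapping parameter name -> list of sampled values."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical samples: "epsilon" moves with "beta", "delta" moves against it.
samples = {
    "beta": [1.0, 2.0, 3.0, 4.0],
    "epsilon": [2.0, 4.0, 6.0, 8.0],
    "delta": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(samples)
```

Strongly correlated pairs cannot be pinned down independently from the data, which is the usual motivation for fixing one member of such a pair before fitting.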
Fig 5. Sensitivity of each quantity of interest (Isum, Hsum, Dsum) with respect to each parameter (β, p, q).
The parameter β is the most important parameter for all three quantities of interest in every stage of the pandemic. The parameter p has no influence on Isum. The parameter q has no influence on Isum or Hsum.
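One common way to quantify statements like these is a normalized local sensitivity, (θ/Q)·∂Q/∂θ, estimated by central finite differences. The sketch below is an assumption about method, not the paper's procedure, and `toy_outputs` is a hypothetical stand-in model built so that p does not enter I_sum and q enters only D_sum.

```python
def toy_outputs(params):
    """Hypothetical stand-in model, not the paper's ODE system."""
    beta, p, q = params["beta"], params["p"], params["q"]
    i_sum = 1000.0 * beta   # cumulative cases depend on beta only
    h_sum = p * i_sum       # hospitalizations are a fraction p of cases
    d_sum = q * h_sum       # deaths are a fraction q of hospitalizations
    return {"I_sum": i_sum, "H_sum": h_sum, "D_sum": d_sum}


def normalized_sensitivity(model, params, name, output, rel_step=1e-4):
    """Central-difference estimate of (theta / Q) * dQ/dtheta."""
    h = rel_step * params[name]
    up = dict(params, **{name: params[name] + h})
    down = dict(params, **{name: params[name] - h})
    derivative = (model(up)[output] - model(down)[output]) / (2 * h)
    return derivative * params[name] / model(params)[output]


params = {"beta": 0.5, "p": 0.1, "q": 0.2}
s_i_p = normalized_sensitivity(toy_outputs, params, "p", "I_sum")
s_d_beta = normalized_sensitivity(toy_outputs, params, "beta", "D_sum")
```

A value of zero means the output ignores the parameter entirely, and a value near one means a 1% change in the parameter produces roughly a 1% change in the output.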
Fig 6. Estimation of daily cases, hospitalizations, deaths, and vaccinations in New York City.
(a) Estimation of daily cases. (b) Estimation of daily hospitalizations. (c) Estimation of daily deaths. (d) We calculate the number of effective vaccinations as a weighted sum of the numbers of first and second doses administered, as shown in Fig 2; we approximate the daily number of effective vaccinations as growing linearly until it reaches the maximum capacity of 20,000 per day.
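The vaccination input described in panel (d) can be sketched as a weighted dose count plus a capped linear ramp. The cap of 20,000 per day comes from the caption; the weights `w_first` and `w_second`, the `slope`, and the `intercept` are hypothetical parameters, since their values are not given here.

```python
def effective_vaccinations(first_doses, second_doses, w_first, w_second):
    """Weighted sum of first and second doses (weights are assumptions)."""
    return w_first * first_doses + w_second * second_doses


def projected_daily_vaccinations(day, slope, intercept=0.0, cap=20_000.0):
    """Linear growth in daily effective vaccinations, clipped to [0, cap]."""
    return min(cap, max(0.0, intercept + slope * day))


# Illustrative use with made-up weights and slope.
today = effective_vaccinations(100.0, 50.0, w_first=0.5, w_second=0.9)
ramp = [projected_daily_vaccinations(day, slope=500.0) for day in (0, 10, 40, 100)]
```

With a slope of 500 per day, the ramp reaches the cap on day 40 and stays flat afterwards.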
Fig 7. Estimation of the unobserved dynamics in all the model compartments (S, E, P, I, A, H, Q, D, R).
The number of susceptible individuals (S) drops significantly as the number of cases rises after December 2020.
Fig 8. Estimation of parameters and reproduction numbers.
(a) Estimated time-dependent transmission rate β(t). (b) Estimated time-dependent hospitalization ratio p(t), compared with the ratio of daily hospitalizations to daily cases calculated from the raw data. (c) Estimated time-dependent death-from-hospital ratio q(t), compared with the ratio of daily deaths to daily hospitalizations calculated from the raw data. (d) Estimated control reproduction number Rc and effective reproduction number Re, calculated from the estimated parameters, compared with 1/2 of the logarithm of daily cases.
Fig 9. Average Relative Error (ARE) of (β, p, q) in different observable settings.
Each row corresponds to a standard deviation level of the random noise multiplied into the observables. Each column represents an observable setting. When (Isum, Hsum, Dsum) or (Hsum, Dsum) are given, the ARE is lower than the threshold of 1. Therefore, our model is robust to noise in the NYC dataset. In the remaining cases, where observables are missing, our model would not be robust to perturbations, which is consistent with the structural identifiability result.
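The robustness check can be pictured as: perturb the observables with multiplicative noise, re-estimate the parameters, and measure how far the estimates move. The sketch below covers only the noise and error steps and rests on two assumptions not confirmed by this caption: the noise takes the form x·(1 + σz) with z standard normal, and the ARE is the mean of |estimate − truth| / |truth| over the parameters. The parameter values are made up.

```python
import random


def perturb(observations, sigma, rng):
    """Multiplicative Gaussian noise: each value is scaled by (1 + sigma * z)."""
    return [x * (1.0 + sigma * rng.gauss(0.0, 1.0)) for x in observations]


def average_relative_error(estimated, true):
    """Mean over parameters of |estimated - true| / |true|."""
    return sum(abs(estimated[k] - true[k]) / abs(true[k]) for k in true) / len(true)


rng = random.Random(0)
noisy = perturb([100.0, 120.0, 150.0], sigma=0.05, rng=rng)

# Hypothetical re-estimated parameters compared with hypothetical true values.
true_params = {"beta": 0.5, "p": 0.1, "q": 0.2}
refit_params = {"beta": 0.55, "p": 0.1, "q": 0.3}
are = average_relative_error(refit_params, true_params)
```

Here the relative errors are 10%, 0%, and 50%, so the ARE is 0.2, which would fall below a threshold of 1.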
Fig 10. Projection of daily cases, hospitalizations, and deaths in New York City with uncertainties and scenarios.
Reopening scenarios on February 14 and March 14 are considered. An increase in infectious, hospitalized, and deceased population is expected if the restaurants are reopened in the same way as Stage 5 (September 30, 2020 to December 14, 2020). Postponing the reopening of restaurants from February 14 to March 14 may reduce the number of infectious, hospitalized, and deceased individuals. The actual situation might vary depending on the details and implementations of the actual indoor dining policies that take place in 2021. Remarks: The projections were made and the paper was submitted in February. When updating the paper in June, we overlaid the new data of daily cases, hospitalizations, and deaths from February to June as the testing data. Indoor dining was actually reopened on February 14.
