J R Soc Interface. 2025 Jun;22(227):20240632.
doi: 10.1098/rsif.2024.0632. Epub 2025 Jun 4.

Estimating epidemic dynamics with genomic and time series data

Alexander E Zarebski et al. J R Soc Interface. 2025 Jun.

Abstract

Accurately estimating the prevalence and transmissibility of an infectious disease is an important task in genetic infectious disease epidemiology. However, generating accurate estimates of these quantities that make use of both epidemic time series and pathogen genome sequence data is a challenging problem. Phylogenetic birth-death processes are a popular choice for modelling the transmission of infectious diseases, but it is difficult to estimate the prevalence of infection with them. Here, we extend our approximate likelihood approach, which combines phylogenetic information from sampled pathogen genomes and epidemiological information from a time series of case counts, to estimate historical prevalence in addition to the effective reproduction number. We implement this new method in a BEAST2 package called Timtam. In a simulation study, our approximation is seen to be well-calibrated and recovers the parameters of simulated data. To demonstrate how Timtam can be applied to real datasets, we carry out empirical analyses of data from two infectious disease outbreaks: the outbreak of SARS-CoV-2 onboard the Diamond Princess cruise ship in early 2020 and poliomyelitis in Tajikistan in 2010. In both cases we recover estimates consistent with previous analyses.

Keywords: birth-death processes; computational statistics; genetic epidemiology; phylodynamics.

Conflict of interest statement

We declare we have no competing interests.

Figures

Figure 1.
The transmission process is viewed as a sequence of events with the observations processed sequentially to approximate their joint likelihood. (A) Transmission tree with intervals of time an individual was infected indicated by horizontal lines and the vertical grey arrows indicating transmission. Three scheduled unsequenced samples are taken at the times indicated by the vertical dashed lines. (B) Corresponding reconstructed tree and time series of confirmed cases in each of the scheduled unsequenced samples. In the third sample no cases were observed. (C) Prevalence of infection (grey line) and the lineages through time (LTT) plot (black dashed line).
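The process behind Figure 1 can be illustrated with a small stochastic simulation. The sketch below is not the Timtam implementation: it is a minimal Gillespie-style simulation that assumes every infected individual transmits, is removed, or is sampled at constant per-capita rates, that sampled individuals are removed, and that each scheduled unsequenced sample removes every infected individual independently with a fixed probability. The function name, argument names and example values are hypothetical.

```python
import random


def simulate_prevalence(birth, death, sampling, sched_times, sched_prob,
                        t_end, seed=1):
    """Simulate a birth-death-sampling process started from one infection.

    Returns (history, case_counts): history is a list of (time, prevalence)
    pairs, and case_counts holds one count per scheduled sample time.
    Assumption: sampling (continuous or scheduled) removes the individual.
    """
    rng = random.Random(seed)
    t, n = 0.0, 1
    history = [(t, n)]
    case_counts = []
    pending = sorted(s for s in sched_times if s <= t_end)
    per_capita = birth + death + sampling
    while True:
        # Time of the next transmission, removal or continuous sample.
        t_next = t + rng.expovariate(n * per_capita) if n > 0 else float("inf")
        if pending and pending[0] <= t_next:
            # A scheduled sample comes first: each infected individual is
            # observed (and removed) independently with probability sched_prob.
            t = pending.pop(0)
            removed = sum(rng.random() < sched_prob for _ in range(n))
            n -= removed
            case_counts.append(removed)
            history.append((t, n))
            continue  # memorylessness lets us redraw the next event time
        if t_next > t_end:
            break
        t = t_next
        # Transmission adds an infection; death or sampling removes one.
        n += 1 if rng.random() * per_capita < birth else -1
        history.append((t, n))
    return history, case_counts


history, case_counts = simulate_prevalence(
    birth=0.4, death=0.1, sampling=0.02,
    sched_times=[3.0, 6.0, 9.0], sched_prob=0.08, t_end=10.0)
```

The `history` list corresponds to the prevalence curve in panel (C), and `case_counts` to the time series of scheduled unsequenced samples in panel (B); a scheduled sample can return zero cases, as in the third sample of the figure.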
Figure 2.
The effective reproduction number through time and its approximation. The approximation smooths out the saw-tooth value of the (recursively computed) effective reproduction number, which occurs when there are scheduled samples. The parameters used for this figure are birth rate of 0.4, death rate of 0.1, sampling rate of 0.02 and a scheduled unsequenced sampling probability of 0.08 (at varying intervals). The solid lines indicate the values obtained with our approximation and the dashed lines indicate the true values accounting for scheduled sampling.
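As a rough worked example of the numbers in this caption, the sketch below computes a reproduction number from the stated rates using the standard birth-death-sampling relationship R = birth rate / (death rate + sampling rate), which assumes sampled individuals are removed. It then shows one way scheduled sampling could be smoothed into an equivalent continuous rate. That conversion (a probability per interval turned into a rate) and the interval length of 1.0 are illustrative assumptions, not necessarily the approximation used in the paper.

```python
import math


def reproduction_number(birth, death, sampling, extra_removal_rate=0.0):
    """Expected secondary infections: birth rate over total removal rate."""
    return birth / (death + sampling + extra_removal_rate)


def scheduled_prob_to_rate(prob, interval):
    """Rate giving removal probability `prob` over `interval` time units.

    Illustrative assumption only: solves 1 - exp(-rate * interval) = prob.
    """
    return -math.log(1.0 - prob) / interval


# Parameters from the Figure 2 caption.
birth, death, sampling, sched_prob = 0.4, 0.1, 0.02, 0.08

# Ignoring scheduled sampling entirely.
r_continuous = reproduction_number(birth, death, sampling)

# Smoothing scheduled samples into a rate (hypothetical interval of 1.0).
r_smoothed = reproduction_number(
    birth, death, sampling,
    extra_removal_rate=scheduled_prob_to_rate(sched_prob, interval=1.0))

print(f"R ignoring scheduled sampling: {r_continuous:.3f}")
print(f"R with smoothed scheduled sampling: {r_smoothed:.3f}")
```

Because scheduled sampling adds to the total removal rate, the smoothed value is always below the value that ignores it, and the gap shrinks as the interval between scheduled samples grows.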
Figure 3.
Parameter estimates converge to true values as the dataset gets larger. The solid black lines display the HPD intervals, and points indicate the point estimates; the point is filled if the HPD interval contains the true value and empty if it does not. The green points and the green dashed lines indicate the true values of the final prevalence and the reproduction number in the boom and bust portions of the simulation. We ordered the replicates by the final prevalence in each simulation. (A) The estimates when both sequenced and unsequenced data are treated as a point process. (B) The estimates when the unsequenced observations were aggregated into a time series of daily case counts.
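The filled/empty coding in Figure 3 amounts to checking whether a highest posterior density (HPD) interval covers the true value. A minimal sketch of that check is given below, assuming the posterior is unimodal and represented by a list of samples; the function names and the toy samples are hypothetical and not taken from the paper's analysis code.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples.

    Assumes a unimodal posterior represented by a non-empty sample list.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def covers(interval, true_value):
    """True if the interval contains the true value (a filled point)."""
    lower, upper = interval
    return lower <= true_value <= upper


# Toy posterior samples, for illustration only.
toy_samples = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
interval = hpd_interval(toy_samples, mass=0.5)
print(interval, covers(interval, 12), covers(interval, 20))
```

In a calibration study, the proportion of replicates for which `covers` returns true should be close to the nominal mass of the interval.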
Figure 4.
SARS-CoV-2 aboard the Diamond Princess cruise ship. (A) Sequences were collected across three days and testing varied throughout the quarantine period. The stacked bar chart shows the daily number of confirmed cases and sequenced samples. We indicate the timing of changes to surveillance and quarantine with lines at the top of the figure. (B) Estimates of the prevalence of infection and the 95% HPD intervals onboard the Diamond Princess. In addition to our estimates (green), estimates from Andréoletti et al. [9] are shown (purple). (C) Estimates of the reproduction number and the 95% HPD intervals. In addition to our estimates (green), estimates from Andréoletti et al. [9] (purple) and Vaughan et al. [18] (orange) are shown.
Figure 5.
Poliomyelitis in Tajikistan in 2010. (A) Sequences were collected throughout the outbreak. The stacked bar chart shows the weekly number of confirmed cases and sequenced samples. We indicate the hypothesized origin time and the timing of vaccination rounds at the top of the figure. (B) Estimates of the prevalence of infection (on a logarithmic scale) and the 95% HPD intervals at 21-day intervals across the outbreak. (C) Estimates of the reproduction number and the 95% HPD intervals as constants before and after the start of vaccination.
Figure 6.
Using a subset of the time series data produces similar, though more uncertain, estimates of key epidemiological parameters. (A) The case counts (distributed across the days of the week) were randomly subsampled to keep approximately 66% and 33% of the cases. (B) The components of the piece-wise constant estimate of the reproduction number using the subsampled time series are similar to those obtained with the original time series but with a slight trend towards smaller values. The subsampled data estimates have wider HPD intervals. (C) The estimates of the prevalence through time are similar; however, the estimates are smaller using the subsampled data. The black vertical lines show the change times of the reproduction number. Estimates of the surveillance parameters are plotted in electronic supplementary material, figure S9.

References

    1. Kendall DG. 1948. On the generalized ‘birth-and-death’ process. Ann. Math. Statist. 19, 1–15. (doi:10.1214/aoms/1177730285)
    2. Nee S, May RM, Harvey PH. 1994. The reconstructed evolutionary process. Proc. R. Soc. Lond. B 344, 305–311. (doi:10.1098/rstb.1994.0068)
    3. Stadler T. 2010. Sampling-through-time in birth–death trees. J. Theor. Biol. 267, 396–404. (doi:10.1016/j.jtbi.2010.09.010)
    4. Stadler T, et al. 2012. Estimating the basic reproductive number from viral sequence data. Mol. Biol. Evol. 29, 347–357. (doi:10.1093/molbev/msr217)
    5. De Angelis D, Presanis AM, Birrell PJ, Tomba GS, House T. 2015. Four key challenges in infectious disease modelling using data from multiple sources. Epidemics 10, 83–87. (doi:10.1016/j.epidem.2014.09.004)