Genetics. 2021 Feb 9;217(2):iyaa039.
doi: 10.1093/genetics/iyaa039.

Inference of population genetic parameters from an irregular time series of seasonal influenza virus sequences

Myriam Croze et al. Genetics.

Abstract

Basic summary statistics that quantify the population genetic structure of influenza virus are important for understanding and inferring its evolutionary and epidemiological processes. However, the sampling dates of global virus sequences in the last several decades are scattered nonuniformly throughout the calendar. Such temporal structure of samples and the small effective size of the viral population hamper the use of conventional methods to calculate summary statistics. Here, we define statistics that overcome this problem by correcting for the sampling-time difference when quantifying a pairwise sequence difference. A simple linear regression method jointly estimates the mutation rate and the level of sequence polymorphism, thus providing an estimate of the effective population size. It also leads to a definition of Wright's FST for arbitrary time-series data. Furthermore, as an alternative to Tajima's D statistic or the site-frequency spectrum, a mismatch distribution corrected for sampling-time differences can be obtained and compared between actual and simulated data. Applying these methods to seasonal influenza A/H3N2 viruses sampled between 1980 and 2017, and to sequences simulated under a model of recurrent positive selection with metapopulation dynamics, allowed us to estimate the synonymous mutation rate and to find parameter values for selection and demographic structure that fit the observations. We found that the mutation rates of the HA and PB1 segments before 2007 were particularly high and that including recurrent positive selection in our model was essential to account for the genealogical structure of the HA segment. The methods developed here can be applied generally to population genetic inference from serially sampled genetic data.
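
The regression step described above can be illustrated with a minimal sketch. It assumes, following the Figure 1 caption, that the expected pairwise difference per site is linear in the sampling-time difference, d = π + µτ with π = 2Neµ, so that the slope of an ordinary least-squares fit estimates µ and the intercept estimates π. The function name `estimate_mu_pi_ne` and the toy numbers are hypothetical, and the authors' actual fitting procedure may differ in detail.

```python
def estimate_mu_pi_ne(taus, diffs):
    """Fit d = pi + mu * tau by ordinary least squares.

    taus  : sampling-time differences for sequence pairs
    diffs : pairwise sequence differences per site for the same pairs
    Returns (mu_hat, pi_hat, ne_hat), with ne_hat = pi_hat / (2 * mu_hat).
    ne_hat is expressed in the time unit used for tau (an assumption of
    this sketch; conversion to generations is not attempted).
    """
    n = len(taus)
    if n != len(diffs) or n < 2:
        raise ValueError("need at least two (tau, d) pairs of equal length")
    mean_tau = sum(taus) / n
    mean_d = sum(diffs) / n
    sxx = sum((t - mean_tau) ** 2 for t in taus)
    if sxx == 0:
        raise ValueError("all tau values are identical; slope is undefined")
    sxy = sum((t - mean_tau) * (d - mean_d) for t, d in zip(taus, diffs))
    mu_hat = sxy / sxx                     # slope: mutation rate
    pi_hat = mean_d - mu_hat * mean_tau    # intercept: polymorphism level
    ne_hat = pi_hat / (2 * mu_hat)         # from pi = 2 * Ne * mu
    return mu_hat, pi_hat, ne_hat


# Toy, noise-free data (illustrative values only, not taken from the paper):
# mu = 1e-5 per site per day and pi = 0.004.
taus = [0, 100, 200, 300, 400]
diffs = [0.004 + 1e-5 * t for t in taus]
mu_hat, pi_hat, ne_hat = estimate_mu_pi_ne(taus, diffs)
```

With real data the points scatter widely around the line, so the estimates carry sampling error that this sketch does not quantify.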

Keywords: influenza virus; mismatch distribution; serial sample; summary statistics.

Figures

Figure 1
Coalescent tree of two viral sequences sampled at times that differ by τ. Assuming a constant rate µ of neutral (synonymous) mutation along a lineage, the expected neutral sequence difference is (2E[T] + τ)µ, where T is the time to coalescence of two contemporaneous sequences. The expected synonymous sequence difference therefore exceeds the scaled mutation rate, 2E[T]µ = 2Neµ, by τµ.
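
The expectation in this caption can be written out step by step. The sketch below assumes that mutations accumulate independently at rate µ per unit time along each branch, and it introduces d only as shorthand for the pairwise neutral difference. The total branch length separating the two samples is T for the earlier-sampled lineage plus T + τ for the later-sampled one.

```latex
\begin{align}
\mathbb{E}[d] &= \mu\,\mathbb{E}\bigl[T + (T + \tau)\bigr] \\
              &= \bigl(2\,\mathbb{E}[T] + \tau\bigr)\,\mu \\
              &= 2 N_e \mu + \tau\mu
                 \qquad \text{(using } \mathbb{E}[T] = N_e \text{, as implied by } 2\mathbb{E}[T]\mu = 2N_e\mu\text{)}.
\end{align}
```
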
Figure 2
Pairwise nucleotide difference (d) per site of segments HA, NA, PB1, PB2, PA, and NP plotted against sampling-time difference (τ, in days) for H3N2 sequence data. Data points are from the 27-year data (1980–2006; black dots) and from the 10-year data (2007–2017; gray dots). Regression lines for the 27- and 10-year data are shown in red and blue, respectively. The proportions of bootstrap samples in the tests for statistical differences in μ̂, π, and N̂e between the 27- and 10-year periods are shown below each regression plot.
Figure 3
The TCMDs of six influenza virus segments in the 27-year (A) and 10-year (B) H3N2 data sets. To obtain d and make the histograms, τmax = 300 and bin size w = 0.002 were used.
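
How a histogram of this kind might be built from τmax and w can be sketched as follows. The sketch rests on assumptions that this page does not confirm: that TCMD refers to the mismatch distribution corrected for sampling-time differences mentioned in the abstract, that the correction subtracts μ̂τ from each raw pairwise difference, and that only pairs with τ ≤ τmax are kept. The function name `tcmd_histogram` and the toy values are hypothetical.

```python
import math
from collections import Counter


def tcmd_histogram(pairs, mu_hat, tau_max=300, w=0.002):
    """Return {bin_start: relative frequency} of time-corrected differences.

    pairs  : iterable of (tau, raw_difference_per_site)
    mu_hat : estimated mutation rate per site per unit of tau
    ASSUMPTION: the correction is raw_difference - mu_hat * tau, and only
    pairs with tau <= tau_max contribute to the histogram.
    """
    counts = Counter()
    for tau, raw in pairs:
        if tau > tau_max:
            continue  # pair is too far apart in time; skip it
        corrected = raw - mu_hat * tau
        counts[math.floor(corrected / w)] += 1  # index of the bin of width w
    total = sum(counts.values())
    if total == 0:
        return {}
    # Convert bin indices back to bin start values and counts to frequencies.
    return {round(k * w, 10): c / total for k, c in sorted(counts.items())}


# Toy pairs (illustrative values only): (tau in days, raw difference per site).
pairs = [(0, 0.0050), (100, 0.0060), (200, 0.0090), (400, 0.0200)]
hist = tcmd_histogram(pairs, mu_hat=1e-5)
```

In this toy input the pair with τ = 400 is discarded, and the remaining three corrected differences fall into the bins starting at 0.004 and 0.006.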
Figure 4
The average TCMD (black curve) for data simulated under neutrality (s = 0) with m = 0.004 and Kmax = 110, the parameter values that produce the FST and π values best fitting the observed data. The red curve is the observed TCMD of HA segments in the 10-year data set. TCMDs of individual simulation replicates are shown as gray curves. To obtain d and make the histograms, τmax = 300 and w = 0.002 were used.
Figure 5
The average TCMD (black curve) for data simulated under positive selection (s = 0.1 and ε = 10) with m = 0.00025 and Kmax = 6700, which is congruent with the TCMD (red curve) of the HA segment in the 10-year H3N2 data (KS test, p = 0.058). TCMDs of individual simulation replicates are shown as gray curves. To obtain d and make the histograms, τmax = 300 and w = 0.002 were used.
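
The comparison behind a KS test can be illustrated with a minimal sketch of the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical cumulative distributions. This page does not say exactly which samples the authors compared or how the p-value was obtained, so the function name `ks_statistic` and the toy samples are hypothetical, and no p-value is computed here.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i / len(a) and j / len(b) are the two ECDF values just after x.
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


# Toy samples standing in for observed and simulated corrected differences.
observed = [0.001, 0.002, 0.003, 0.004]
simulated = [0.003, 0.004, 0.005, 0.006]
d_stat = ks_statistic(observed, simulated)
```

In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value; the hand-written version only shows what the statistic measures.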
