Virus Evol. 2015 Jan;1(1):vev006.
doi: 10.1093/ve/vev006. Epub 2015 Jan 1.

Time dependence of evolutionary metrics during the 2009 pandemic influenza virus outbreak

Austin G Meyer et al. Virus Evol. 2015 Jan.

Abstract

With the expansion of DNA sequencing technology, quantifying evolution in emerging viral outbreaks has become an important tool for scientists and public health officials. Although it is known that the degree of sequence divergence significantly affects the calculation of evolutionary metrics in viral outbreaks, the extent and duration of this effect during an actual outbreak remain unclear. We have analyzed how limited divergence time during an early viral outbreak affects the accuracy of molecular evolutionary metrics. Using sequence data from the first 25 months of the 2009 pandemic H1N1 (pH1N1) outbreak, we calculated three standard evolutionary metrics (molecular clock rate, i.e., evolutionary rate; whole-gene dN/dS; and site-wise dN/dS) for hemagglutinin and neuraminidase, using increasingly long time windows, from 1 month to 25 months. For the molecular clock rate, we found that at least three to four months of temporal divergence from the start of sampling were required to make precise estimates that also agreed with long-term values. For whole-gene dN/dS, we found that at least two months of data were required to generate precise estimates, but six to nine months were required for estimates to approach their long-term values. For site-wise dN/dS estimates, we found that at least six months of sampling divergence were required before the majority of sites had at least one mutation and were thus evolutionarily informative. Furthermore, eight months of sampling divergence were required before the site-wise estimates appropriately reflected the distribution of values expected from known protein-structure-based evolutionary pressure in influenza. In summary, we found that evolutionary metrics calculated from gene sequence data in early outbreaks should be expected to deviate from their long-term estimates for at least several months after the initial emergence and sequencing of the virus.
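The increasingly long time windows described in the abstract can be illustrated with a minimal Python sketch. It assumes each sequence carries a sampling date and that window k holds every sequence sampled in the first k calendar months after the earliest sample. The function names and the calendar-month binning are illustrative assumptions, not the authors' pipeline.

```python
from datetime import date


def months_since(start, d):
    """Whole calendar months between two dates (0 for the same month)."""
    return (d.year - start.year) * 12 + (d.month - start.month)


def cumulative_windows(samples, n_months):
    """Group sequence IDs into cumulative windows of 1..n_months months.

    samples: list of (sequence_id, sampling_date) pairs (hypothetical input).
    Returns a dict mapping window length k to the IDs sampled within the
    first k calendar months, counted from the earliest sampling date.
    """
    start = min(d for _, d in samples)
    windows = {}
    for k in range(1, n_months + 1):
        windows[k] = [sid for sid, d in samples if months_since(start, d) < k]
    return windows


if __name__ == "__main__":
    # Toy sampling dates, invented for illustration only.
    toy = [("a", date(2009, 4, 10)), ("b", date(2009, 5, 2)), ("c", date(2009, 7, 30))]
    for k, ids in cumulative_windows(toy, 4).items():
        print(k, ids)
```

Each window would then be passed to the estimation step of interest (clock rate, whole-gene dN/dS, or site-wise dN/dS), so that early windows are always subsets of later ones.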

Keywords: dN/dS; emerging infectious diseases; evolution; influenza; molecular clock rate.

Figures

Figure 1.
Molecular clock rate computed by BEAST for pH1 hemagglutinin and pN1 neuraminidase from the pH1N1 outbreak. Panel A shows the molecular clock rate over time for pH1, and panel B shows the molecular clock rate over time for pN1. The error bars represent the 95 per cent highest posterior density (HPD) interval around the mean, as reported by BEAST. The plot shows a fourfold decline in the substitution rate estimates from a single month of data to 25 months of aggregated data. Further, the 95 per cent HPD interval of the molecular clock rate for the first 2 months of data, for both pH1 and pN1, does not overlap the final clock rate, indicating that these early estimates are not representative of the long-term estimates.
Figure 2.
Whole-gene dN/dS estimates for pH1 hemagglutinin and pN1 neuraminidase. Each point represents the average dN/dS at the specified time point for the pH1 (red points) and pN1 (blue points) genes. All of the dN/dS values were calculated by maximum likelihood using HyPhy (Kosakovsky Pond, Frost, and Muse 2005). For each gene, the first month yielded unpredictable, inaccurate dN/dS estimates, and estimates were systematically elevated in months 2–8 for pH1 and 2–11 for pN1. After approximately 11 months for pH1 and 8 months for pN1, the mean dN/dS value largely converged to the long-term estimate obtained after 25 months.
Figure 3.
Distribution of site-wise dN/dS for pH1 hemagglutinin and pN1 neuraminidase. dN/dS distributions containing aggregated data from the 1st, 6th, and 25th month are shown for pH1 in panel A and for pN1 in panel B. The first month of data featured a majority of sites with dN/dS=1. Most of these were uninformative sites that had not experienced any mutations, for which the maximum likelihood inference approach sets dN/dS to the arbitrary value of 1. After 6 months, roughly half of the sites were informative, although half still showed dN/dS=1. Finally, after 25 months of divergence, the majority of sites had informative dN/dS values. Distributions for all months are shown in Supplementary Figures S3 and S4.
Figure 4.
Fraction of alignment columns with distinct numbers of codons, plotted over time for hemagglutinin. Alignment columns with 1 distinct codon are completely conserved, while columns with two, three, etc. distinct codons have experienced at least one, two, etc. mutations. At 6 months, approximately half of all sites had not yet experienced a mutation, and even after 25 months, 16 out of 503 sites in pH1 remained completely conserved.
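The quantity plotted in Figure 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-frame, gap-free codon alignment supplied as equal-length strings. The function names are hypothetical, and handling of gaps or ambiguous bases is deliberately left out.

```python
from collections import Counter


def distinct_codons_per_column(aligned_seqs):
    """Count the distinct codons in each codon column of an alignment.

    aligned_seqs: list of equal-length, in-frame nucleotide strings
    (assumed gap-free here for simplicity).
    """
    length = len(aligned_seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be equal-length and a multiple of 3")
    counts = []
    for i in range(0, length, 3):
        codons = {s[i:i + 3] for s in aligned_seqs}
        counts.append(len(codons))
    return counts


def fraction_by_distinct_count(aligned_seqs):
    """Fraction of codon columns having 1, 2, 3, ... distinct codons."""
    counts = distinct_codons_per_column(aligned_seqs)
    tally = Counter(counts)
    return {k: v / len(counts) for k, v in sorted(tally.items())}


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = ["ATGAAACCC", "ATGAAGCCC", "ATGAAACCT", "ATGAACCCC"]
    print(distinct_codons_per_column(toy))
    print(fraction_by_distinct_count(toy))
```

A column with one distinct codon is completely conserved, so the fraction stored under key 1 corresponds to the share of sites that carry no evolutionary information at that time point.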
Figure 5.
Temporal development of geometric evolutionary constraints in hemagglutinin. The violin plots show the distribution of dN/dS–proximity correlations using each possible site in the hemagglutinin protein as the reference point. The violin plot should be viewed as a horizontal histogram; thus, the wider the violin plot, the higher the number of reference sites with that correlation. Underneath the violin plots, we map these correlations onto the protein structure at 4-month time intervals. The hemagglutinin protein (PDB ID 1RD8) is shown in its native trimer structure, but the correlations are plotted onto just one of the monomers. The correlation pattern stabilizes after approximately 8 months of divergence.
Figure 6.
Temporal development of geometric evolutionary constraints in neuraminidase. The violin plots show the distribution of dN/dS–proximity correlations using each possible site in the neuraminidase protein as the reference point. The violin plot should be viewed as a horizontal histogram; thus, the wider the violin plot, the higher the number of reference sites with that correlation. Underneath the violin plots, we map these correlations onto the protein structure at 4-month time intervals. The neuraminidase protein (PDB ID 3TI3) is shown in its native tetramer structure, but the correlations are plotted onto just one of the monomers. The correlation pattern stabilizes after approximately 8 months of divergence.
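The dN/dS–proximity correlations summarized in the violin plots of Figures 5 and 6 can be illustrated with the following Python sketch. It is not the authors' implementation. It assumes that each site is represented by a single 3D coordinate (for example a C-alpha atom), that proximity to the reference site is the inverse Euclidean distance, and that the correlation is Pearson's r computed over all sites other than the reference. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def proximity_correlations(coords, dnds):
    """One dN/dS-proximity correlation per reference site.

    coords: list of (x, y, z) tuples, one per site (assumed distinct).
    dnds:   list of site-wise dN/dS values in the same order.
    Proximity is taken as 1 / distance to the reference site (an assumption).
    """
    result = []
    for ref, ref_xyz in enumerate(coords):
        prox, vals = [], []
        for i, (xyz, w) in enumerate(zip(coords, dnds)):
            if i == ref:
                continue  # the reference site has no finite proximity to itself
            prox.append(1.0 / math.dist(ref_xyz, xyz))
            vals.append(w)
        result.append(pearson(prox, vals))
    return result


if __name__ == "__main__":
    # Toy coordinates and dN/dS values, invented for illustration only.
    toy_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (4, 0, 0)]
    toy_dnds = [2.0, 1.0, 0.5, 0.25]
    print(proximity_correlations(toy_coords, toy_dnds))
```

The returned list holds one correlation per reference site, which is the set of values a violin plot would summarize for a given time window.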
