Nature. 2014 Apr 10;508(7495):254-7.
doi: 10.1038/nature13016. Epub 2014 Feb 16.

A synchronized global sweep of the internal genes of modern avian influenza virus

Michael Worobey et al. Nature. 2014.

Abstract

Zoonotic infectious diseases such as influenza continue to pose a grave threat to human health. However, the factors that mediate the emergence of RNA viruses such as influenza A virus (IAV) are still incompletely understood. Phylogenetic inference is crucial to reconstructing the origins and tracing the flow of IAV within and between hosts. Here we show that explicitly allowing IAV host lineages to have independent rates of molecular evolution is necessary for reliable phylogenetic inference of IAV and that methods that do not do so, including 'relaxed' molecular clock models, can be positively misleading. A phylogenomic analysis using a host-specific local clock model recovers extremely consistent evolutionary histories across all genomic segments and demonstrates that the equine H7N7 lineage is a sister clade to strains from birds (as well as those from humans, swine and the equine H3N8 lineage), sharing an ancestor with them in the mid to late 1800s. Moreover, major western and eastern hemisphere avian influenza lineages inferred for each gene coalesce in the late 1800s. On the basis of these phylogenies and the synchrony of these key nodes, we infer that the internal genes of avian influenza virus (AIV) underwent a global selective sweep beginning in the late 1800s, a process that continued throughout the twentieth century and up to the present. The resulting western hemispheric AIV lineage subsequently contributed most of the genomic segments to the 1918 pandemic virus and, independently, the 1963 equine H3N8 panzootic lineage. This approach provides a clear resolution of evolutionary patterns and processes in IAV, including the flow of viral genes and genomes within and between host lineages.

Figures

Extended Data Figure 1. Performance of different clock models on simulated data
a, Summary of the 100 replicates corresponding to Fig. 1 (IAV-like substitution model). The box plots represent the median, Q1, Q3, minimum, and maximum of the 100 median TMRCA estimates. The HSLC model recovered the ‘correct’ (model) tree topology in 100% of the simulations; the other models did so in 0%. With the relaxed clock the 95% CI for the TMRCA never included the real root node date, while the HSLC model did in 91% of the simulations. b, Summary of 10 otherwise similar replicates, but simulated under a JC69 substitution model. c, Simulation with unequal sampling across clades, with ‘fast’ clade (‘avian’) sequences over-represented. (The model tree was identical to that in Fig. 1a except for the unequal number of sequences from the different clades as shown.) d, Simulation with ‘slow’ clade (‘equine’) sequences over-represented. Unlike the HSLC model, root date estimates are systematically biased under both strict and relaxed clock models and are strongly influenced by the balance of ‘fast-clade’ and ‘slow-clade’ sequences sampled.
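The sampling effect described in panels c and d can be illustrated with a back-of-the-envelope calculation. The following is a minimal sketch, not the Bayesian analysis used in the study: it assumes a two-clade tree, a single pooled rate equal to the sampling-weighted mean of the two host rates, and hypothetical rate values. The function name and all parameters are illustrative only.

```python
def apparent_root_year(sample_year, true_root_year,
                       rate_fast, rate_slow, weight_fast):
    """Root year implied when one pooled rate is forced onto two clades.

    Toy assumption: the single clock rate is the sampling-weighted mean of
    the fast and slow host rates (weight_fast = share of fast-clade tips).
    Rates are in substitutions/site/year.
    """
    elapsed = sample_year - true_root_year
    # Expected distance between one fast-clade tip and one slow-clade tip.
    distance = (rate_fast + rate_slow) * elapsed
    # The single rate a one-rate clock would be pulled toward.
    pooled_rate = weight_fast * rate_fast + (1.0 - weight_fast) * rate_slow
    # Under one rate, both lineages are assumed to share the distance equally.
    return sample_year - distance / (2.0 * pooled_rate)


# Hypothetical values: true root in 1800, tips sampled in 2000.
balanced = apparent_root_year(2000, 1800, 3e-3, 1e-3, weight_fast=0.5)
fast_heavy = apparent_root_year(2000, 1800, 3e-3, 1e-3, weight_fast=0.9)
slow_heavy = apparent_root_year(2000, 1800, 3e-3, 1e-3, weight_fast=0.1)
```

In this toy setting the root date is recovered only with balanced sampling. Over-representing the fast clade pulls the apparent root toward the present, and over-representing the slow clade pushes it further into the past, which is the qualitative pattern the caption reports for the strict and relaxed clock models.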
Extended Data Figure 2. Relaxed molecular clock results
a-h, respectively: MCC trees inferred under a UCLD relaxed molecular clock model. Host-specific rate distributions in substitutions/site/year are inset at top left. Trees are drawn to the same time scale, with branch lengths in years. Eastern (“e”) and Western (“w”) Hemisphere AIV lineages are highlighted with black and gray vertical bars, respectively. Colouring of branches and clades follows the pattern in Fig. 2. The median dates of node 1 and node 2 from the HSLC analyses depicted in Fig. 2 are shown here for comparison. As with the synthetic data sets (Fig. 1, Extended Data Fig. 1), the topologies and timing estimated under a relaxed clock model appear to be compromised by a failure to account for host-specific rates. It is not readily apparent from these trees, for example, that the equine H7N7 lineage is basal to the AIV diversity or that the 1918 pandemic virus is nested within a Western Hemisphere AIV lineage. The root node in each tree is also severely biased toward more recent dates, similar to the results with simulated sequences. Data, input, and full MCC tree files are available from http://dx.doi.org/10.5061/dryad.m04j9.
Extended Data Figure 3. Branch-site REL analyses to test for episodic diversifying selection
The branches are coloured to depict the proportion of substitutions along each branch that are under purifying selection (with dN/dS < 1: blue), the proportion evolving neutrally (with dN/dS = 1: gray), or under diversifying selection (with dN/dS > 1: red). In every gene, almost every site in every branch evidently evolved under purifying selection. In a few branches, a small proportion of sites show evidence of positive selection (e.g. the branch between AIV and equine H7N7 in NS1/2). However, the proportion is so small that there seems to be no conceivable way that episodic diversifying selection occasioned by host jumps could be driving the overall dating estimates. Even for HA and NA, purifying selection overwhelmingly dominates.
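The colouring scheme above amounts to summing, for each branch, the share of sites that falls into each dN/dS regime. The following is a minimal sketch of that bookkeeping only. It does not implement the branch-site REL likelihood model itself, and the function names, the tolerance, and the example rate classes are hypothetical.

```python
def classify_omega(omega, tol=1e-6):
    """Label a dN/dS value as purifying (<1), neutral (=1) or diversifying (>1)."""
    if omega < 1.0 - tol:
        return "purifying"
    if omega > 1.0 + tol:
        return "diversifying"
    return "neutral"


def selection_profile(rate_classes, tol=1e-6):
    """Sum the proportion of sites per selection regime for one branch.

    rate_classes: iterable of (omega, proportion) pairs, assumed to come from
    a fitted branch-site model; proportions are expected to sum to 1.
    """
    profile = {"purifying": 0.0, "neutral": 0.0, "diversifying": 0.0}
    for omega, proportion in rate_classes:
        profile[classify_omega(omega, tol)] += proportion
    return profile


# Hypothetical branch: almost all sites under purifying selection.
example_branch = [(0.05, 0.97), (1.0, 0.02), (8.0, 0.01)]
example_profile = selection_profile(example_branch)
```

A branch with a profile like this hypothetical one would be drawn almost entirely in blue, with thin gray and red portions.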
Extended Data Figure 4. Uracil content patterns
a-h, U content patterns for PB2, PB1, PA, HA, NP, NA, M1/2, and NS1/2, respectively. The 95% CI of avian U content is shown for each segment with a gray rectangle. U content versus year of sampling is shown by black diamond symbols for human H1N1 and bat H17N10, magenta diamonds for equine H7N7, and solid green circles for equine H3N8. The curves fitted to the H3N8 data are shown. The equine panzootic of 1872-1873 is depicted with a vertical red line. The left dashed line corresponds to node 1 from Fig. 2, the right dashed line, node 2. P values beside the red lines reflect the tests of whether the equine H7N7 age estimates predate 1872; for HA, NA, and NS1/2 the gray rectangle depicts the 95% confidence interval for the ingroup avian data (H7, N7, and NS1/2 A lineage, respectively). Avian H3, N8, and NS1/2 lineage B U content distributions are indicated with separate arrow lines. The estimated origin dates of the equine H7N7 genes based on U content values were: PB2 1548[1533-1574]; PB1 1842[1816-1877]; PA 1819[1795-1842]; H7 1880[1878-1884]; NP 1785[1747-1823]; N7 1387[1373-1413]; M1/2 1801[1724-1879]; NS1/2 1835[1810-1861].
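The idea of dating a host jump from U content can be sketched as follows. This is an illustrative simplification under stated assumptions: it fits a straight line by ordinary least squares, whereas the caption does not specify the form of the fitted curves, and the sampling years, U content values and avian baseline are invented for the example. All function names are hypothetical.

```python
def u_content(sequence):
    """Fraction of unambiguous bases that are U (T is treated as U)."""
    seq = sequence.upper().replace("T", "U")
    bases = [b for b in seq if b in "ACGU"]
    return bases.count("U") / len(bases)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def year_at_level(slope, intercept, level):
    """Year at which the fitted line reaches a given U content."""
    return (level - intercept) / slope


# Hypothetical data: U content of a mammalian lineage rising over time.
years = [1970, 1980, 1990, 2000]
u_values = [0.300, 0.302, 0.304, 0.306]
avian_baseline = 0.296  # assumed avian U content, for illustration only

slope, intercept = fit_line(years, u_values)
origin_year = year_at_level(slope, intercept, avian_baseline)
```

With these invented numbers the fitted line meets the assumed avian baseline around 1950, which would be read as the approximate date the lineage left the avian reservoir. The same logic applied to real data would need the confidence intervals shown in the figure.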
Extended Data Figure 5. Uracil content patterns for human and swine IAV internal genes
a-f, respectively: human PB2, PB1, PA, NP, M1/2, NS1/2. g-l, respectively: swine PB2, PB1, PA, NP, M1/2, NS1/2. After nearly a century of steadily increasing U content in each of these mammalian hosts, these genes still exhibit considerably lower U content than the corresponding equine H7N7 genes.
Extended Data Figure 6. HSLC results for H1, N1, H3, and N8
a-d, respectively: MCC trees inferred under the HSLC model and host-specific rate distributions (to the right of each tree). Trees are drawn to the same scale, with branch lengths in years. Eastern and Western Hemisphere AIV lineages are highlighted with black and gray vertical bars, respectively. Fully resolved trees including posterior probabilities for each node and 95% CIs on node dates are depicted in Fig. S1 i through l. These results suggest an avian origin of the H1 HA and N1 NA of the 1918 human pandemic virus, sometime after the human/avian MRCA in ~1893 for HA and the human/avian MRCA in ~1914 for NA. For H1, the available sample of AIV sequences coalesces in ~1952. Hence, the H1 Western and Eastern Hemisphere lineages were established very recently compared to the internal genes (Fig. 2). This means that current sampling can provide no information about the geographic origin of the HA gene of the 1918 virus. Similarly, for N1, a deep Western Hemisphere lineage shares an MRCA with the Eastern Hemisphere lineage in ~1919 (with a subsequent east-to-west dispersal in the early 1960s, indicated by a vertical arrow). Again, these data offer no insights into the geographical origin of the 1918 pandemic virus’s NA gene since the 1918 sequence is not nested within either a Western or Eastern hemisphere AIV clade as with the internal genes. If archival AIV sequences from closer to 1918 could be recovered they might resolve these geographical questions. For H3 and N8 distinct equine lineages are apparent; however, when and where they crossed from the AIV reservoir remains unclear (see the Supplementary Information for additional discussion).
Extended Data Figure 7. HA and NA genetic diversity analysis rates and dates (from Fig. 3)
a, Posterior density of substitution rates of HA and NA. b, Posterior density of TMRCA of all HA subtypes and all NA subtypes. c, Within-subtype TMRCAs for each HA and NA subtype.
Extended Data Figure 8. Phylogenetic evidence of AIV gene flow from domestic to wild birds
a-h, Subtrees highlighting the observation that most of the post-1940s genetic diversity within Eastern Hemisphere AIV (as well as several West-2 and West-3 lineages in the Western Hemisphere) descends from within the clade of 1920s/30s ‘fowl plague’ (HPAI) and 1940s low pathogenicity avian influenza (LPAI) viruses from domestic birds. The major Eastern Hemisphere avian clades are collapsed for clarity and depicted as purple triangles. The brown circle depicts the MRCA of the 1920s/30s sequences from domestic birds. The blue circle represents the MRCA of the major Eastern Hemisphere AIV clade and the closest 1920s/30s virus for each gene. The A/chicken/Japan/1925 HPAI strain, which was newly sequenced for this study, is highlighted in red. These results are subtrees taken from an analysis of the Fig. 2 data set, but with the addition of the three newly sequenced complete genomes (A/chicken/Japan/1925, A/duck/Manitoba/1953, and A/equine/Detroit/3/1964), as well as several additional South American PB1 sequences, and using an SRD06 substitution model (full trees available from http://dx.doi.org/10.5061/dryad.m04j9).
Figure 1. Performance of different clock models on simulated data
a, Model tree used to simulate nucleotide data, with branch lengths depicted in units of time. The host-specific rates of the ‘equine’, ‘human’, and ‘avian’ lineages are shown to the right of the tree. b, MCC tree for simulation replicate #1 under a strict clock model. The 95% credibility interval for each node time is shown with a bar, and the posterior probability of the ingroup node is indicated. c, MCC tree under a relaxed clock model. d, MCC tree under the HSLC model. The posterior density of the clock rate inferred under each model is shown at right. Summaries of the results for all 100 replicates are shown in Extended Data Fig. 1.
Figure 2. Host-specific local clock model results
a-h, respectively: MCC trees and host-specific rate distributions (inset at top left) inferred under the HSLC model. Trees are drawn to the same scale, with branch lengths in years. The epizootic of 1872-73 is indicated with a solid red line. The major Eastern and Western Hemisphere AIV lineages are highlighted with black and gray vertical bars, respectively. The green triangles represent gull/shorebird clades (order Charadriiformes) allowed their own rate separate from other AIV. In some cases (e.g. NS1/2) there is clear evidence of Charadriiformes AIV descending from domestic avian HPAI viruses of the 1920s and 1930s (which are highlighted in red). Fully resolved trees including posterior probabilities for each node and 95% CIs on node dates are depicted in Fig. S1. Data, input, and full MCC tree files are available from http://dx.doi.org/10.5061/dryad.m04j9.
Figure 3. HA, NA, and internal gene diversity
Summarized time-calibrated phylogenetic trees of known IAVs for each genomic segment. Each triangle represents global AIV diversity for internal genes (in grey) and each subtype of HA and NA (in colour). Grey bars represent 95% CIs for the dates of divergence of nodes of interest. The MRCA of each HA and NA subtype and the global avian diversity of every internal gene corresponds to or post-dates 1872 (dashed line). Dates of divergence are shown in Extended Data Fig. 7c.
