Proc Natl Acad Sci U S A. 2000 Jun 20;97(13):6974-80.
doi: 10.1073/pnas.97.13.6974.

Effects of passage history and sampling bias on phylogenetic reconstruction of human influenza A evolution

R M Bush et al. Proc Natl Acad Sci U S A.

Abstract

In this paper we determine the extent to which host-mediated mutations and a known sampling bias affect evolutionary studies of human influenza A. Previous phylogenetic reconstruction of influenza A (H3N2) evolution using the hemagglutinin gene revealed an excess of nonsilent substitutions assigned to the terminal branches of the tree. We investigate two hypotheses to explain this observation. The first hypothesis is that the excess reflects mutations that were either not present or were at low frequency in the viral sample isolated from its human host, and that these mutations increased in frequency during passage of the virus in embryonated eggs. A set of 22 codons known to undergo such "host-mediated" mutations showed a significant excess of mutations assigned to branches attaching sequences from egg-cultured (as opposed to cell-cultured) isolates to the tree. Our second hypothesis is that the remaining excess results from sampling bias. Influenza surveillance is purposefully biased toward sequencing antigenically dissimilar strains in an effort to identify new variants that may signal the need to update the vaccine. This bias produces an excess of mutations assigned to terminal branches simply because an isolate with no close relatives is by definition attached to the tree by a relatively long branch. Simulations show that the magnitude of excess mutations we observed in the hemagglutinin tree is consistent with expectations based on our sampling protocol. Sampling bias does not affect inferences about evolution drawn from phylogenetic analyses. However, if possible, the excess caused by host-mediated mutations should be removed from studies of the evolution of influenza viruses as they replicate in their human hosts.

Figures

Figure 1
Maximum parsimony tree constructed from 357 HA1 genes of the human influenza virus type A subtype H3.
Figure 2
Partitioning mutations assigned to terminal branches of a phylogenetic tree. The tree on the left represents the evolution of a population of 16 viruses that each differ from their ancestor by one unique mutation. The tree on the right is a reconstruction after (i) sampling only eight of the viruses with a bias against sequencing closely related isolates and (ii) propagating the isolates in embryonated chicken eggs (e) or cell culture (c) in the laboratory. The tree constructed of sampled sequences is shown in black, with the terminal branches as thicker lines. The branch attaching sequence 4 (an egg-cultured isolate) to the tree is one mutation longer than it should be. The additional mutation is HM, that is, a mutation not present or at low frequency in the isolate before laboratory propagation. The branches attaching sequences from isolates 3–8 to the tree are longer than they would have been if our sample had included their nearest relatives. The increased length of branch 4 is in part caused by a process other than the ongoing evolution of the virus during replication in the human host. The remaining excess is simply a reflection of sampling bias, and thus does not affect evolutionary inferences made from the tree.
Figure 3
The effects of sampling bias on phylogenetic reconstruction. The tree on the left shows a hypothetical population of 16 isolates that each differ from their ancestor by one unique mutation. The four trees to the right show the original tree overlaid with the tree that would result from sampling only half of the total population. The tree constructed of sampled sequences is shown in black, with the terminal branches as thicker lines. Clumped sampling causes a decrease in the total genetic variation sampled. The mutations not captured in the sample would have been assigned only to internal branches, as shown by the symbol X. As a result, the proportion of mutations assigned to the internal and terminal branches changes with sampling dispersion, but not at the same rate (shown in the line plot at the bottom). Without knowledge of where a sample lies on such a continuum, there is no way to derive the expected proportion of mutations that should be assigned to the terminal and internal branches of a phylogenetic tree.
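The sampling effect described in the Figure 3 caption can be illustrated with a toy calculation. The sketch below is not the authors' simulation; it assumes a balanced binary tree of 16 isolates in which every branch carries exactly one mutation, and it assumes the root sequence is known. The function name `partition_mutations` and the two sampling schemes are hypothetical choices made for illustration. A branch is counted as terminal when exactly one sampled isolate descends from it, as internal when two or more do, and as unsampled otherwise.

```python
from typing import Dict, Iterable

DEPTH = 4  # assumed tree shape: balanced binary tree with 2**4 = 16 leaves


def partition_mutations(sampled: Iterable[int], depth: int = DEPTH) -> Dict[str, int]:
    """Count one-mutation branches by how many sampled leaves lie below them."""
    sampled_set = set(sampled)
    n_leaves = 2 ** depth
    counts = {"terminal": 0, "internal": 0, "unsampled": 0}
    for d in range(1, depth + 1):
        block = n_leaves // 2 ** d  # number of leaves below each branch at depth d
        for start in range(0, n_leaves, block):
            k = sum(1 for leaf in range(start, start + block) if leaf in sampled_set)
            if k == 0:
                counts["unsampled"] += 1  # mutation never observed in the sample
            elif k == 1:
                counts["terminal"] += 1   # becomes part of a terminal branch
            else:
                counts["internal"] += 1   # shared by two or more sampled isolates
    return counts


def terminal_fraction(counts: Dict[str, int]) -> float:
    """Fraction of observed mutations that fall on terminal branches."""
    observed = counts["terminal"] + counts["internal"]
    return counts["terminal"] / observed


# Dispersed sampling: one isolate from every pair of nearest relatives.
dispersed = partition_mutations(range(0, 16, 2))
# Clumped sampling: all eight isolates from one half of the tree.
clumped = partition_mutations(range(0, 8))

print("dispersed:", dispersed, round(terminal_fraction(dispersed), 3))
print("clumped:  ", clumped, round(terminal_fraction(clumped), 3))
```

Under these assumptions, dispersed sampling observes 22 of the 30 mutations and places 16 of them on terminal branches (about 73%), whereas clumped sampling observes only 15 and places 8 on terminal branches (about 53%). This matches the qualitative pattern in the caption: avoiding close relatives lengthens terminal branches, and clumped sampling captures less of the total variation.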
