Cell Host Microbe. 2017 Sep 13;22(3):387-399.e6.
doi: 10.1016/j.chom.2017.07.019. Epub 2017 Aug 31.

Deciphering the Origin and Evolution of Hepatitis B Viruses by Means of a Family of Non-enveloped Fish Viruses

Chris Lauber et al. Cell Host Microbe. 2017.

Abstract

Hepatitis B viruses (HBVs), which are enveloped viruses with reverse-transcribed DNA genomes, constitute the family Hepadnaviridae. An outstanding feature of HBVs is their streamlined genome organization with extensive gene overlap. Remarkably, the ∼1,100 bp open reading frame (ORF) encoding the envelope proteins is fully nested within the ORF of the viral replicase P. Here, we report the discovery of a diversified family of fish viruses, designated nackednaviruses, which lack the envelope protein gene, but otherwise exhibit key characteristics of HBVs including genome replication via protein-primed reverse-transcription and utilization of structurally related capsids. Phylogenetic reconstruction indicates that these two virus families separated more than 400 million years ago before the rise of tetrapods. We show that HBVs are of ancient origin, descending from non-enveloped progenitors in fishes. Their envelope protein gene emerged de novo, leading to a major transition in viral lifestyle, followed by co-evolution with their hosts over geologic eras.

Keywords: hepadnaviruses; hepatitis B virus; overlapping open reading frames; viral gene evolution; virus discovery; virus origins; virus-host long-term co-evolution.

Figures

Figure 1
Genome Organization of Hepadna- and Nackednaviruses. (A) Human hepatitis B virus (HBV). (B) Tetra metahepadnavirus (TMDV) of the Mexican tetra (Astyanax mexicanus). An ORF X is absent. (C) Rockfish nackednavirus (RNDV). (D) Sockeye salmon nackednavirus (SSNDV). (E) Comparison of the RNDV and HBV P ORF. All three reading frames are depicted (+1, +2, +3). White vertical bars: stop codons. TP, terminal protein; RT, reverse transcriptase; RH, RNaseH. (F) Amino acid sequence alignments of selected parts of P (+1) and S (+2) reading frames, including four representatives of nackednaviruses (N) and five of hepadnaviruses (H). Nackednaviruses harbor multiple stop codons in the region of the (+2) frame corresponding to the hepadnaviral RT/S overlap. See also Table S1 and Figures S1 and S2.
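The three-frame stop-codon view in panel (E) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `stop_codons_by_frame` and the toy sequence are hypothetical, and the standard stop codons (TAA, TAG, TGA) are assumed.

```python
# Minimal sketch (not the authors' method): list stop codons in the three
# forward reading frames of a DNA sequence, as in a frame-by-frame ORF plot.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def stop_codons_by_frame(seq):
    """Return {frame: [0-based nucleotide offsets of stop codons]} for frames +1 to +3."""
    seq = seq.upper()
    result = {}
    for frame in (1, 2, 3):
        start = frame - 1  # frame +1 starts at offset 0, +2 at 1, +3 at 2
        result[frame] = [
            i
            for i in range(start, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return result


if __name__ == "__main__":
    toy = "ATGTAAGCTGA"  # hypothetical toy sequence, not viral data
    print(stop_codons_by_frame(toy))
```

A frame with no entries over a long stretch is a candidate open reading frame; a frame with many entries, as described for the (+2) frame of nackednaviruses, cannot encode a continuous protein there.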
Figure 2
Virological Assays. (A) Release of naked capsids from RNDV-transfected cells. HuH-7 cells were transfected with expression plasmids containing terminally redundant genomes of RNDV, HBV, or HBV env(−), an envelope protein-deficient HBV mutant. Cell culture supernatants were subjected to CsCl density gradient centrifugation followed by detection of viral DNA in gradient fractions by DNA-dot blot hybridization. Similar results were obtained in the HEK cell line HEK293T and in the rainbow trout gonad cell line RTG-2 (data not shown). (B) P priming assays. RNDV and duck hepatitis B virus (DHBV) wild-type (WT) P proteins produced in a coupled in vitro transcription/translation system were incubated with [α-32P]dGTP and subjected to SDS-PAGE followed by autoradiography (lanes 1 and 6). To demonstrate template dependency of the priming reaction, RNase A digests were performed prior to incubation with [α-32P]dGTP (lanes 2 and 5). An RNDV YMDD-motif mutant in P was included to show dependency of the priming reaction on an intact RT domain (YMHD; lane 8). As control for proper protein production, P proteins were metabolically radiolabeled with [35S]methionine without addition of [α-32P]dGTP (lanes 3, 4, and 7).
Figure 3
Capsid Ultrastructure. (A) Alignment of the C proteins of African cichlid nackednavirus (ACNDV) and HBV. α helices of HBV C indicated at the bottom refer to the crystal structure (Wynne et al., 1999). Secondary structures of ACNDV C predicted with jpred (Drozdetskiy et al., 2015) and psipred (ppred) (Jones, 1999) are given at the top. Blue, α helices; yellow, β sheets; red, additional, N-terminal α helix (α+). (B) Comparison of the capsid structure of HBV (T=4) (Yu et al., 2013) and ACNDV (T=3). Cryoelectron microscopy maps low-pass filtered at 12 Å (top row) and 8 Å (middle row). Bottom row: zoomed view onto a local (pseudo-)3-fold axis. Additional α+ helices in ACNDV highlighted by red arcs. See also Figure S3.
Figure 4
Phylogenetic Relationship of Hepadna- and Nackednaviruses. Rooted Bayesian phylogenetic tree based on protein sequence alignments of conserved regions in the TP, RT, and RH domains of P (437 amino acid positions). For details on parameter optimization of the Bayesian phylogenetic model, see the STAR Methods. Viruses discovered in this study are in color; lineages with piscine hosts in blue. A fourth member of the metahepadnavirus clade was described in a study by Dill et al. (2016). Scale bar, millions of years. Numbers at branching points: posterior probability support values. Red arrows: most parsimonious periods of major evolutionary innovations. The X ORF is an evolutionary novelty of orthohepadnaviruses. See also Figures S4, S5, and S7 and Tables S2 and S3.
Figure 5
Tanglegram Juxtaposing the Host and Virus Phylogenies. Left panel: ultrametric phylogenetic tree of the host species. Right panel: ultrametric phylogenetic tree of the virus species. Middle panel: virus-host associations. To increase the virus-host spectrum, we included endogenous hepadnaviruses from crocodilians (eCrHBV-1) (Suh et al., 2014), snakes (eSnHBV-1) (Gilbert et al., 2014; Suh et al., 2014), and spiny lizards (eSLHBV). Abbreviations for geographic regions: Aa, Antarctica; Af, Africa; As, Asia; Eu, Europe; Na, North America; Nh, Northern Hemisphere; Nt, Neotropics. Solid lines in the virus tree indicate probable separation of viral daughter lineages due to a virus-host cospeciation event; dashed lines indicate probable virus duplication, i.e., virus speciation predating separation of the extant host lineages; and dashed lines with an arrow indicate a host switch and its direction, i.e., virus speciation postdating separation of the extant host lineages. Nodes marked with open circles and labeled with Arabic numerals represent putative cospeciation events that were used in our time-calibration analysis (Figures 6 and S6B). The three putative cospeciation events on the side of nackednaviruses are labeled with Roman numerals (N I–N III).
Figure 6
Correlation of Mean Divergence Times between Hepadnaviruses and Their Hosts. For the nodes in the hepadnaviral phylogeny representing putative virus-host cospeciation events (Figures 5 and S6A), the mean virus divergence times obtained with the calibrations based on eAHBV-FRY (blue) and, in addition, with the 11 independent calibrations based on the branching of exogenous hepadnaviruses (black) were plotted against the mean host divergence times as retrieved from the literature (The Timetree of Life, 2016; Betancur et al., 2013; Bininda-Emonds et al., 2007; Hedges et al., 2015; Wang et al., 2013) (raw data in Table S2). Vertical and horizontal bars: SD. The nodes (N1–N8) are numbered as in Figures 5 and S6B. The linear regression of the eAHBV-FRY-calibrated nodes indicates a tight congruence between the related virus and host speciation times (blue line; 95% confidence interval: light blue background). Of note, the mean age estimate for the node of eAHBV-FRY resulting from the control calibrations (N5; 67.9 ± 13.6 mya SD) was consistent with the onset of the diversification of Neoaves (69–67 mya), implying that the long-term substitution rate of P proteins does not significantly differ between exogenous and endogenous hepadnaviruses. Both linear regressions had a significant deviation of the slope from 0 and a non-significant deviation of the slope from 1, and the differences between the related virus and host divergence times (ΔTV/H) did not significantly differ from 0 (box with gray background). Red squares: the eAHBV-FRY-based node age estimates for the major viral nodes with disparate or ambiguous virus-host topology (Figures 5 and S6A) were plotted against the divergence times of the corresponding present-day hosts. The significant deviation of these nodes from the linear correlation of the nodes with congruent virus-host topology indicates that a host switch occurred. The red lines and question marks indicate the expected age of the putative initial host reservoir, if these viruses also originated from a virus-host cospeciation event before they switched into a new host. For example, parahepadnaviruses, i.e., WSHBV and CSKV, split off from all other hepadnaviruses 359 mya, i.e., at about the same time that amphibians and amniotes diverged (352 mya according to http://timetree.org/). Likewise, TBHBV, so far the only known hepadnavirus from a South American bat (Drexler et al., 2013), separated from the other orthohepadnaviruses 158 mya, i.e., at about the same time that placental and marsupial mammals diverged (159 mya according to http://timetree.org/). These observations might at least give a clue where to search for similar viruses.
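The comparison in this caption rests on two quantities: the slope of a regression of virus divergence times on host divergence times, and the mean difference between the paired times (ΔTV/H). The Python sketch below shows how both could be computed with ordinary least squares. It is an illustration only: the function names are hypothetical, the numbers are synthetic placeholders rather than values from Table S2, and the significance tests reported in the caption are not reproduced.

```python
# Minimal sketch with synthetic numbers (NOT the study's data):
# ordinary least-squares fit of virus vs. host divergence times,
# plus the mean paired difference (virus time minus host time).


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_difference(virus_times, host_times):
    """Return the mean of the paired differences (virus minus host)."""
    diffs = [v - h for v, h in zip(virus_times, host_times)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    host = [20.0, 60.0, 110.0, 250.0]   # synthetic host divergence times (mya)
    virus = [22.0, 57.0, 113.0, 246.0]  # synthetic virus divergence times (mya)
    print(fit_line(host, virus), mean_difference(virus, host))
```

Under cospeciation, the fitted slope is expected to lie near 1 and the mean difference near 0; a node lying far from the fitted line is the pattern the caption interprets as a host switch.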
Figure 7
De Novo Emergence of PreS/S in Hepadnaviruses. (A) Weighted substitution rates at codon position 3 in the P frame, which equals codon position 2 in the S frame (P3/S2) for conserved regions in TP, RT, and RH. RT/S overlap region (OV) in hepadnaviruses highlighted by light-yellow background. NOV, non-overlapping regions. (B) Adenine frequencies at P3/S2 positions. (C) Hypothetical ancestral sequences for the overlap region reconstructed for the ancestors of hepadnaviruses (H), nackednaviruses (N), and hepadna- and nackednaviruses (H + N). Predicted stop codons in the S frame are highlighted. (D) Phylogeny for the RT/S overlap region translated in the P frame. Scale bar, substitutions per site. Branches representing the relevant time window after the split between nackedna- and hepadnaviruses and before the first intragroup speciation events are colored and their lengths are indicated. (E) Phylogeny for the RT/S overlap region translated in the S frame. (F) Ratio of analogous branches in the P and S frame-based phylogenies for the RT/S overlap, estimating the relative evolutionary change in the S frame between viral lineages.
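The P3/S2 correspondence follows from the frame offset: with P read in the +1 frame and S in the +2 frame (as in Figure 1), every third codon position of P is a second codon position of S. The Python sketch below checks that arithmetic and computes a base frequency at P3 positions. The function names and the toy sequence are hypothetical, and the weighting used for the rates in panel (A) is not reproduced.

```python
# Minimal sketch (assumed frame layout: P starts at offset 0, S at offset 1).


def codon_position(index, frame_offset):
    """Return the codon position (1, 2, or 3) of a 0-based nucleotide index
    in a reading frame that starts at `frame_offset`."""
    return (index - frame_offset) % 3 + 1


def base_frequency_at_p3(seq, base="A"):
    """Return the fraction of third codon positions (frame starting at offset 0)
    occupied by `base`; 0.0 for sequences shorter than one codon."""
    third_positions = seq.upper()[2::3]
    if not third_positions:
        return 0.0
    return third_positions.count(base.upper()) / len(third_positions)


if __name__ == "__main__":
    # Every P3 site (index 2, 5, 8, ...) is an S2 site when S is shifted by one.
    print(all(codon_position(i, 1) == 2 for i in range(2, 30, 3)))
    print(base_frequency_at_p3("GCAGCTGCA"))  # hypothetical toy sequence
```

Because a substitution at such a site is often silent in the P frame but always changes the amino acid in the S frame, these positions are informative about constraints imposed by the overlapping gene.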
