Review
Gastroenterology. 2019 Jan;156(2):384-399.
doi: 10.1053/j.gastro.2018.07.058. Epub 2018 Sep 27.

Insights From Deep Sequencing of the HBV Genome-Unique, Tiny, and Misunderstood

Anna L McNaughton et al. Gastroenterology. 2019 Jan.

Abstract

Hepatitis B virus (HBV) is a unique, tiny, partially double-stranded, reverse-transcribing DNA virus with proteins encoded by multiple overlapping reading frames. The substitution rate is surprisingly high for a DNA virus, but lower than that of other reverse transcribing organisms. More than 260 million people worldwide have chronic HBV infection, which causes 0.8 million deaths a year. Because of the high burden of disease, international health agencies have set the goal of eliminating HBV infection by 2030. Nonetheless, the intriguing HBV genome has not been well characterized. We summarize data on the HBV genome structure and replication cycle, explain and quantify diversity within and among infected individuals, and discuss advances that can be offered by application of next-generation sequencing technology. In-depth HBV genome analyses could increase our understanding of disease pathogenesis and allow us to better predict patient outcomes, optimize treatment, and develop new therapeutics.

Keywords: Diversity; Evolution; Genotype; Hepatitis B Virus.

Figures

Anna L. McNaughton
Valentina D’Arienzo
M. Azim Ansari
Sheila F. Lumley
Margaret Littlejohn
Peter Revill
Jane A. McKeating
Philippa C. Matthews
Figure 1
Relationships between HBV and other hepadnaviruses, genotype diversity, and genome size.
(A) Phylogenetic tree of the relation among avian, mammalian, and other hepadnaviruses. Hepadnavirus reference sequences for avian (NC_005950.1, NC_001344.1, NC_016561.1, NC_005890.1, NC_001486.1, NC_035210.1, NC_005888.1), mammalian (NC_003977.2, NC_028129.1, NC_024445.1, NC_024444.1, NC_024443.1, NC_020881.1, NC_004107.1, NC_001484.1), and other (NC_027922.1, NC_030446.1, NC_030445.1) species were downloaded from GenBank. This dataset was supplemented with hepadnavirus isolates from chimpanzees, orangutans, and gorillas (AF193863, FJ798097, FJ798098) and some widely cited HBV genotype strains (X02763, D00330, AY123041, V01460, X75657, X69798, AF160501, AY090454).
(B) Midpoint-rooted maximum likelihood phylogenetic tree generated using MEGA7 with 1000 bootstrap replicates, indicating relations between HBV genotypes and subtypes and their typical geographic distribution. Widely used reference sequences for genotypes A–D and F are included. For genotypes with a single subtype, the reference sequences were used to generate the tree. The sequences used to generate the tree were genotype A: KP234050.1, HE974376.1, KP234052.1, AY934764.1, KP234053.1, GQ331048.1; genotype B: D50521.1, AB073825.1, GQ924628.1, AB073826.1, AB219427.1, DQ463792.1, AP011091.1, AP011093.1, GQ205440.1, GQ358146.1; genotype C: KM999990.1, KY629637.1, AB554019.1, AB554018.1, AB644281.1, AB644283.1, AB644286.1, AB644287.1, KP017272.1, KU695741.1, KF873519.1, KM999992.1, KM999993.1, AP011107.1, KP017269.1, AP011108.1; genotype D: AB104711.1, HQ700511.1, KP090181.1, FJ692533.2, DQ315780.1, KF170740.1, KP322600.1, FJ904406.1; genotype F: AF223963.1, AY311369.1, AY311370.1, AB166850.1; genotype I: AB562462.1, FJ023671.1; genotype J: AB486012.1; and HBVdb genotype reference sequences for genotypes A–H, respectively: X02763, D00331, AY123041.1, V01460.1, X75657.1, X69798.1, AF160501, AY090454.
(C) Relative genome sizes of viruses pathogenic to humans, including HBV (3.2 kb; arrow). Genomes were obtained from https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239 and sorted by nucleotide length. For each virus type, a representative genome was selected. Metadata, including accession numbers for each organism, can be found at doi:10.6084/m9.figshare.6080402. CMV, cytomegalovirus; dsRNA, double-stranded RNA; EBV, Epstein-Barr virus; gt, genotype; HAV, hepatitis A virus; HCV, hepatitis C virus; HDV, hepatitis D virus; HEV, hepatitis E virus; HHV, human herpesvirus; HPV, human papillomavirus; HSV, herpes simplex virus; HTLV, human T-lymphotropic virus; LCMV, lymphocytic choriomeningitis virus; MERS, Middle East Respiratory Syndrome; SARS, Severe Acute Respiratory Syndrome; ssDNA, single-stranded DNA; ssRNA, single-stranded RNA; VZV, varicella zoster virus.
Figure 2
Annotated HBV genome and replication cycle.
(A) The 4 overlapping ORFs and the 7 products encoded. Gene products are indicated by text boxes, with start and end positions derived using X02763.1 as a reference strain. The major functional domains of the P gene product are indicated (dotted lines). Large HBs consist of pre-S1, pre-S2, and S; medium HBs consist of pre-S2 and S; and small HBs consist of S only. The overlap of >1000 nucleotides between the P and S genes is the largest gene overlap of any known animal virus. The near-complete negative DNA strand and partially complete positive DNA strand (dotted line indicates approximated missing region) also are shown, in addition to the position of the EcoRI site. The 5′ end of the complete negative-sense DNA strand is covalently bound to the viral RT. The complementary positive-sense DNA strand is partially complete, covering approximately two thirds of the viral genome. The 5′ end of the incomplete strand is defined by a short oligo-ribonucleotide region; the 3′ end varies within and among hosts.
(B) Replication cycle (adapted from Liang, special issue). (i) Infective HBV virions in serum, often referred to as Dane particles (diameter, 42 nm). The capsid structure has icosahedral symmetry: T = 4 (31 nm; 90% of population) and T = 3 (28 nm; 10% of population). (ii) The virus enters hepatocytes by HSPG (low-affinity binding) and solute carrier family 10 member 1 (SLC10A1; also called sodium taurocholate co-transporting polypeptide, NTCP; high-affinity binding). (iii) The molecular processes of un-coating and nuclear import are unclear but likely require cell proteins. (iv) Viral DNA enters the nucleus as relaxed circular DNA (RC-DNA). (v) Viral DNA is reconfigured as covalently closed circular DNA (cccDNA) within the nucleus by the cell’s DNA repair factors; this stable structure occurs in association with host histones that mediate DNA packaging. (vi) The open cccDNA structure is a template for host RNA polymerase II. (vii) DNA is transcribed to pre-genomic RNA intermediates in the nucleus, creating 4 mRNAs (blue): a 3.5-kb transcript encoding precore RNA (full-length pre-genomic RNA also shown in green); 2.4- and 2.1-kb mRNA transcripts for pre-S and S, respectively; and a 0.7-kb mRNA encoding the X protein. The RNA is transported to the cytoplasm, where it is translated to 7 viral proteins (small, medium, and large S proteins, core, e antigen, polymerase, and X protein). (viii) HBV RT produces a negative-strand DNA from pre-genomic RNA. The RNA template is degraded by RNase H, and then synthesis of the positive-strand DNA is initiated. HBV DNA is repackaged in relaxed form with other proteins inside the host cell. (ix) New virions and viral proteins are released into the blood. Excess HBsAg forms small noninfectious subviral particles (∼20 nm diameter) and long filaments; free HBeAg and capsids also are secreted. C, core; HBeAg, hepatitis B e antigen; HBx, hepatitis B X protein; HSPG, heparan sulfate proteoglycan; NTCP, Na+-taurocholate co-transporting polypeptide; pol, polymerase; TP, terminal protein.
Figure 3
HBV diversity.
(A) Relation between genome type and substitution rate. Estimates of evolutionary rate (substitutions per nucleotide per year) were taken from Sanjuán and were calculated using Bayesian molecular clock approaches. For the different genome types, median rates of evolution were 9.32 × 10^−6 (interquartile range [IQR], 7.00 × 10^−7 to 7.20 × 10^−5) for dsDNA, 6.36 × 10^−4 (IQR, 1.60 × 10^−4 to 1.88 × 10^−3) for dsRNA, 1.10 × 10^−3 (IQR, 4.52 × 10^−4 to 2.69 × 10^−3) for +ssRNA, 9.17 × 10^−4 (IQR, 3.55 × 10^−4 to 3.40 × 10^−3) for −ssRNA, and 2.08 × 10^−4 (IQR, 1.36 × 10^−4 to 5.65 × 10^−4) for ssDNA.
(B) Distribution of diversity along the HBV genome. Full-length HBV genome sequences were obtained from HBVdb in August 2017 (n = 5383). Sequences were aligned using MAFFT (https://mafft.cbrc.jp/alignment/server/). To normalize the number of sequences analyzed per genotype, sequences for each genotype were randomly shuffled using a function within SSE 1.3 and 250 sequences of each genotype were randomly selected for analysis. Only 225 sequences were available for genotype F; genotypes G, H, I, and J were excluded from the analysis because there were insufficient numbers of sequences available for comparison with other genotypes. Within-genotype pairwise nucleotide distances were calculated for genotypes A–F in SSE 1.3 with a window size of 150 bp and increments of 20 bp. The greatest variability (typically >5% sequence divergence) is observed in regions where there are no overlapping ORFs. Entropy at each nucleotide within the dataset was calculated using SSE 1.3.
(C) Comparison of Shannon entropy at each site of overlapping and nonoverlapping regions of the HBV genome. Genotypes were analyzed individually and regions of the genome were divided into overlapping and nonoverlapping regions using an annotated genome (https://hbvdb.ibcp.fr/HBVdb/HBVdbGenome). Mean Shannon entropy in overlapping regions is significantly lower at 0.16 (95% confidence interval, 0.14–0.17) than in nonoverlapping regions (0.20; 95% confidence interval, 0.18–0.21; P < .0001 by Mann-Whitney U-test). C, core; dsRNA, double-stranded RNA; HCV, hepatitis C virus; ssDNA, single-stranded DNA; ssRNA, single-stranded RNA.
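
The sliding-window distance and per-site entropy calculations described for Figure 3B and 3C can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the SSE 1.3 implementation: the function names are hypothetical, distance is taken as the simple proportion of differing sites (p-distance), entropy uses the natural logarithm, gaps and ambiguous bases are ignored, and the alignment is treated as linear even though the HBV genome is circular. The toy sequences are invented for demonstration only.

```python
import math
from collections import Counter
from itertools import combinations

VALID = "ACGT"  # assumption: gaps and ambiguity codes are skipped


def site_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    bases = [b for b in column.upper() if b in VALID]
    if not bases:
        return 0.0
    n = len(bases)
    counts = Counter(bases)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def alignment_entropy(seqs):
    """Per-site entropy for a list of equal-length aligned sequences."""
    return [site_entropy("".join(col)) for col in zip(*seqs)]


def p_distance(a, b):
    """Proportion of differing sites among positions valid in both sequences."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in VALID and y in VALID]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def windowed_mean_distance(seqs, window=150, step=20):
    """Mean pairwise p-distance in sliding windows (defaults follow the caption).

    Requires at least two sequences. Returns (window_start, mean_distance).
    """
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        dists = [p_distance(a[start:start + window], b[start:start + window])
                 for a, b in combinations(seqs, 2)]
        results.append((start, sum(dists) / len(dists)))
    return results


# Toy alignment (invented, not HBV data); a tiny window keeps it readable.
toy = ["ACGTAC", "ACGTAA", "ACGAAC", "ACGTAC"]
entropy = alignment_entropy(toy)
windows = windowed_mean_distance(toy, window=3, step=1)
```

With per-site entropies grouped by overlapping and nonoverlapping regions, the two groups could then be compared with a rank-based test such as `scipy.stats.mannwhitneyu`, which is the kind of test the caption reports.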

