J Gen Virol. 2020 Mar;101(3):271-283.
doi: 10.1099/jgv.0.001387.

Analysis of genomic-length HBV sequences to determine genotype and subgenotype reference sequences

Anna L McNaughton et al. J Gen Virol. 2020 Mar.

Abstract

Hepatitis B virus (HBV) is a diverse, partially double-stranded DNA virus, with 9 genotypes (A-I), and a putative 10th genotype (J), characterized thus far. Given the broadening interest in HBV sequencing, there is an increasing requirement for a consistent, unified approach to HBV genotype and subgenotype classification. We set out to generate an updated resource of reference sequences using the diversity of all genomic-length HBV sequences available in public databases. We collated and aligned genomic-length HBV sequences from public databases and used maximum-likelihood phylogenetic analysis to identify genotype clusters. Within each genotype, we examined the phylogenetic support for currently defined subgenotypes, as well as identifying well-supported clades and deriving reference sequences for them. Based on the phylogenies generated, we present a comprehensive set of HBV reference sequences at the genotype and subgenotype level. All of the generated data, including the alignments, phylogenies and chosen reference sequences, are available online (https://doi.org/10.6084/m9.figshare.8851946) as a simple open-access resource.

Keywords: HBV; phylogenetics; reference sequences; whole genome.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
HBV genome lengths of each genotype. Standard genome lengths, and sites with deletions and insertions, are illustrated for genotypes A–J, along with a map of the HBV genome layout. Deletions and insertions are shown relative to genotype A, which is widely used as a numbering reference for HBV. Deletions are shown as white gaps and sites of insertions are indicated in black with triangles above them. All other genotypes have a 6 bp deletion in the core (C) relative to genotype A (at nucleotide (nt) 2354). Genotypes E and G have a 3 bp deletion in the pre-S1 region (at nt 2861) and genotypes D and J have a 33 bp deletion at the start of the pre-S1 region (at nt 2854) [2]. Genotype G also has a 36 bp insertion in the core (at nt 1903).
Fig. 2.
Genomic-length maximum-likelihood phylogeny of all genotype A–I HBV sequences included in the analysis (n=2839) after removing highly similar sequences, indicating the number of sequences in each genotype analysed separately (Figs 3–10). Bootstrap support ≥70 after 1000 replicates is given for the deepest branches on the tree. The scale bar indicates the estimated nucleotide substitutions per site. *, LT992441, a strain known to be from a 14th-century skeleton that clusters distantly with genotype A, was removed from the subsequent analysis. **, KU736915 was identified as a genotype D/E recombinant and removed from the subsequent analysis.
Fig. 3.
Genomic-length maximum-likelihood phylogeny of HBV genotype A sequences (n=259). Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, FJ692557, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. Previous work has confirmed that there are at least five genotype A subgenotypes, although debate continues about whether A3, A4 and A5 should all be considered to be ‘quasi-subgenotype A3’ [9]. Few sequences for subgenotypes *A4 (KM606737) and **A6 (GQ331046) were retained in the study after pairwise analysis. The putative subgenotype A6 has previously been identified in three African-Belgian patients [37].
Fig. 4.
Genomic-length maximum-likelihood phylogeny of HBV genotype B sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, GU815637, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. An evaluation of the genotype B phylogeny reclassified a number of putative subgenotypes as quasi-B3, with debate continuing on whether or not this should also include B5 [53].
Fig. 5.
Genomic-length maximum-likelihood phylogeny of HBV genotype C sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, GQ377617, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. We were unable to verify the subgenotype of two genotype C clades, and these have been designated unassigned clades 1 and 2 [unassigned_C (1) and unassigned_C (2), respectively].
Fig. 6.
Genomic-length maximum-likelihood phylogeny of HBV genotype D sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, KC875277, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. Previous work has indicated that the D3 and D6 strains cluster together and should be classed as a single subgenotype [54].
Fig. 7.
Genomic-length maximum-likelihood phylogeny of HBV genotype E sequences. Genotype E sequences do not diverge into distinct subgenotypes. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. The proposed reference strain for the genotype, GQ161817, is highlighted with a blue dot.
Fig. 8.
Genomic-length maximum-likelihood phylogeny of HBV genotype F sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, HM585194, is highlighted with a blue dot. The subgenotype is given where it could be reliably identified. Bootstrap support ≥70 after 1000 replicates is given. The scale bar indicates the estimated nucleotide substitutions per site.
Fig. 9.
Genomic-length maximum-likelihood phylogeny of HBV genotype H sequences. Genotype H sequences do not diverge into distinct subgenotypes. Bootstrap support ≥70 after 1000 replicates is given. The proposed reference strain for the genotype, FJ356715, is highlighted with a blue dot. The scale bar indicates the estimated nucleotide substitutions per site.
Fig. 10.
Genomic-length maximum-likelihood phylogeny of HBV genotype I sequences. Well-defined clades have been highlighted with coloured dotted lines and reference sequences for each clade indicated (red dots). The proposed reference strain for the genotype, AB562463, is highlighted with a blue dot. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site.
Fig. 11.
Genomic-length maximum-likelihood phylogenetic tree of HBV genotype, subgenotype and clade reference strains identified in Figs 3–10 and listed in Table 1 with accession numbers. The genotype is given in each case and the subgenotype or clade identification is given where possible. Bootstrap support for branches ≥70 after 1000 replicates is indicated. The scale bar indicates the estimated nucleotide substitutions per site. In addition to the references identified in Figs 3–10, genotype A isolate X02763 and genotype D isolate NC_003977.2 have been included in the tree.
Fig. 12.
Pairwise distance distribution for the genomic-length sequences of HBV genotypes A, B, C, D, E, F, H and I. Probability densities of pairwise distances are shown for whole-genome sequences of each genotype. Genotypes E, F, H and I are shown on a separate plot from genotypes A–D as they contained smaller numbers of sequences. Too few sequences were available after filtering for genotype G and (putative) genotype J to be analysed.
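
The pairwise distances summarised in Fig. 12 can be illustrated with a minimal sketch. It assumes a simple uncorrected p-distance (the proportion of differing sites between two aligned sequences, skipping columns with gaps or ambiguous bases); the distance measure actually used by the authors is not stated in this summary and may differ. The function names and the toy alignment are hypothetical and are not taken from the paper or its online resource.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    This is an illustrative measure, not necessarily the one used in the paper.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = "ACGT"
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in unambiguous and y in unambiguous:
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def pairwise_distances(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Compute the p-distance for every unordered pair of named sequences."""
    names = sorted(alignment)
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i, j in combinations(names, 2)
    }


# Toy alignment (hypothetical sequences, far shorter than an HBV genome).
toy_alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAA",
    "seq3": "ACG-ACGTTA",
}

distances = pairwise_distances(toy_alignment)
for pair, dist in distances.items():
    print(pair, round(dist, 4))
```

In this toy alignment, seq1 and seq2 differ at 1 of 10 comparable sites, while comparisons involving seq3 use only 9 sites because the gapped column is skipped. A distribution like the one in Fig. 12 would be built from such values computed over all sequence pairs within a genotype.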

References

    1. Locarnini S, Zoulim F. Molecular genetics of HBV infection. Antivir Ther. 2010;15:3–14. doi: 10.3851/IMP1619.
    2. McNaughton AL, D’Arienzo V, Ansari MA, Lumley SF, Littlejohn M, et al. Insights from deep sequencing of the HBV genome—unique, tiny, and misunderstood. Gastroenterology. 2019;156:384–399. doi: 10.1053/j.gastro.2018.07.058.
    3. Kramvis A. Genotypes and genetic variability of hepatitis B virus. Intervirology. 2014;57:141–150. doi: 10.1159/000360947.
    4. Torres-Cornejo A, Lauer GM. Hurdles to the development of effective HBV immunotherapies and HCV vaccines. Pathogens and Immunity. 2017;2:102. doi: 10.20411/pai.v2i1.201.
    5. Pawlotsky J-M, Negro F, Aghemo A, Berenguer M, Dalgard O, et al. EASL recommendations on treatment of hepatitis C 2018. J Hepatol. 2018;69:461–511. doi: 10.1016/j.jhep.2018.03.026.
