Virus Evol. 2025 Feb 27;11(1):veaf012.
doi: 10.1093/ve/veaf012. eCollection 2025.

Sequence analysis of the hepatitis D virus across genotypes reveals highly conserved regions amidst evidence of recombination

Shruti Chowdhury et al. Virus Evol. 2025.

Abstract

Sequence diversity of the hepatitis D virus (HDV) may impact viral clearance, contributing to the development of chronic infection. T-cell-induced selection pressure and viral recombination can induce diversity throughout the viral genome, including coding and noncoding regions, with the former potentially impacting viral pathogenicity and the latter exerting regulatory functions. Here, we aim to assess sequence variations of the HDV genome within and across HDV genotypes. Sequences from 721 complete HDV genomes and 793 large hepatitis D antigen (L-HDAg) regions belonging to all eight genotypes and published through December 2023 were compiled. Most retrieved sequences belonged to Genotype 1, whereas the fewest sequences were available for Genotype 8. Alignments were conducted using Clustal Omega and Multiple Alignment using Fast Fourier Transform (MAFFT). Phylogeny was analysed using SplitsTree4, and recombination sites were inspected using Recombination Detection Program 4 (RDP4). All reported sequences were aligned per genotype to retrieve a consensus sequence for each genotype, and the reference sequence for each genotype was chosen as the one with the highest similarity to that consensus. L-HDAg alignments of the proposed reference sequences showed that not only conserved but also highly variable positions exist, which was also reflected in the epitope variability across HDV genotypes. Importantly, in silico binding prediction analysis showed that CD8+ T-cell epitopes mapped for Genotype 1 may not bind to major histocompatibility complex class I when examining their corresponding sequence in other genotypes. Phylogenetic analysis showed evidence of recombinant genomes within each individual genotype as well as between two different HDV genotypes, enabling the identification of common recombination sites. The identification of conserved regions within the L-HDAg allows their exploitation for genotype-independent diagnostic and therapeutic strategies, while the harmonized use of the proposed reference sequences may facilitate efforts to achieve HDV control.
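The per-genotype reference selection described in the abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length (the study used Clustal Omega and MAFFT for that step), and it assumes a majority-rule consensus and a simple per-column identity score; the abstract does not state the exact consensus rule or similarity metric, so both are illustrative choices. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Majority-rule consensus of equal-length aligned sequences (assumed rule)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    # Take the most common symbol in every alignment column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def identity(a, b):
    """Fraction of alignment columns in which a and b carry the same symbol."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_reference(named):
    """Return (name, identity) of the sequence most similar to the consensus."""
    cons = consensus(list(named.values()))
    scored = ((name, identity(seq, cons)) for name, seq in named.items())
    return max(scored, key=lambda pair: pair[1])


# Toy alignment with made-up identifiers, not real HDV data.
toy = {
    "seqA": "ATGAGCCGG",
    "seqB": "ATGAGCCGA",
    "seqC": "ATGTGCCGG",
}

if __name__ == "__main__":
    print(consensus(list(toy.values())))
    print(pick_reference(toy))
```

In this sketch a gap character would be treated like any other symbol, and ties are resolved in favour of the first sequence encountered; a real analysis would need to decide both points explicitly.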

Keywords: epitopes; genotypes; hepatitis D virus; recombination; reference sequences.

Conflict of interest statement

S.C., C.J., D.P.D. and H.K. declare no conflicts of interest. H.W. has research grants from Abbott and Biotest, is on the speakers’ bureau for Biotest and Gilead, and is on the consulting or advisory board for Abbott, Bristol-Myers-Squibb, F. Hoffmann-La Roche, Gilead, GlaxoSmithKline, Janssen, Roche, and Vir Biotechnology. L.S. reports lecture honoraria and personal fees from Falk Pharma e.V., Gilead, and Roche and travel support from AbbVie and Gilead.

Figures

Figure 1.
Geographic distribution of HDV genotypes across the world. Countries from which complete HDV genome and L-HDAg sequences had been collected through December 2023 are shown in turquoise. The length of each bar represents the number of HDV genome sequences collected for the corresponding genotype. Numbers in circles show the genotype. Two Genotype 1 genomic sequences are of unknown origin and are therefore not shown on the map.
Figure 2.
Phylogenetic segregation of full-length HDV genome sequences and their evolutionary descent. (a) Circular phylogenetic tree of 721 full-length HDV genome sequences. The sequences are colour-coded by genotype. Circular dots indicate nodes of the phylogenetic tree. Bootstrap values are indicated by symbols adjacent to the tree nodes (moderately supported: 50%–70%; well supported: 70%–90%; very well supported: >90%). (b) Phylogenetic tree displaying the most representative reference sequences of the HDV genome. The sequence for Genotype 3 is the reverse complement of the published antigenomic sequence KF786346. The numbers over the branches indicate branch lengths, and the tree scales indicate the number of nucleotide substitutions per site.
Figure 3.
Sequence alignment of the L-HDAg protein reference and consensus sequences showing amino acid conservation. Regions highlighted in grey have a conservation of 90% or higher. Experimentally confirmed CD8+ T-cell epitopes are shown in dark grey boxes on top. (a) Amino acid conservation across the L-HDAg protein of the eight proposed genomic reference sequences with the original NCBI reference protein NP_597693.2 for Genotype 1. (b) Regions of similarity are shown across the L-HDAg consensus sequences obtained for each HDV genotype. The original NCBI reference protein NP_597693.2 for Genotype 1 has been used for comparison. Abbreviation: R.P., reference protein.
Figure 4.
Recombination across HDV genotypes. (a, top left) Network tree, built using SplitsTree4, of the 721 complete HDV genome sequences published until December 2023. The evolutionary relationship and segregation into the eight HDV genotypes are shown. All published complete HDV genome sequences of the individual genotypes were used to construct the remaining trees. The scale bar of the tree on the top left indicates 0.1 nucleotide substitutions per site, while the remaining scale bars correspond to 0.01 substitutions per site. (b) Recombination analysis carried out on the eight proposed reference sequences using RDP4. (c) Recombination analysis of the individual HDV genotypes. Six representative sequences are shown for each genotype. The recombination events shown were detected by RDP4 with a high degree of confidence. Numbers below the sequences indicate the start and end positions of the recombination breakpoints in the genome. Recombination events involving similar sequences are indicated by RDP4 as unknown to avoid misidentification of the parent sequence. Abbreviation: GN, genotype.
Figure 5.
Long-read sequencing confirms the presence of recombination within HDV sequences. (a) Multiple sequence alignment conducted on standardized randomly generated full-length HDV sequences from sequencing run SRR15678458. Twenty representative sequences from the hypervariable region of HDV are shown with their sequence identifiers. (b) Recombination analysis conducted using RDP4 on the randomly sampled sequences from sequencing run SRR15678458. Ten representative sequences are shown, with recombination events detected with a high degree of confidence. Recombination events falling within the CD8+ T-cell epitopes of the HDAg-coding region of the genome are highlighted. Numbers denote the breakpoints of the recombinant sequences.
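The 90% conservation threshold used for the grey highlighting in Figure 3 can be illustrated with a short sketch. It assumes that conservation at a position means the fraction of aligned sequences sharing the most common residue there; the caption does not define the measure, so this is an illustrative choice rather than the authors' stated method. The function names and the toy peptides are hypothetical.

```python
from collections import Counter


def column_conservation(aligned):
    """Per-column fraction of sequences carrying the most common symbol."""
    n = len(aligned)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*aligned)]


def conserved_regions(aligned, threshold=0.9):
    """Return half-open (start, end) index ranges where every column meets the threshold."""
    scores = column_conservation(aligned)
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a conserved run begins here
        elif score < threshold and start is not None:
            regions.append((start, i))  # the run ended at the previous column
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


# Toy aligned peptides, not real L-HDAg sequences.
toy_alignment = ["MSRSE", "MSRSE", "MSKSE", "MSRSD"]

if __name__ == "__main__":
    print(column_conservation(toy_alignment))
    print(conserved_regions(toy_alignment))
```

With only a handful of sequences, a 90% threshold is met only by fully identical columns; the threshold becomes more informative as the number of aligned sequences grows.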
