PLoS Genet. 2014 Jun 19;10(6):e1004332.
doi: 10.1371/journal.pgen.1004332. eCollection 2014 Jun.

The first endogenous herpesvirus, identified in the tarsier genome, and novel sequences from primate rhadinoviruses and lymphocryptoviruses

Amr Aswad et al. PLoS Genet. 2014.

Abstract

Herpesviridae is a diverse family of large and complex pathogens whose genomes are extremely difficult to sequence. This is particularly true for clinical samples, and when the virus, the host, or both are being sequenced for the first time. Although herpesviruses are known to occasionally integrate into host genomes, and can also be inherited in a Mendelian fashion, they are notably absent from the genomic fossil record composed of endogenous viral elements (EVEs). Here, we combine paleovirological and metagenomic approaches both to explore the constituent viral diversity of mammalian genomes and to search for endogenous herpesviruses. We describe the first endogenous herpesvirus, from the genome of the Philippine tarsier, belonging to the Roseolovirus genus, and characterize its highly defective genome, which is integrated and flanked by unambiguous host DNA. From a draft assembly of the aye-aye genome, we use bioinformatic tools to reveal over 100,000 bp of a novel rhadinovirus that is the first lemur gammaherpesvirus, closely related to Kaposi's sarcoma-associated herpesvirus. We also identify 58 genes of Pan paniscus lymphocryptovirus 1, the bonobo equivalent of human Epstein-Barr virus. For each of the viruses, we postulate gene function via comparative analysis with known viral relatives. Most notably, the evidence from gene content and phylogenetics suggests that the aye-aye sequences represent the most basal known rhadinovirus, and indicates that tumorigenic herpesviruses have been infecting primates since their emergence in the late Cretaceous. Overall, these data show that a genomic fossil record of herpesviruses exists despite their extremely large genomes, and expand the known diversity of Herpesviridae, which will aid the characterization of pathogenesis. Our analytical approach illustrates the benefit of intersecting evolutionary approaches with metagenomics, genetics and paleovirology.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Phylogenetic and genomic analysis of the tarsier endogenous herpesvirus.
Panel A: DNA polymerase tree showing the placement of the TsyrHVL sequence within the Betaherpesvirinae. Only the lineage leading to the node that includes TsyrHVL is shown; the remaining lineages are collapsed for clarity, and the size of each collapsed clade is arbitrary. Panel B shows the phylogeny reconstructed from a concatenated amino acid alignment of 6 core genes (terminase, large tegument, uracil-DNA glycosylase, kinase, capsid protein and helicase). Unclassified betaherpesviruses are shown as black branches, whereas those belonging to defined genera are indicated in colour. The rooting at Proboscivirus was determined according to the phylogeny in panel A. Numbers at each node in both Panels A and B represent bootstrap support. Panel C shows a schematic of the tarsier sequences mapped to HHV6 as a reference. Orange lines indicate wgs contigs obtained from NCBI, annotated with their GenBank IDs. Contig ABRT02391417.1 is represented on both sides because it consists entirely of the DR region; it also aligns with ABRT02259801.1 with only 5 differences, so both placements are plausible. Blue boxes indicate the virus's terminal direct repeat (DR) regions. Yellow boxes represent the major internal repeat regions. Because the genomes are so large, it is not feasible to represent the complete coding content. Instead, the major herpesvirus core blocks (HCB) are indicated (as in reference [5]), and genes that are relevant to discussion points in the main text are also annotated. Abbreviations for Panels A and B: THEVE: Tarsier Herpesvirus Endogenous Viral Element, HHV6A/6B/7/5: human herpesvirus 6A/6B/7/5, MuHV1/2/8: Murid herpesvirus 1/2/8, AoHV1: Aotine herpesvirus 1, SaHV3: Saimiriine herpesvirus 3, PaHV2: Panine herpesvirus 2, CeHV5: Cercopithecine herpesvirus 5, TuHV1: Tupaiid herpesvirus 1, PoCMV: Porcine cytomegalovirus.
Figure 2. Validation of the endogenous status of TsyrHVLs.
Panel A is a close-up of contig ABRT02391417.1, showing RepeatMasker-detected repetitive elements, an endogenous retrovirus and the satellite DNA telomeric repeat motif (TTAGGG)n that is characteristic of chromosomally integrated HHV6. A zoomed-in representation of the junction is also depicted, with regions highlighted according to the colour key, as well as a map of the primers used for amplification and the location of the sequenced fragments. Two independent PCR reactions were run in the first round using the primer pair F4/R4 and genomic DNA; the second round of amplification was semi-nested. A single fragment using primers F2/R4 was obtained from both first-round amplicons, and a larger fragment using primers F3/R4 was obtained from one of them. A 1% agarose gel electrophoresis is shown, indicating the approximate size of the fragments: F2/R4 amplified a 2,034 bp fragment, shown in Lanes 1 and 3, while the F3/R4 fragment was 2,277 bp (Lane 2). Each fragment was sequenced using all visible primers (F4 was only possible for the largest amplicon). The coverage map indicates the sequence obtained from each primer, trimmed for quality. The final contig after quality trimming was 2,159 bp, and included 5 nucleotide differences and 2 indels relative to the tarsier genome record. A proportion of these will be true polymorphisms, while others may have resulted from polymerase error during either sequencing or amplification (in both our fragment and the published sequence). Panel B shows the location of small fragments amplified to confirm the presence of the unique viral region.
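As an illustration of the (TTAGGG)n motif mentioned in this caption, the following minimal Python sketch scans a sequence for perfect tandem runs of the hexamer on either strand. It is not the authors' method (the caption attributes repeat detection to RepeatMasker); the function names, the `min_copies` threshold and the toy sequence in the usage note are assumptions made only for illustration.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_telomeric_runs(seq, motif="TTAGGG", min_copies=3):
    """Find perfect tandem runs of `motif` on either strand.

    Returns a sorted list of (start, end, strand, copies) tuples with
    0-based, end-exclusive coordinates. `min_copies` is an illustrative
    threshold, not a value taken from the paper.
    """
    seq = seq.upper()
    runs = []
    # A run on the reverse strand appears as the reverse-complement motif.
    for strand, unit in (("+", motif), ("-", reverse_complement(motif))):
        pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
        for match in pattern.finditer(seq):
            copies = (match.end() - match.start()) // len(unit)
            runs.append((match.start(), match.end(), strand, copies))
    return sorted(runs)
```

For example, `find_telomeric_runs("ACGT" + "TTAGGG" * 4 + "GATTACA")` reports one forward-strand run of four copies starting at position 4. Real telomeric arrays contain degenerate repeats, so a production scan would need to tolerate mismatches.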
Figure 3. Phylogenetic and genomic analysis of the Daubentonia madagascariensis rhadinovirus.
Panel A is a maximum likelihood amino acid phylogeny of DNA polymerase, indicating the subfamily placement of the DmadHVL sequence as a gammaherpesvirus. Numbers at each node represent bootstrap support and only those above 50% are shown. Lineages other than those leading to DmadHVL are collapsed for clarity. In Panel B, the tree shown is a maximum likelihood phylogeny estimated using a concatenated alignment of 6 core genes (terminase, large tegument, uracil-DNA glycosylase, kinase, capsid and helicase). Coloured clades represent the different genera within gamma-2 herpesviruses, and bootstrap support is shown for each node. Panel C shows DmadHVL sequences mapped to Saimiriine herpesvirus 2 (SaHV2) as a guide; major repeat blocks, noteworthy genomic differences and genes discussed in the main text are highlighted in coloured boxes. The green blocks are the FGAM synthase coding sequences, which are found at the termini. The red box annotated as glycoprotein H is presumed to be an assembly error. Pink boxes are discussed genes present in SaHV2, while the yellow ORFs are those found in different viruses. The blue lines indicate sequences that are a composite of multiple wgs contigs, the number of which is indicated above each sequence. The composite DmadHVL sequences discussed in the main text are numbered 1–16 from left to right. The DmadHVL virus genome appears to have a slightly larger region spanning herpes core block (HCB) 3 and HCB4, and so contig 11 is drawn to represent this. The scale of the schematic is approximate. Abbreviations for Panel B: OvHV2: Ovine herpesvirus 2, AlHV1: Alcelaphine herpesvirus 1, porc2/3: Porcine lymphotropic herpesvirus 2/3, EqHV2: Equine herpesvirus 2, RodHV: Rodent herpesvirus Peru, MuHV4: Murid herpesvirus 4, AtHV3: Ateline herpesvirus 3, SaHV2: Saimiriine herpesvirus 2, BoHV4: Bovine herpesvirus 4, HHV8: human herpesvirus 8, MaHV5: Macacine herpesvirus 5.
Figure 4. Phylogenetic analyses of Pan paniscus lymphocryptovirus.
In all panels, the trees depicted are maximum likelihood phylogenies reconstructed using different gene sets. Node values represent bootstrap support, but only nodes with support over 70% are annotated. The panel A tree was reconstructed using a concatenated amino acid alignment of DNA polymerase and glycoprotein B, for which there are many representative taxa in Gammaherpesvirinae. In panel B the tree was reconstructed from a concatenated alignment of 6 core genes (terminase, large tegument, uracil-DNA glycosylase, kinase, capsid and helicase), and in both Panels A and B the gamma-2 lineage is collapsed for clarity. Panel C shows the phylogeny reconstructed using a conserved region of the DNA polymerase gene in order to ascertain the subfamily placement of the PpanHVL sequences. Abbreviations: LCV: Lymphocryptovirus, HHV4: Human herpesvirus 4, HHV4_t2: Human herpesvirus 4 type 2, MaHV4: Macacine herpesvirus 4, CaHV3: Callitrichine herpesvirus 3.
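The concatenated alignments referred to in these captions can be pictured with a small Python sketch that joins per-gene alignments into one supermatrix keyed by taxon. This is a generic illustration rather than the authors' pipeline: the function name, the choice to pad missing taxa with gap characters, and the toy sequences in the usage note are all assumptions.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Concatenate per-gene alignments into one sequence per taxon.

    `gene_alignments` is a list of dicts mapping taxon name to an aligned
    sequence, one dict per gene. Taxa missing from a gene are padded with
    gap characters (an assumption of this sketch, not a documented choice
    of the paper).
    """
    taxa = sorted(set().union(*[set(aln) for aln in gene_alignments]))
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            # Keep column order gene by gene so partitions stay contiguous.
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

With invented toy data, `concatenate_alignments([{"HHV4": "MKT-", "MaHV4": "MKTA"}, {"HHV4": "GG", "MaHV4": "GA", "CaHV3": "-A"}])` gives `"MKT-GG"` for HHV4 and an all-gap first block for CaHV3. Keeping each gene as a contiguous block makes it straightforward to record partition boundaries for downstream tree inference.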
Figure 5. Genomic mapping of Pan paniscus viral sequences.
The viral sequences found in the Pan paniscus genome appear to represent parts of the P. paniscus LCV1, which had previously been identified but only partially sequenced. Because of the extreme genetic similarity to human herpesvirus 4 (HHV4) and the identical gene set, ORF visualization was possible by aligning the contigs directly to the HHV4 genome. The contigs were separated into panels A–C for clarity, and a zoomed-out layout of the contigs is shown in panel D. In each panel, the blue-boxed arrows indicate the ORFs of HHV4, which are identically positioned in the PpanHVLs. Pink boxes represent repetitive sequence regions. In panels A and C, the dark red circle indicates the HHV4 origins of replication. It is interesting to note that repetitive sequences in HHV4 correspond to nearly all the edges of the PpanHVL contigs. Assembly algorithms are known to struggle with the reconstruction of low-complexity sequences, which strongly suggests that in the P. paniscus LCV1 genome, repetitive sequences are located in the same places as they are in HHV4.
Figure 6. Principal Components Analysis of D. madagascariensis contigs.
Plot of scores from the first 2 principal components, which account for 77% of the variance. The variables are: read coverage depth, all four mononucleotide and sixteen dinucleotide frequencies, and the mean, median, minimum and maximum insert size of reads. Scores were binned in order to view the distribution more easily. Orange points represent the contigs identified as viral, which have been over-plotted to show where in the distribution they lie. Each hexagon represents a different bin, and the size of the internal hexagon represents how ‘full’ that bin is in terms of the number of contigs. The placement of the internal hexagon describes the mean value of scores in that bin. The number of contigs contained within each bin is represented by the following colour scheme: blue with red border: 1–9, turquoise with green border: 10–99, pink with blue border: 100–999, yellow with turquoise border: 1000–9999, grey with pink border: 10,000–99,999, black with yellow border: 100,000–999,999.
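The sequence-composition variables listed in this caption (four mononucleotide and sixteen dinucleotide frequencies) can be computed per contig along the lines of the Python sketch below. It is only a sketch under stated assumptions: the function name is hypothetical, ambiguous bases such as N are skipped by choice, and the coverage and insert-size variables, which come from read mappings, as well as the PCA step itself, are not shown.

```python
from itertools import product


def composition_features(seq):
    """Return 20 composition features for one contig.

    Keys are the 4 mononucleotides and 16 dinucleotides; values are
    frequencies among unambiguous (A/C/G/T) positions. Skipping
    ambiguous bases is an assumption of this sketch.
    """
    seq = seq.upper()
    bases = "ACGT"
    mono = {b: 0 for b in bases}
    di = {"".join(pair): 0 for pair in product(bases, repeat=2)}
    for base in seq:
        if base in mono:
            mono[base] += 1
    # Overlapping windows of width 2 along the contig.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:
            di[pair] += 1
    mono_total = sum(mono.values())
    di_total = sum(di.values())
    features = {}
    for key, count in mono.items():
        features[key] = count / mono_total if mono_total else 0.0
    for key, count in di.items():
        features[key] = count / di_total if di_total else 0.0
    return features
```

Each contig then contributes one row of 20 composition values, to which coverage depth and the four insert-size statistics would be appended before standardizing the columns and running a principal components analysis in a numerical library.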

References

    1. Lavergne A, de Thoisy B, Pouliquen J-F, Ruiz-García M, Lacoste V (2011) Partial molecular characterisation of New World non-human primate lymphocryptoviruses. Infect Genet Evol 11: 1782–1789. doi:10.1016/j.meegid.2011.07.017
    2. McGeoch DJ, Gatherer D, Dolan A (2005) On phylogenetic relationships among major lineages of the Gammaherpesvirinae. J Gen Virol 86: 307–316. doi:10.1099/vir.0.80588-0
    3. McGeoch DJ, Rixon FJ, Davison AJ (2006) Topics in herpesvirus genomics and evolution. Virus Res 117: 90–104. doi:10.1016/j.virusres.2006.01.002
    4. McGeoch DJ, Davison AJ, Dolan A, Gatherer D, Sevilla-Reyes EE (2010) Molecular Evolution of the Herpesvirales. In: Domingo E, Holland JJ, editors. The Origin and Evolution of Viruses. Academic Press. 447–475 p. doi:10.1002/9780470688618.taw0208
    5. Fields BN, Knipe DM, Howley PM (2007) Fields Virology. 5th ed. Philadelphia: Lippincott Williams and Wilkins.
