Retrovirology. 2016 Jan 22;13:7.
doi: 10.1186/s12977-015-0232-y.

Classification and characterization of human endogenous retroviruses; mosaic forms are common

Laura Vargiu et al. Retrovirology.

Abstract

Background: Human endogenous retroviruses (HERVs) represent the inheritance of ancient germ-line cell infections by exogenous retroviruses and the subsequent transmission of the integrated proviruses to the descendants. ERVs have the same internal structure as exogenous retroviruses. While no replication-competent HERVs have been recognized, some retain up to three of four intact ORFs. HERVs have been classified before, with varying scope and depth, notably in the RepBase/RepeatMasker system. However, existing classifications are bewildering. There is a need for a systematic, unifying and simple classification. We strived for a classification which is traceable to previous classifications and which encompasses HERV variation within a limited number of clades.

Results: The human genome assembly GRCh37/hg19 was analyzed with RetroTector, which primarily detects relatively complete Class I and II proviruses. A total of 3173 HERV sequences were identified. The structure of, and relations between, these proviruses were resolved through a multi-step classification procedure that involved a novel type of similarity image analysis ("Simage") which allowed discrimination of heterogeneous (noncanonical) from homogeneous (canonical) HERVs. Of the 3173 HERVs, 1214 were canonical and segregated into 39 canonical clades (groups), belonging to class I (Gamma- and Epsilon-like), II (Beta-like) and III (Spuma-like). The groups were chosen based on (1) sequence similarity (nucleotide and Pol amino acid), (2) degree of fit to previously published clades, often from RepBase, and (3) taxonomic markers. The groups fell into 11 supergroups. The 1959 noncanonical HERVs contained 31 additional, less well-defined groups. Simage analysis revealed several types of mosaicism, notably recombination and secondary integration. By comparing flanking sequences, LTRs and completeness of gene structure, we deduced that some noncanonical HERVs proliferated after the recombination event. Groups were further divided into envelope subgroups (altogether 94) based on sequence similarity and characteristic "immunosuppressive domain" motifs. Intra- and inter(super)group, as well as intraclass, recombination involving envelope genes ("env snatching") was a common event. LTR divergence indicated that HERV-K(HML2) and HERVFC had the most recent integrations, and HERVL and HUERSP3 the oldest.

Conclusions: A comprehensive HERV classification and characterization approach was undertaken. It should be applicable for classification of all ERVs. Recombination was common among HERV ancestors.

Figures

Fig. 1
Some retroviral genetic structures encountered during this work. a Prototypical provirus, with genes and subgenes. Abbreviations are explained in the text, and/or in [35]. dUTPase occurred in either the protease or polymerase genes. b Partial, truncated, provirus. c Provirus with secondary integration, often an LTR in sense or antisense direction. d Recombinant provirus with contributions from different ERVs, in this case a Harlequin element. e Processed pseudogene, i.e. a reverse transcribed genomic retroviral mRNA. Processed pseudogenes were not distinguished from proviruses in the present work
Fig. 2
Simages. Panel a The principle. A proviral sequence is divided into twentieths, each of which is BLASTed against a reference sequence collection. 1 A homogeneous, canonical, provirus. 2 A heterogeneous, noncanonical, provirus. Panel b A canonical chain. The chain id (“rvnr” in Additional file 1: Table S1), HERV classification, the chromosomal position and LTR divergence (if both LTRs were recognized by ReTe) are shown in the uppermost row. The subsequent three rows depict the RepeatMasker nucleic acids with the highest degree of identity, the next three rows which of the 39 consensus sequences determined in this paper (Additional file 3: List S3) has the highest degree of identity, all per twentieth of the chain. The lowest row depicts the ReTe putein interpretation per twentieth. 5 means 5′LTR, G Gag, R Pro, P Pol, E Env and 3 3′LTR. Panel c Three noncanonical chains containing secondary integrations which left a single LTR inside another retroviral chain. Annotation as in b. Colour is used here and in ensuing panels to distinguish components of mosaic chains. C1: HML4 LTR inside an HML2. LTR5 and HERVK refer to HML2. LTR13 is an HML4 LTR. C2: HERV9 LTR inside a HERVH. LTR12 and HylERV9-LTR are HERV9 LTR equivalents. A small pol piece most similar to HERVE is also present. C3: HML2 inside a HERVH. HylNERVH1 and HylNERVH2 are HERVH equivalents (see Additional file 2: List S2). LTR5 is an HML2 LTR. “0” indicates that no similarity was found with the respective query sequences. Panel d Noncanonical chains with signs of recombination. Annotation as in b. D1: HERV9 chain with a short piece similar to HERVIP at the end of pol and beginning of env. D2: a mosaic HERVE with HERVIP, HERVW and HML10 inside. ReTe recognized mainly one gene, env. As described in the text, this is a common pattern for chains labeled as “Harlequin”. D3: a complex HML3 chain where the RepeatMasker based Simage indicates contributions from six different HMLs. D4: an HML3 chain with short pieces of HML1, HML9/10 and HML8. D5: a complex chain which contains undetermined HML sequences at the end of pol and the whole of env. The differences between the consensus and RepeatMasker results in D3-5 indicate that the HML groups and HERVK families contain microheterogeneities, mainly in env, which sometimes can cause classification confusion. The HML10 consensus contains an HML9-like stretch in pol and an HML8-like stretch in env, which may explain some of the discrepancies between the RepeatMasker and Consensus Simages. HERVK14 = HML1, HERVK = HML2, LTR5 = HML2 LTR, HERVK9 = HML3, MER9 = HML3 LTR, HERVK14C = HML9, HERVK11D = HML7, HERVK11 = HML8
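The Simage principle in panel a (cut a chain into twentieths, find the most similar reference for each piece, then look at whether the labels agree) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper uses BLAST, whereas this sketch substitutes a toy shared k-mer score, and the function names, the `k=8` setting, the toy sequences and the homogeneity rule in `is_homogeneous` are all assumptions made for illustration. The label "0" follows the caption's convention for "no similarity found".

```python
def split_into_twentieths(seq, parts=20):
    """Cut a sequence into `parts` contiguous pieces of near-equal length."""
    n = len(seq)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(parts)]


def kmers(s, k):
    """Set of all overlapping k-mers in a string."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def best_reference(segment, references, k=8):
    """Name of the reference sharing most k-mers with the segment.

    Toy stand-in for a BLAST search (assumption). Returns "0" if no
    reference shares any k-mer with the segment.
    """
    seg = kmers(segment, k)
    if not seg:
        return "0"
    best_name, best_score = "0", 0.0
    for name, ref in references.items():
        score = len(seg & kmers(ref, k)) / len(seg)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def simage_row(seq, references, parts=20, k=8):
    """One Simage-like row: the best reference label per twentieth."""
    return [best_reference(p, references, k)
            for p in split_into_twentieths(seq, parts)]


def is_homogeneous(labels):
    """Hypothetical rule: at most one distinct label, ignoring "0"."""
    return len(set(labels) - {"0"}) <= 1


if __name__ == "__main__":
    # Toy references over disjoint alphabets, so they share no k-mers.
    refs = {"groupA": "ACCA" * 100, "groupB": "GTTG" * 100}
    mosaic = refs["groupA"][:200] + refs["groupB"][200:]
    print(simage_row(refs["groupA"], refs))  # every twentieth labeled groupA
    print(simage_row(mosaic, refs))          # first half groupA, second half groupB
    print(is_homogeneous(simage_row(mosaic, refs)))
```

A row with a single label corresponds to the homogeneous case (1) in panel a, and a row whose labels change along the chain corresponds to the heterogeneous case (2).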
Fig. 3
Mapping of taxonomic markers on an unrooted consensus maximum likelihood cladogram of the HERV groups and supergroups. At the left, HERV supergroups are shown with the first 13 amino acids of a representative ISD within parentheses. HSERVIII have no known envelope proteins of their own, symbolized with a question mark. The occurrence of nucleotide bias (high T or A, or low G), predominant number of zinc fingers in Gag, predominant gag;pro and pro;pol frame shift strategy, occurrence of dUTPase and GPATCH domains together with the protease and occurrence of dUTPase and Chromo and/or GPY/F domains in the C terminus of the integrase, are shown. Colour codes for branch names: consensus sequences (con) are magenta, best representatives (bre) are in brown. The Chromo and/or GPY/F reddish fill was weaker for some groups because of inconsistent (HEPSI) or weak fit (HML6)
Fig. 4
Unrooted phylogram of Pol consensus sequences (“con”, magenta) of canonical and best representatives (“bre”; brown) of some noncanonical proviruses together with reference Pols from GenBank (with GenBank id, black), and previous work by the authors (“2-con” were previous consensus sequences). Pol sequences were aligned with Muscle. A maximum likelihood tree was calculated. The asterisks mark the three supergroups which contain RepBase clades belonging to RepBase group MER4I
Fig. 5
Unrooted phylograms of Gag, Pro, Pol and Env from the consensus sequences in Additional file 4: List S4, with fewer reference sequences than in Fig. 4. A maximum likelihood tree was calculated from Muscle alignments. The asterisks mark instances of possible Env recombination
Fig. 6
Retroviral envelopes encountered in hg19. Env subgroup consensuses (see Additional file 4: List S4) and reference envelope proteins were aligned by Muscle. A Maximum Likelihood tree was then produced. Branch names of the subgroup consensuses contain, in this order, taxorder nr, “con”, subgroup name, subgroup average percent identity to consensus for the envputein (if the subgroup had only one member, a 0 is shown), a 13 amino acids subdomain from the ISD (if identified), subgroup average percent identity to consensus for 23 ISD amino acids (Additional file 1: Table S1) and bootstrap value of the relation (percent of 100 bootstraps)
Fig. 7
Retroviral envelopes with high similarity between Env subgroups. Envelope subgroups (A, B, C, etc) with high intersupergroup similarity are shown interconnected, superimposed on the cladogram of Fig. 3. Significant relations (branches with bootstrap >50) were obtained from neighbour joining (not shown) and maximum likelihood trees (same as in Fig. 6). To avoid cluttering, only intersupergroup relations were shown, except for the HML supergroup, where intergroup relations are presented. Relations shown indicate, but do not prove, an envelope transfer event
Fig. 8
LTR divergence of frequent HERV groups. LTR divergence as calculated by ReTe is presented as a histogram divided into one-percent bins, from 0–1 to 39–40 %. A very approximate estimate of age since integration was calculated by multiplying percent divergence by 2.5. It is primarily intended to show the distribution of divergence of prominent HERV groups relative to that of other HERV groups
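The binning and the age estimate described in this caption amount to simple arithmetic, sketched below in Python. This is an illustration under stated assumptions, not ReTe's code: the function names are hypothetical, the caption does not state the unit of the resulting age (so the sketch leaves it unlabeled), and placing a value of exactly 40 % in the last bin is a choice made here.

```python
AGE_FACTOR = 2.5  # multiplier given in the caption; the unit is not stated there


def approximate_age(percent_divergence):
    """Very approximate age since integration: percent LTR divergence x 2.5."""
    return percent_divergence * AGE_FACTOR


def divergence_histogram(divergences, n_bins=40):
    """Count divergences (in percent) in one-percent bins 0-1, 1-2, ..., 39-40.

    Values outside 0-40 % are ignored; exactly 40 % goes into the last
    bin (an assumption of this sketch).
    """
    counts = [0] * n_bins
    for d in divergences:
        if 0 <= d < n_bins:
            counts[int(d)] += 1
        elif d == n_bins:
            counts[-1] += 1
    return counts


if __name__ == "__main__":
    sample = [0.5, 0.7, 3.2, 39.9, 40.0]  # made-up divergences, for illustration only
    hist = divergence_histogram(sample)
    print(hist[0], hist[3], hist[39])     # counts in bins 0-1, 3-4 and 39-40
    print(approximate_age(4.0))           # 4 % divergence times 2.5
```

Because the factor is a single constant, the age axis is just a rescaling of the divergence axis, which fits the caption's remark that the estimate mainly serves to compare the divergence distributions of groups.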

References

    1. Goff SP. Host factors exploited by retroviruses. Nat Rev Microbiol. 2007;5(4):253–263. doi: 10.1038/nrmicro1541.
    2. Benveniste RE, Todaro GJ. Homology between type-C viruses of various species as determined by molecular hybridization. Proc Natl Acad Sci USA. 1973;70(12):3316–3320. doi: 10.1073/pnas.70.12.3316.
    3. Benveniste RE, Todaro GJ. Evolution of type C viral genes: evidence for an Asian origin of man. Nature. 1976;261(5556):101–108. doi: 10.1038/261101a0.
    4. Boeke JD, Stoye JP. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. In: Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses. New York: Cold Spring Harbor; 1997.
    5. Goff SP. Retroviridae: the retroviruses and their replication. In: Knipe D, Howley P, editors. Fields virology. 5th ed. Philadelphia: Lippincott Williams and Wilkins; 2007.