J Virol. 2014 Sep 1;88(17):9529-37.
doi: 10.1128/JVI.00919-14. Epub 2014 Jun 11.

Unfixed endogenous retroviral insertions in the human population

Emanuele Marchi et al. J Virol. 2014.

Abstract

One lineage of human endogenous retroviruses (HERVs), HERV-K(HML2), is upregulated in many cancers, some autoimmune/inflammatory diseases, and HIV-infected cells. Despite 3 decades of research, it is not known if these viruses play a causal role in disease, and there has been recent interest in whether they can be used as immunotherapy targets. Resolution of both these questions will be helped by an ability to distinguish between the effects of different integrated copies of the virus (loci). Research so far has concentrated on the 20 or so recently integrated loci that, with one exception, are in the human reference genome sequence. However, this viral lineage has been copying in the human population within the last million years, so some loci will inevitably be present in the human population but absent from the reference sequence. We therefore performed the first detailed search for such loci by mining whole-genome sequences generated by next-generation sequencing. We found a total of 17 loci, and the frequency of their presence ranged from only 2 of the 358 individuals examined to over 95% of them. On average, each individual had six loci that are not in the human reference genome sequence. Comparing the number of loci that we found to an expectation derived from a neutral population genetic model suggests that the lineage was copying until at least ∼250,000 years ago.

Importance: About 5% of the human genome sequence is composed of the remains of retroviruses that over millions of years have integrated into the chromosomes of egg and/or sperm precursor cells. There are indications that protein expression of these viruses is higher in some diseases, and we need to know (i) whether these viruses have a role in causing disease and (ii) whether they can be used as immunotherapy targets in some of these diseases. Answering both questions requires a better understanding of how individuals differ in the viruses that they carry. We carried out the first careful search for new viruses in some of the many human genome sequences that are now available thanks to advances in sequencing technology. We also compared the number that we found to a theoretical expectation to see if it is likely that these viruses are still replicating in the human population today.

Figures

FIG 1
Detection of integrations not in the human reference sequence. (A) Schematic of the pipeline for finding loci, showing how the mapping of trimmed reads is linked to the result of the RetroSeq analysis. Mapping creates a cluster of trimmed reads derived from HK2 loci that lies inside the cluster of RetroSeq anchor reads. In contrast, trimmed reads derived from other regions by chance sequence similarity are scattered around the genome. The next stage is confirmation of integration by BreakAlign analysis. Chr, chromosome. (B) Example screenshot from the Integrative Genomics Viewer genome browser (49) showing evidence for the 4q22.3 locus (from chromosome 4 [chr4], coordinates 9602941 to 9603548). (Top) Mapping of all reads, with colored ones representing RetroSeq anchors (see Materials and Methods; the color shows the chromosome on which the mate has been mapped to another HK2 locus in the reference genome); (middle) mapping of trimmed reads, with the coverage at each nucleotide position shown above the reads. The short overlap representing the 6-nt target site duplication causes a doubling of coverage at these 6 nt, forming the tower in the characteristic submarine-shaped profile of the coverage. (Bottom) RepeatMasker track. In this instance, the HK2 virus has integrated into an existing ERV belonging to another lineage, HERVS71.
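The coverage doubling described for panel B can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the read intervals, the region length, and the function names `coverage` and `tower` are hypothetical, chosen only to show why reads from both flanks that each retain the 6-nt target site duplication produce a 6-nt tower in the coverage profile.

```python
from typing import List, Tuple

def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-position depth for reads given as half-open (start, end) intervals."""
    depth = [0] * length
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return depth

def tower(depth: List[int]) -> List[int]:
    """Positions where the depth equals its maximum (the 'tower')."""
    peak = max(depth)
    return [i for i, d in enumerate(depth) if d == peak]

# Hypothetical toy region: integration site at position 17, 6-nt TSD.
CUT, TSD = 17, 6
# Trimmed reads from the upstream flank end after the TSD ...
upstream = [(5, CUT + TSD), (8, CUT + TSD), (10, CUT + TSD)]
# ... and trimmed reads from the downstream flank start at the TSD.
downstream = [(CUT, 30), (CUT, 33), (CUT, 36)]

depth = coverage(upstream + downstream, length=40)
tower_positions = tower(depth)
```

In this toy input both flanks are covered by three reads, so the depth is 3 immediately on either side of the overlap and 6 across the six overlapping positions, which is the doubling the caption refers to.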
FIG 2
Validation of integrations. Edited output of the BreakAlign program showing a few representative chimeric NGS reads that span the integration site of unfixed loci. In each read, part of the sequence is viral (red lowercase nucleotides) and the other part aligns to the reference preintegration sequence shown above (on a black background). For each locus, we have chimeric reads from the upstream and downstream flanks of the integration, both of which contain the target site duplication (TSD), which is 6 nt long unless indicated as 5 nt long. Loci found in TCGA patients that are not shown here are described by Marchi et al. (33).
FIG 3
How chimeric reads result from ERV integration. (A to D) A guide to interpretation of outputs by use of locus 5q12.3 as an example. After reverse transcription, viral double-stranded DNA (red) is integrated into the chromosome. The viral integrase enzyme makes a staggered cut, typically of 6 nt, into which the viral DNA is inserted. DNA repair of the now single-stranded DNA on either side of the integration produces six identical nucleotides (the target site duplication) flanking the virus. (E) However, in some cases the virus has integrated in reverse orientation, and an example of where this has occurred is shown for locus 1p21.1. Note the changed viral sequence.
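The mechanism in this caption can be sketched as a small string manipulation. This is an illustrative model only, assuming a toy host sequence and a toy viral sequence; the function names `integrate` and `reverse_complement` and all sequences are hypothetical. Host bases are uppercase and viral bases lowercase, loosely mirroring the convention in FIG 2.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]

def integrate(host: str, virus: str, cut: int, tsd_len: int = 6,
              reverse: bool = False) -> str:
    """Insert `virus` into `host` at `cut`, duplicating the `tsd_len`
    host nucleotides spanned by the staggered cut on both sides."""
    if cut < 0 or cut + tsd_len > len(host):
        raise ValueError("staggered cut must lie within the host sequence")
    tsd = host[cut:cut + tsd_len]
    insert = reverse_complement(virus) if reverse else virus
    # After repair, the same tsd_len nucleotides flank the virus on each side.
    return host[:cut] + tsd + insert + tsd + host[cut + tsd_len:]

# Hypothetical toy sequences, not taken from the paper.
host = "AAAACCGGTTGGGG"
virus = "tgtg"

forward = integrate(host, virus, cut=4)
reversed_orientation = integrate(host, virus, cut=4, reverse=True)
```

A chimeric read spanning either junction of `forward` contains host sequence, the duplicated six nucleotides, and viral sequence. In `reversed_orientation` the flanks are unchanged but the viral portion reads as its reverse complement, which corresponds to the changed viral sequence noted for panel E.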
FIG 4
Comparison of the observed and expected numbers of loci. The number of loci in the 26 TCGA patients predicted by the genetic drift model is shown. Along the x axis are the expectations assuming either that the rate of copying until the present day is constant (the date of extinction is year 0) or that the copying of loci ceased at different dates in the last half million years. The red line across the figure shows the observed number (n = 13). The boxes show the medians, interquartile ranges, and the most extreme values from 10,000 replicates.

References

    1. Dewannieux M, Heidmann T. 2013. Endogenous retroviruses: acquisition, amplification and taming of genome invaders. Curr. Opin. Virol. 3:646–656. doi:10.1016/j.coviro.2013.08.005
    2. Belshaw R, Pereira V, Katzourakis A, Talbot G, Pačes J, Burt A, Tristem M. 2004. Long-term reinfection of the human genome by endogenous retroviruses. Proc. Natl. Acad. Sci. U. S. A. 101:4894–4899. doi:10.1073/pnas.0307800101
    3. Mayer J, Blomberg J, Seal RL. 2011. A revised nomenclature for transcribed human endogenous retroviral loci. Mobile DNA 2:7. doi:10.1186/1759-8753-2-7
    4. Subramanian RP, Wildschutte JH, Russo C, Coffin JM. 2011. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology 8:90. doi:10.1186/1742-4690-8-90
    5. Voisset C, Weiss RA, Griffiths DJ. 2008. Human RNA “rumor” viruses: the search for novel human retroviruses in chronic disease. Microbiol. Mol. Biol. Rev. 72:157–196. doi:10.1128/MMBR.00033-07