Review
Mob DNA. 2016 Dec 1;7:24.
doi: 10.1186/s13100-016-0080-x. eCollection 2016.

Endogenous retroviral promoter exaptation in human cancer

Artem Babaian et al. Mob DNA.

Abstract

Cancer arises from a series of genetic and epigenetic changes, which result in abnormal expression or mutational activation of oncogenes, as well as suppression/inactivation of tumor suppressor genes. Aberrant expression of coding genes or long non-coding RNAs (lncRNAs) with oncogenic properties can be caused by translocations, gene amplifications, point mutations or other less characterized mechanisms. One such mechanism is the inappropriate usage of normally dormant, tissue-restricted or cryptic enhancers or promoters that serve to drive oncogenic gene expression. Dispersed across the human genome, endogenous retroviruses (ERVs) provide an enormous reservoir of autonomous gene regulatory modules, some of which have been co-opted by the host during evolution to play important roles in normal regulation of genes and gene networks. This review focuses on the "dark side" of such ERV regulatory capacity. Specifically, we discuss a growing number of examples of normally dormant or epigenetically repressed ERVs that have been harnessed to drive oncogenes in human cancer, a process we term onco-exaptation, and we propose potential mechanisms that may underlie this phenomenon.

Keywords: Alternative promoter; Cancer; Endogenous retrovirus; Epigenetics; Exaptation; Gene regulation; Long terminal repeat; Retrotransposon; Transcription.

Figures

Fig. 1
Examples of onco-exaptation. Gene models of known TE-derived promoters that drive expression of downstream oncogenes, as listed in Table 2. The legend is shown at the top. a 6 kb upstream of CSF1R, a THE1B LTR initiates transcription and contains a splice donor site that joins to an exon within a LINE L1MB5 element and then to the first exon of CSF1R. The TE-initiated transcript has a different, longer 5’ UTR than the canonical transcript but the same full-length protein-coding sequence. b An LOR1a LTR initiates transcription and splices into the canonical second exon of IRF5, which contains the standard translation initiation site (TIS), producing a full-length protein. A novel, non-TE-derived second exon is also incorporated into a minor isoform of LOR1a-IRF5. c Within canonical intron 2 of the proto-oncogene MET, a full-length LINE L1PA2 element initiates transcription (antisense to itself), splicing through a short exon in a SINE MIR element and into the third exon of MET. The first TIS of the canonical MET transcript lies 14 bp into exon 2, although an alternative TIS exists in exon 3 and is believed to also be used by the L1-promoted isoform. d An LTR16B2 element in intron 19 of the ALK gene initiates transcription that proceeds into the canonical exon 20 of ALK. An in-frame TIS within exon 20 results in translation of a shortened oncogenic protein that contains only the intracellular tyrosine kinase domain and lacks the transmembrane and extracellular receptor domains of ALK. e There are two TE-promoted isoforms of ERBB4: the minor variant initiates in an MLT1C LTR in intron 12, and the major variant initiates in an MLT1H LTR in intron 20. Both isoforms produce a truncated protein, although the exact translation start sites are not defined. f In the third exon of SLCO1B3, two adjacent, partly full-length HERV elements combine to create a novel first exon. Transcription initiates in the antisense orientation from an LTR7 and proceeds to a sense-oriented splice donor in an adjacent MER4C LTR, which then splices into the fourth exon of SLCO1B3, creating a smaller protein. g An LTR2 element initiates antisense transcription (relative to its own orientation) and splices into the native second exon of FABP7. The LTR-derived isoform has a non-TE TIS and splice donor, which create a different N-terminal protein sequence of FABP7.
Fig. 2
a UCSC Genome Browser view (hg19) of a portion of the human ALK gene. ALK exon 20 (large blue box) and part of the upstream intron are shown, with the direction of transcription from right to left. The LTR16B2 alternative promoter is shown in the RepeatMasker track as an orange box, and the 25 bp region of clustered TSSs in melanoma cells, identified using 5’ RACE by Weiser et al. [38], is shown as a green box. The CAGE track above is from the FANTOM5 project [128], with the transcriptional direction indicated by a blue arrow. Most CAGE tags are from monocyte-derived macrophages and endothelial progenitor cells. b UCSC Genome Browser view (hg19) of the region encompassing the SAMMSON lncRNA, which plays an oncogenic role in melanoma [161]. The LTR1A2 promoter is indicated in the RepeatMasker track as an orange box. The ChIP-Seq track for SOX10 was created from a dataset (NCBI Gene Expression Omnibus: GSE61967) generated by Laurette et al. [225] in the 501Mel melanoma cell line.
Fig. 3
Gene models of select lncRNAs initiating within LTRs that are involved in oncogenesis. a A solitary LTR12C element initiates SChLAP1, a long intergenic non-coding RNA. b The 5’ LTR7 of a full-length HERVH element initiates the lncRNA ROR, with an exon partially incorporating internal ERV sequence. c The HOST2 lncRNA is derived entirely from components of a Harlequin (or HERV-E) endogenous retrovirus and its flanking LTR2B. d Antisense to the AFAP1 gene, a THE1A LTR initiates transcription of the lncRNA AFAP1-AS1. The second exon of AFAP1-AS1 overlaps exons 14–16 of AFAP1, possibly leading to RNA interference of AFAP1.
Fig. 4
De-repression model for onco-exaptation. In the normal or pre-malignant state, TEs (grey triangles) are largely silenced across the genome. There is low transcriptional activity producing long non-coding RNA (orange box) or expressing coding genes in the case of evolutionary exaptations (not shown). The example proto-oncogene (green box) is under the regulatory control of its native, restrictive promoter. During transformation and/or oncogenesis, a change in the molecular state of the cell leads to loss of TE repressors (black circles), for example through DNA hypomethylation or loss of transcriptional or epigenetic repressive factors. This change may also be accompanied by a change in, or gain of, activating factor activity (red and purple shapes). Together, these de-repression events result in higher TE promoter activity (orange triangles) and more TE-derived transcripts, depending on which factors become deregulated. Oncogenic activation of proto-oncogenes is a consequence of the particular molecular milieu that arises in the cancerous cells.
Fig. 5
Epigenetic evolution model for onco-exaptation. In the starting cell population, there is dispersed, low or noisy promoter activity at TEs (colored triangles) from a set of transcriptionally permissive TEs (grey triangles). TE-derived transcript expression is low and variable between cells, although some transcripts are more reliably measurable (orange box). Clonal tumor evolutionary forces change the frequency and expression of TE-derived transcripts by homogenizing epialleles and the use of TE promoters (highlighted haplotype). A higher frequency of ‘active’ TE epialleles at a locus results in more measurable transcripts initiating from that position. TE epialleles that promote oncogenesis, namely onco-exaptations, can be selected for and arise multiple times independently as driver epialleles, in contrast to the more dispersed passenger epialleles, or “nonaptations”.

References

1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921.
2. Jurka J, Kapitonov VV, Kohany O, Jurka MV. Repetitive Sequences in Complex Genomes: Structure and Evolution. Annu Rev Genomics Hum Genet. 2007;8(1):241–259.
3. Brosius J, Gould SJ. On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc Natl Acad Sci. 1992;89(22):10706–10710.
4. Gould SJ, Vrba ES. Exaptation-A Missing Term in the Science of Form. Paleobiology. 1982;8(1):4–15.
5. Feschotte C, Gilbert C. Endogenous viruses: insights into viral evolution and impact on host biology. Nat Rev Genet. 2012;13(4):283–296.