Review
Cell Mol Life Sci. 2024 Mar 31;81(1):157.
doi: 10.1007/s00018-024-05195-2.

Regulation and function of transposable elements in cancer genomes

Michael Lee Jr et al. Cell Mol Life Sci.

Abstract

Over half of human genomic DNA is composed of repetitive sequences generated throughout evolution by prolific mobile genetic parasites called transposable elements (TEs). Long disregarded as "junk" or "selfish" DNA, TEs are increasingly recognized as formative elements in genome evolution, wired intimately into the structure and function of the human genome. Advances in sequencing technologies and computational methods have ushered in an era of unprecedented insight into how TE activity impacts human biology in health and disease. Here we discuss the current views on how TEs have shaped the regulatory landscape of the human genome, how TE activity is implicated in human cancers, and how recent findings motivate novel strategies to leverage TE activity for improved cancer therapy. Given the crucial role of methodological advances in TE biology, we pair our conceptual discussions with an in-depth review of the inherent technical challenges in studying repeats, specifically those related to structural variation, expression analyses, and chromatin regulation. Lastly, we provide a catalog of existing and emerging assays and bioinformatic software that together enable the most sophisticated and comprehensive investigations yet into the regulation and function of interspersed repeats in cancer genomes.

Keywords: ERVs; LINE-1; Long-read sequencing; Non-coding genome; Retrotransposons; SINE; Viral mimicry.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Human retrotransposons and their replication cycle. (A) Domain schematics of the major retrotransposons in the human genome. Thick red arrows depict target site duplications (TSD), a hallmark of retrotransposition. Thin black arrows depict transcription start sites. The LINE-1 5′UTR possesses an antisense promoter. SVA elements are most likely transcribed by RNAPII. LTR, long terminal repeat. UTR, untranslated region. RNAPII, RNA polymerase II. FLAM, Free Left Alu Monomer. FRAM, Free Right Alu Monomer. VNTR, variable number tandem repeat. pA, polyadenylation signal. TT, T-stretch terminator of RNAPIII. (B) Key steps of the retrotransposition cycle for LINE-1, Alu/SVA, and HERV. Alu and SVA are non-autonomous and hijack LINE-1 machinery in trans for TPRT. The HERV RT reaction occurs within virus-like particles (VLP) prior to nuclear import and integration. LINE-1-mediated TPRT preferentially targets AT-rich sequences. Major RNA species of LINE-1 are depicted below its transcription reaction schematic. YY1 directs proper LINE-1 TSS selection. 5′-7mG denotes the 7-methylguanosine cap of LINE-1 mRNA. RNP, ribonucleoprotein particle. RT, reverse transcription. TPRT, target-primed reverse transcription
Fig. 2
TE activity generates genomic variation and is coopted during evolution. TE activity has frequently been coopted for beneficial regulatory functions in genomes during evolution, including chromosome compartmentalization, TAD boundary formation, enhancer activity, and gene regulatory network formation. GRN, gene regulatory network. TF, transcription factor. Dashed line with arrowhead depicts a transposition event. Lightning symbol depicts signaling cues such as cytokines triggering TF binding and activating interferon-stimulated gene (ISG) transcription
Fig. 3
TE activity can promote and suppress cancers. Major cancer-promoting (blue arrows) and suppressive (orange arrows) roles of TEs. Antigen presentation genes are often epigenetically silenced in cancer cells to evade adaptive immunity. Some cancer types may mutate IFN-related genes as an adaptive mechanism to tolerate TEs without inducing an IFN response. The mechanism of cytosolic LINE-1 cDNA synthesis is currently unknown. TSG, tumor-suppressor gene. Caution symbols depict DNA damage. RLR, RIG-I-like receptors. dsRNA, double-stranded RNA. cDNA, complementary DNA. IFN, interferon. DNMTi, DNA methyltransferase inhibitors. NRTI, nucleoside reverse transcriptase inhibitors
Fig. 4
Computational analysis of TE genomic variation and expression using short-read and long-read sequencing. (A) TE analysis is challenging because of the high copy number, sequence diversity, and variability across individuals of TEs. These problems are exacerbated in cancer, where polymorphic TE content and structural variation are increased; moreover, somatic TE inserts in the tumor must be distinguished from germline variants. Internal black lines depict nucleotide variants within TEs. (B) Reference-centric approaches for detecting putative de novo TE insertions based on alignment characteristics of reads spanning the TE insert (split versus discordant reads). The vertical dashed line depicts the breakpoint of an inserted TE (blue). Sequencing reads aligning entirely within TEs often match identically with multiple genomic copies, resulting in poor mappability. R1, read 1. R2, read 2. (C) Long-read sequencing has improved reference genome assembly, bridging gaps (“NNN”) in reference genomes assembled by short-read technologies. These gaps are typically composed of complex repetitive elements such as tandem repeats or multiple nested TEs (composite)
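The split-versus-discordant distinction in panel B can be illustrated with a minimal sketch. It is not taken from any tool reviewed in the article. The `ReadPair` fields, the `classify` function, and the `max_insert` and `min_clip` thresholds are all hypothetical, chosen only to show the logic: a split read carries a long soft clip at the breakpoint, whereas a discordant pair has one mate anchored in unique sequence and the other aligned to a distant TE copy.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Toy alignment summary for one paired-end fragment (hypothetical fields)."""
    r1_chrom: str
    r1_pos: int
    r2_chrom: str
    r2_pos: int
    r1_soft_clip: int = 0     # bases of R1 that did not align (soft-clipped)
    r2_soft_clip: int = 0     # bases of R2 that did not align (soft-clipped)
    mate_in_te: bool = False  # True if one mate aligns inside an annotated TE copy


def classify(pair: ReadPair, max_insert: int = 1000, min_clip: int = 20) -> str:
    """Label a read pair as 'split', 'discordant', or 'concordant'.

    Thresholds are illustrative assumptions, not recommended values.
    """
    # Split read: one mate crosses the insertion breakpoint, so part of it
    # fails to align to the reference and is soft-clipped.
    if max(pair.r1_soft_clip, pair.r2_soft_clip) >= min_clip:
        return "split"
    # Discordant pair: mates map to different chromosomes or too far apart,
    # and the displaced mate lands in a TE copy elsewhere in the genome.
    far_apart = (
        pair.r1_chrom != pair.r2_chrom
        or abs(pair.r2_pos - pair.r1_pos) > max_insert
    )
    if far_apart and pair.mate_in_te:
        return "discordant"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "chr1", 1300),
    ReadPair("chr1", 1000, "chr1", 1300, r1_soft_clip=35),
    ReadPair("chr1", 1000, "chr7", 5_000_000, mate_in_te=True),
]
for p in pairs:
    print(classify(p))
```

Real callers work from aligner output (for example, CIGAR strings and mate coordinates in BAM records) and cluster many supporting reads per candidate site before reporting an insertion; this sketch only labels individual pairs.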
Fig. 5
Experimental strategies to detect TE variation, expression, and epigenetics. (A) Somatic TE insertions in cancer can be detected by whole genome sequencing (WGS) or targeted approaches that enrich for TE sequences (linker ligation PCR versus hybridization capture). NGS, next-generation sequencing. TSD, target site duplication. (B) TE expression analysis is complicated by multiple potential sources of TE-containing RNAs, particularly for intronic TEs. Specific TE loci often cannot be distinguished with short reads unless they contain sufficient unique sequence content (3′ readthrough method). In silico methods can estimate locus-specific TE expression by rescuing multi-mapped reads. With sufficient accuracy, full-length TE long reads can distinguish individual TE loci by virtue of characteristic SNPs, and can also distinguish TE-initiated transcripts from passive readthrough by the host gene using 5′ end transcription start site (TSS, colored arrowheads) information. E–M, expectation–maximization algorithm. (C) Cancer genomes frequently undergo DNA hypomethylation during tumorigenesis, hence DNA methylation is commonly measured to assess the epigenetic permissivity of TEs in cancer. Locus-specific methylation can be detected using bisulfite conversion of genomic DNA paired with locus-specific amplicon sequencing. Nanopore sequencing enables direct detection of modified nucleotides, including 5-methylcytosine (5mC), during basecalling
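The rescue of multi-mapped reads by expectation-maximization mentioned in panel B can be sketched as follows. This is a simplified illustration, not the implementation of any specific software in the article's catalog. The function name `em_rescue`, the locus labels, and the assumption that a read aligns equally well to each of its candidate loci are all hypothetical. Every read is assumed to have at least one candidate locus.

```python
def em_rescue(read_hits, n_iter=100):
    """Estimate relative expression of TE loci from ambiguous alignments.

    read_hits: list of lists; each inner list holds the loci one read maps to.
    Returns a dict mapping locus -> estimated fraction of reads.
    Assumes every alignment of a read is equally good (a simplification).
    """
    loci = sorted({locus for hits in read_hits for locus in hits})
    # Start from a uniform guess of locus abundance.
    theta = {locus: 1.0 / len(loci) for locus in loci}
    for _ in range(n_iter):
        counts = dict.fromkeys(loci, 0.0)
        # E-step: split each read across its candidate loci in proportion
        # to the current abundance estimates.
        for hits in read_hits:
            total = sum(theta[locus] for locus in hits)
            for locus in hits:
                counts[locus] += theta[locus] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {locus: counts[locus] / len(read_hits) for locus in loci}
    return theta


# Three reads unique to one locus, one read unique to another,
# and four reads that map equally well to both (hypothetical loci).
reads = [["L1_a"]] * 3 + [["L1_b"]] + [["L1_a", "L1_b"]] * 4
estimates = em_rescue(reads)
print({locus: round(value, 3) for locus, value in estimates.items()})
```

In this toy input the uniquely mapped reads favor the first locus three to one, so the ambiguous reads are progressively reassigned in that ratio and the estimates converge toward 0.75 and 0.25. Published tools additionally weight alignments by score and handle many more loci.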

