Nat Commun. 2022 Nov 19;13(1):7115.
doi: 10.1038/s41467-022-34810-8.

Transposable element-mediated rearrangements are prevalent in human genomes

Parithi Balachandran et al. Nat Commun.

Abstract

Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology-directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation and of the mechanisms responsible for its formation, and they establish TEMRs as an important driver of human diversity.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Generating confident TEMR callsets and breakpoint annotations across human genomes.
a TEMR callset summary (dark slice) by TE type, SV type and TE orientation. TE, transposable element; SV, structural variant; MEI, mobile element insertion; TEMR, transposable element-mediated rearrangement; LINE, long interspersed nuclear element. b A diagram of TEMR structures showing distinct TEs (black and blue solid arrows) in the reference and a recombined TE in the sample (black and blue mixed arrows). The breakpoint junction in the sample is indicated by the dashed red line. Deletion and duplication TEMRs are largely mediated by TEs in the same orientation (solid green and purple), and all inversions are mediated by TEs in opposite orientation (hatched green and purple). c Median TEMR sizes differ by type of TE present at breakpoints across different SV types (Alu TEMRs vs LINE-1 TEMRs; deletion: n = 361 vs 84, duplication: n = 29 vs 4, and inversion: n = 7 vs 8). A two-sided Welch’s t-test was used to calculate the p-value. n.s., not significant. d A 2,210 bp deletion TEMR between two Alu repeats in direct orientation. Top panel: UCSC genome browser image showing the deletion with breakpoints in an AluSp and an AluSg. Bottom panel: Breakpoint reconstruction of the assembled deletion (middle, NA19240) against Alu consensus sequences (top and bottom) identifies a 20 bp breakpoint microhomology (red). REF, reference genome (GRCh38).
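The breakpoint microhomology mentioned in panel d can be illustrated with a minimal sketch. It assumes a deletion is described as a half-open interval on a reference string and counts how far the deletion can be shifted left and right without changing the resulting sequence. The function name `deletion_microhomology` and the toy sequence are hypothetical and are not taken from the paper, whose reconstruction is guided by Alu consensus sequences rather than this simple shift test.

```python
def deletion_microhomology(ref: str, start: int, end: int) -> str:
    """Return the microhomology at a deletion of ref[start:end].

    Assumption: 0-based, half-open coordinates on a plain string.
    The microhomology is the stretch over which the deletion can be
    slid left or right while producing the same deleted allele.
    """
    # Slide left: the base before the deletion must equal the last deleted base.
    left = 0
    while start - left > 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    # Slide right: the first deleted base must equal the first base after it.
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return ref[start - left:start + right]


# Toy reference (hypothetical): flank, repeat, unique sequence, repeat, flank.
toy_ref = "TT" + "ACGTA" + "GGGGCC" + "ACGTA" + "CC"
microhomology = deletion_microhomology(toy_ref, 4, 15)
print(microhomology, len(microhomology))
```

On the toy sequence, the deletion removes one copy of the repeated `ACGTA` plus the unique sequence between the copies, so the breakpoint cannot be placed more precisely than within that 5 bp stretch.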
Fig. 2. Identifying mechanistic signatures of TEMRs.
a TEMR events are classified by breakpoint characteristics guided by TE consensus sequences. For homologous breakpoints, the chimeric TE resulting from the TEMR event must reconstruct a full TE with microhomology at the breakpoint. HR, homologous recombination; NHE, non-homologous repair; TE, transposable element; SV, structural variant; MEI, mobile element insertion; TEMR, transposable element-mediated rearrangement; LINE, long interspersed nuclear element. b An example of TEMR-HR (top). Breakpoint junction of 1,294 bp TEMR deletion in NA19240 (middle) and alignment between flanking Alu elements to an Alu consensus sequence (bottom). c An example of TEMR-NHE (top). Breakpoint junction of 312 bp TEMR deletion in HG00514 (middle) and alignment between flanking Alu elements to an Alu consensus sequence (bottom). REF, reference genome (GRCh38). d The breakpoint microhomology distribution differs between NHE (orange) and HR (blue) TEMRs. e Top: Microhomology GC content distribution for Alu TEMRs (dark green, n = 330), reference Alu elements (light green, n = 1,181,072), LINE-1 TEMRs (dark purple, n = 38), reference LINE-1s (light purple, n = 962,085) and the full human genome reference (HGR, gray). Bottom: Average GC content of TEMR breakpoint microhomologies for Alu (green) and LINE-1 (purple) TEMRs. Microhomologies were restricted to 5+ bp for this analysis. A two-sided Welch’s t-test was used to calculate the p-value. n.s., not significant.
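The GC content summary in panel e, with its restriction to microhomologies of 5 bp or longer, can be sketched as follows. The helper names `gc_content` and `mean_gc` and the example sequences are hypothetical illustrations, not the authors' pipeline or data.

```python
from typing import Iterable, Optional


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_gc(microhomologies: Iterable[str], min_len: int = 5) -> Optional[float]:
    """Average GC content over microhomologies of at least min_len bp.

    Returns None when no sequence passes the length filter.
    """
    kept = [m for m in microhomologies if len(m) >= min_len]
    if not kept:
        return None
    return sum(gc_content(m) for m in kept) / len(kept)


# Hypothetical microhomologies; "GGCC" is dropped by the 5 bp filter.
example = ["ACGTA", "GGCC", "GGGCCC"]
print(mean_gc(example))
```

The length filter matters because very short microhomologies take only a few possible GC values, which would make the averaged distribution coarse.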
Fig. 3. Features and complexities of TEMRs.
a Median similarity between TEs at TEMR breakpoints differs by TE type and mechanism (TEMR-HR: 354 Alu and 36 LINE-1; TEMR-NHE: 43 Alu and 60 LINE-1). A two-sided Welch’s t-test was used to calculate the p-value. HR, homologous recombination; NHE, non-homologous repair; TE, transposable element; SV, structural variant; MEI, mobile element insertion; TEMR, transposable element-mediated rearrangement; LINE, long interspersed nuclear element. b Distribution of the breakpoint microhomology along the Alu consensus sequence. Alu elements consist of a left monomer (indigo), right monomer (green), RNA polymerase III promoter regions (gray A-Box, B-Box and A′-Box) and an adenosine-rich region (AR, purple). c Example of a 988 bp TEMR inversion (INV) with additional complex breakpoints (38 bp and 56 bp deletions (DEL), indicated by pink shaded sections) implicating replication-based mechanisms of variant formation. REF, reference genome (GRCh38).
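The Welch’s t-test named in the captions compares two groups without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the input values are hypothetical and do not come from the paper. Obtaining the two-sided p-value additionally needs the Student’s t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test.

    Each group needs at least two observations and nonzero total variance.
    """
    na, nb = len(a), len(b)
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical toy groups, for illustration only.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 4))
```

Welch’s variant is a reasonable choice when group sizes are unbalanced, as with the group sizes reported in the captions, because it does not pool the variances of the two groups.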
Fig. 4. TEs mediate multi-copy CNVs (mCNVs).
a 6 kbp multi-copy event (duplication and triplication) with a smaller 2.2 kbp deletion (red) in a subset of copies. b Reconstruction of the AluS-mediated mCNV TEMR breakpoint surrounding each 6 kbp copy shows 1 bp microhomology (red). Arrows (blue and yellow) indicate the source of the sequence in reference (REF) and sample (ALT). The dashed red line is indicative of the breakpoint junction. c Reconstruction of the 2 kbp inner-copy deletion shows 59 bp of near-perfect homology (red bases). d Copy number status of the individuals containing the triplication found using ddPCR, read-depth analysis and assembly. TEMR, transposable element-mediated rearrangement; REF, reference genome (GRCh38).
Fig. 5. TEMRs disproportionately affect genes.
a All 397 Alu TEMRs by variant type (shape) and Alu density (shading). b TEMRs disproportionately intersect genes (pie chart), affecting introns, exons, and gene-proximal regions that are often enriched with regulatory elements. c Regulatory and coding regions affected by TEMRs. Regulatory annotations were retrieved from ENCODE cCREs; coding regions and TFBS predictions were retrieved from the Ensembl Variant Effect Predictor (VEP). TFBS, transcription factor binding site; cCREs, candidate cis-regulatory elements; K4m3, DNase-H3K4me3; enhD, distal enhancer-like signature; enhP, proximal enhancer-like signature; prom, promoter-like signature; UTR, untranslated region; ENCODE, the Encyclopedia of DNA Elements; TEMR, transposable element-mediated rearrangement. d Example of a TEMR deleting a stop codon in TRIM65 (bottom).
