[Preprint]. 2024 Nov 20:2024.11.11.623112.
doi: 10.1101/2024.11.11.623112.

Structures of vertebrate R2 retrotransposon complexes during target-primed reverse transcription and after second strand nicking

Akanksha Thawani et al. bioRxiv.

Abstract

R2 retrotransposons are model site-specific eukaryotic non-LTR retrotransposons that copy-and-paste into gene loci encoding ribosomal RNAs. Recently we demonstrated that avian A-clade R2 proteins achieve efficient and precise insertion of transgenes into their native safe-harbor loci in human cells. The features of A-clade R2 proteins that support gene insertion are not characterized. Here, we report high-resolution cryo-electron microscopy structures of two vertebrate A-clade R2 proteins, avian and testudine, at the initiation of target-primed reverse transcription and one structure after cDNA synthesis and second strand nicking. Using biochemical and cellular assays, we discover the basis for high selectivity of template use and unique roles for each of the expanded A-clade zinc-finger domains in nucleic acid recognition. Reverse transcriptase active site architecture is reinforced by an unanticipated insertion motif in vertebrate A-clade R2 proteins. Our work provides the first insights into A-clade R2 protein structure during gene insertion and enables further improvement and adaptation of R2-based systems for precise transgene insertion.

Conflict of interest statement

Competing interests: K.C. is an equity holder and scientific advisor for Addition Therapeutics, Inc., using a retrotransposon-based genome engineering technology. K.C. and B.V.T. are listed inventors on patent applications filed by University of California, Berkeley related to the PRINT platform.

Figures

Fig. 1. TPRT and PRINT activities and cryo-EM structures of A-clade R2 RNPs initiating TPRT.
(a) Schematic of biochemical steps during DNA insertion. (b) Phylogenetic analysis of R2p from the A-clade (birds, turtle, red flour beetle) and D-clade (silk moth and fruit fly) characterized in this and previous work (17, 20). Tree branch length is indicative of substitutions per aligned site. (c) Denaturing PAGE of TPRT reaction products. Orange triangles indicate expected TPRT product lengths for copying a single full-length template (TPRT cDNA). Multiple templates may also be copied in series (template jumping products). R2Pm and R2Tg proteins were assayed with annealed rDNA target site oligonucleotides and different template RNAs, each with an R5 3′ tail: Gf98, Pm112, Bm3. (d) PRINT assay schematic. An mRNA encoding R2Pm or R2Tg protein is transfected with an engineered template RNA composed of a 5′ module (5′M), modified CMV promoter (PRO), GFP ORF, polyadenylation signal (PA), and 3′ module (3′M) with a 3′ tail containing rRNA and A22. (e) PRINT assays with 2-RNA transfection of the R2Pm or R2Tg mRNA and an engineered template RNA with either Gf3 or Pm3 followed by R4A22. Note the log-scale y-axis. (f-g) At top, domains of A-clade R2Pm and R2Tg are illustrated with amino acid numbering; abbreviations given in the text. Cryo-EM density of R2Pm (f) or R2Tg (g) first strand synthesis complex assembled with rDNA target site and either Gf3 full-length 3′UTR RNA (f) or Gf98 RNA (g) is shown and colored by domain. (h-i) Ribbon diagrams of R2Pm (h) or R2Tg (i) first strand synthesis complex structure colored by domains.
Fig. 2. Protein and DNA recognition of R2 3′UTR RNA.
(a) Schematic of direct interactions between R2Pm protein, rDNA target site, and 3′UTR RNA in a TPRT initiation complex. Color scheme is consistent with Figure 1. Solid navy lines denote direct hydrogen bonds with the nucleobases or ribonucleobases, while dashed navy lines represent hydrogen bonds with the phosphate backbone or sugars. Solid mustard lines denote pi-stacking contacts with the nucleobases or ribonucleobases. Black circles represent base-pairs in DNA duplex; RNA-DNA or RNA-RNA base-pairing is indicated by apposition. DNA numbering (green and gray strands) is negative upstream or positive downstream of the first strand nick. RNA numbering (red strand) is from the start of Gf3. (b-c) Recognition of the 3′UTR RNA involves the NTE −1, Thumb, Linker and ZnF3 domains. (b) Base-specific hydrogen bonds between bases G-256 and A-258 in the hinge region of 3′UTR RNA and side chains within the Thumb and Linker domains in R2Pm. (c) ZnF3 domain from R2Pm and R2Tg contacts the pseudoknot of 3′UTR RNA. (d) Side chains in ZnF3 make base-specific hydrogen bonds: R2Pm with G-236. R2Pm ZnF3 also makes a contact with the phosphate backbone of base G-237 at the junction of hinge and pseudoknot and R2Tg’s ZnF3 with the phosphate backbone of base C-253. The helix segmentation is an artifact of automated secondary structure assignment. Here and in subsequent figure panels, heteroatom representation has oxygens in red and nitrogens in blue. (e) Base-specific hydrogen bonds between pseudoknot bases and bases in a single-stranded region of the second strand DNA. (f) PRINT assays using mRNA encoding R2Tg and template RNA with 3′ module Gf98, or a variant Gf98, and R4A22 3′ tail. Base substitutions are numbered according to their position in Gf3, as annotated in (a), with specific mutations described in the main text.
Fig. 3. Protein recognition of the target DNA and N-terminal domain requirements for TPRT and PRINT.
(a) RLE and ZnK domains surrounding the nicked first strand and single-stranded second strand are illustrated for the R2Pm complex. (b) The motif 6a loop within the RT domain is shown protruding into a distortion in target DNA. (c) Configuration on target DNA of the N-terminal DNA binding domains: the three ZnF and the Myb domain for A-clade R2Pm and R2Tg are compared with the single ZnF and Myb in D-clade R2Bm. (d) Base-reading hydrogen bonds between ZnF2 and the target DNA proximal to the nick site. (e) The unstructured R2Pm Spacer and its interaction with the RT and NTE 0 domains are depicted. (f) Denaturing PAGE of TPRT reaction products with wild-type R2Tg, R2Za, R2Pm and chimeric proteins: R2Pm with the N-terminus (Spacer, Myb, and three ZnFs) from R2Tg (NTg) or R2Za (NZa), R2Pm with ZnF3–2 domains from R2Tg (ZFTg), R2Pm with Spacer from R2Tg (spacTg). Gf68 RNA with R5 3′ tail was used for all assays. Different regions of the same gel are shown, with first strand DNAs and second strand DNAs imaged separately using different 5′ dyes. (g) PRINT assays using mRNA encoding R2Pm or the chimeras described in (f). The template RNA 3′ module was Gf3 followed by R4A22.
Fig. 4. A C-terminal insertion in A-clade R2p.
(a) The CTI is rendered in yellow against the RT and Linker domains and RNA:cDNA duplex. The shorter loop present in R2Bm is shown for comparison. (b) Side chains of the conserved EWE motif that anchors the CTI to the RT are displayed for R2Pm. (c) Denaturing PAGE of TPRT reaction products with wild-type R2Tg, R2Tg ΔCTI (CTI truncation) mutant, wild-type R2Pm and R2Pm ΔCTI mutant. Gf68 RNA was synthesized with a variable length of the 3′ tail that base-pairs to target site primer: 0, 3, 4, 5, 8 and 12 nt. Different regions of the same gel are shown, with first strand DNAs and second strand DNAs imaged separately using different 5′ dyes. (d) PRINT assays were performed by 2-RNA transfection of the indicated R2p mRNA and template RNA with Gf3 followed by R4A22.
Fig. 5. Biochemical activity and cryo-EM structure of A-clade R2 retrotransposon during second strand nicking.
(a) Denaturing PAGE of target site nicking and TPRT reaction products from assays using wild-type R2Tg or its RTD and END variants. Gf68 RNA with R5 was used as template. Different regions of the same gel are shown, with first strand DNAs and second strand DNAs imaged separately using different 5′ dyes. Small triangle (mustard) indicates TPRT cDNA. (b) Nucleic acid substrate design to capture a post-TPRT structure for an R2Pm complex. 2D class averages from cryo-EM analysis are shown with inferred range of positions of RNA:cDNA duplex exiting the protein density. (c) Cryo-EM density and ribbon diagram of the assembled R2Pm second strand nicked complex, colored by domains. (d) Comparison of upstream target site DNA position in the R2Pm first strand synthesis complex versus second strand nicked complex relative to the R2Pm (NTE to RLE) core (white) and bound 3′UTR RNA (red). After second strand nicking, the nicked single-stranded second strand DNA is displaced towards the RT core and the double-strand DNA bend angle changes near the ZnF1 and Myb domains. (e) Nicked ends of upstream target site DNA are illustrated with nearby R2Pm protein regions NTE −1 and ZnF3–2.

References

    1. Han J. S., Non-long terminal repeat (non-LTR) retrotransposons: mechanisms, recent developments, and unanswered questions. Mob DNA 1, 15 (2010).
    2. Payer L. M., Burns K. H., Transposable elements in human genetic disease. Nat Rev Genet 20, 760–772 (2019).
    3. Mita P., Boeke J. D., How retrotransposons shape genome regulation. Curr Opin Genet Dev 37, 90–100 (2016).
    4. Flasch D. A., Macia Á., Sánchez L., Ljungman M., Heras S. R., García-Pérez J. L., Wilson T. E., Moran J. V., Genome-wide de novo L1 Retrotransposition Connects Endonuclease Activity with Replication. Cell 177, 837–851.e28 (2019).
    5. Thawani A., Ariza A. J. F., Nogales E., Collins K., Template and target-site recognition by human LINE-1 in retrotransposition. Nature 626, 186–193 (2024).
