Sci Adv. 2025 Jun 20;11(25):eadu5533.
doi: 10.1126/sciadv.adu5533. Epub 2025 Jun 20.

Structures of vertebrate R2 retrotransposon complexes during target-primed reverse transcription and after second-strand nicking

Akanksha Thawani et al. Sci Adv.

Abstract

R2 retrotransposons are site-specific eukaryotic non-long terminal repeat retrotransposons that copy and paste into gene loci encoding ribosomal RNAs. Recently, we demonstrated that avian A-clade R2 proteins achieve efficient and precise insertion of transgenes into their native safe-harbor loci in human cells. The features of A-clade R2 proteins that support gene insertion are not well characterized. Here, we report high-resolution cryo-electron microscopy structures of two vertebrate A-clade R2 proteins at the initiation of target-primed reverse transcription and after cDNA synthesis and second-strand nicking. Using biochemical and cellular assays, we illuminate the basis for high selectivity of template use and unique roles for each of the three zinc-finger domains in nucleic acid recognition. Reverse transcriptase active site architecture is reinforced by an unanticipated insertion motif specific to vertebrate A-clade R2 proteins. Our work provides the first insights into A-clade R2 protein structure during gene insertion and may enable future improvement and adaptation of R2-based systems for precise transgene insertion.

Figures

Fig. 1. Activities and cryo-EM structures of A-clade R2 RNPs initiating TPRT.
(A) Schematic of biochemical steps during DNA insertion. The second step marked with an asterisk (*) depicts the TPRT initiation state visualized in (F) and (G), and the final state depicts second-strand–nicked complex resolved in Fig. 5. (B) Phylogenetic analysis of R2 RT core (NTE −2 to CTI) from the A-clade (birds, turtle, and beetle) and D-clade (silk moth and fruit fly). (C) Denaturing PAGE with TPRT reaction products. 3′UTR RNA used were either a full-length 3′UTR RNA (Gf-full, Pm-full, and Bm-full) or their truncated versions (Gf-98, Gf-68, and Pm-112), each terminating in 5-nt homology to primer strand (R5). Orange triangles indicate expected TPRT product lengths for copying a single full-length template (TPRT cDNA). Multiple template RNAs may be copied in series (template jumping products). Regions of the same gel are imaged separately using different 5′ dyes. Loading control was detected with SYBR Gold. This is consistent for all TPRT gels hereafter. (D) PRINT assay schematic: mRNA encoding an R2p is transfected with an engineered template RNA comprising a 5′ module (5′M), modified CMV promoter (PRO), GFP ORF, polyadenylation signal (PA), and 3′ module (3′M) containing the 3′UTR or its truncation with terminal R4A22. Created with BioRender (45). (E) PRINT assays with PlaMe or TaGu mRNA and template RNA indicated. Data presented are mean values ± error bars indicating SD for three technical replicates. Note the log-scale y axis, which is consistent for PRINT assays. (F and G) Top: Colored domain schematics of PlaMe and TaGu with amino acid numbering (abbreviations given in the text). Bottom: Cryo-EM density maps of TPRT initiation complexes assembled with rDNA target site and either Gf-full (F) or Gf-98 RNA (G), colored by domains. (H and I) Ribbon diagrams of TPRT initiation complexes colored by domains.
Fig. 2. Protein and DNA recognition of R2 3′UTR RNA.
(A) Schematic of direct interactions between PlaMe protein, rDNA target site, and 3′UTR RNA in the TPRT initiation complex (see fig. S6A for TaGu). Color scheme is consistent with Fig. 1. Solid black lines denote sequence-specific hydrogen bonds between protein residues and (deoxy)ribonucleobases, while dashed black lines represent hydrogen bonds between target site DNA and RNA bases. Solid gray lines denote pi-stacking contacts with (deoxy)ribonucleobases. Black circles represent base pairs. DNA numbering (green and gray strands) is negative upstream or positive downstream of the first-strand nick. RNA numbering (red strand) is from the start of Gf-full, as annotated in (A). (B) Secondary structure of the Gf 3′UTR RNA portions resolved in the TPRT initiation complexes for PlaMe and TaGu. Unresolved portions of the RNA are represented with dashed lines. (C) The 3′UTR RNA is engaged by the NTE −1 and ZnF3 domains (here, PlaMe; see fig. S6D for TaGu). ZnF3 domain from PlaMe contacts the pseudoknot of 3′UTR RNA. The cryo-EM map is shown in transparency. (D) Side chains in PlaMe ZnF3 make base-specific hydrogen bonds with G-236. PlaMe ZnF3 also makes a contact with the phosphate backbone of base G-237 at the junction of hinge and pseudoknot. The transparent densities correspond to the cryo-EM map around critical residues. Here and in subsequent figure panels, heteroatom representation has oxygen in red and nitrogen in blue. (E) PRINT assays using mRNA encoding TaGu and template RNA with 3′ module Gf-98 or a variant Gf-98 with R4A22 3′ tail. Base substitutions are numbered according to their position in Gf-full, with specific mutations described in the main text.
Fig. 3. Protein recognition of the target site DNA and N-terminal R2p domain requirements.
Top left: PlaMe TPRT initiation complex is represented with boxes marking regions highlighted in (A) to (C); relative rotational angles are indicated. Transparent density is the atomic surface. (A) RLE and ZnK domains surrounding the nicked first strand and single-stranded second strand for PlaMe. (B) RT motif 6a loop is shown protruding into the target site DNA. (C) Configuration on target site DNA of the N-terminal DNA binding domains: The three ZnFs and the Myb domain for A-clade PlaMe (left) and TaGu (center) are compared with the single ZnF and Myb in D-clade BoMo (right). Transparent density is the atomic surface. (D) Base-reading hydrogen bonds between ZnF2 and target site DNA proximal to the nick site. Transparent density depicts the cryo-EM map for critical protein residues and nucleotides involved in interactions. (E) Top: PlaMe Spacer and its interaction with the RT domain and NTE 0 motif. Bottom: Unresolved TaGu Spacer denoted by dashed lines. Atomic models are presented alongside the transparent unsharpened cryo-EM densities. (F) Denaturing PAGE of TPRT reaction products with wild-type TaGu, ZoAl, PlaMe, and chimeric proteins (~20 nM each): PlaMe with the N terminus (Spacer, Myb, and three ZnFs) from TaGu (NTg) or ZoAl (NZa), PlaMe with ZnF3-2 domains from TaGu (ZFTg), and PlaMe with Spacer from TaGu (spacTg). Gf-68 RNA with 5-nt primer homology (400 nM) and target site DNA (5 nM) were used for all reactions. (G) PRINT assays using mRNA encoding PlaMe or the chimeras described in (F). (H) PRINT assays using mRNA encoding TaGu or TaGu chimeras: TaGu with the N terminus (Spacer, Myb, and three ZnFs) from PlaMe (NPm), or TaGu with ZnF3-2 domains from PlaMe (ZFPm). For both (G) and (H), the template RNA 3′ module was Gf-full followed by R4A22.
Fig. 4. A C-terminal insertion in A-clade R2p.
(A) The CTI is rendered in orange against the RT, Linker, and Thumb domains and RNA:cDNA duplex. The shorter CTI loop present in BoMo is shown for comparison (22, 23). TaGu CTI protruding away from the RT domain and the central RNA:DNA duplex could only be traced using our unsharpened cryo-EM density map at low threshold. (B) Side chains of the conserved EWE motif that anchors the CTI to the RT domain and NTE 0 motif are displayed for PlaMe (top) and TaGu (bottom), where each dashed line represents a hydrogen bond. The transparent density corresponds to the cryo-EM map around those residues. (C) Denaturing PAGE of TPRT reaction products with proteins indicated and Gf-68 RNA with or without primer-complementary 3′ tail (4 or 0 nt). Orange triangles indicate expected TPRT product lengths for copying a single full-length template (TPRT cDNA). Multiple templates may also be copied in series (template jumping products). (D) Quantification of first strand nicking products in (C) when proteins are incubated with RNA lacking homology to target site DNA. Each replicate data point is shown, and bars are graphed as means ± 1 SD. (E) PRINT assays used the indicated R2p mRNA and template RNA with Gf-full followed by R4A22. Mean GFP % is indicated above each bar. WT, wild type.
Fig. 5. Biochemical activity and cryo-EM structure of A-clade R2p related to second-strand nicking.
(A) Denaturing PAGE of target site DNA nicking and TPRT reaction products from assays using wild-type TaGu or its RT-dead and EN-dead variants. Gf-68 RNA with R5 was used as template RNA. Small triangle (mustard) indicates TPRT cDNA. (B) Top: Nucleic acid substrate design to capture a post-TPRT structure, accomplished using PlaMe. Bottom: Cryo-EM density (left) and ribbon diagram (right) of assembled PlaMe second-strand–nicked complex, colored by domains. (C) Comparison of upstream target site DNA position in the PlaMe TPRT initiation complex versus second-strand–nicked complex. The atomic surface of the PlaMe RT core is displayed in white and the 3′UTR RNA in red hue. (D) Nicked ends of upstream target site DNA are illustrated with nearby PlaMe protein regions NTE −1 and ZnF3-2.

References

    1. Han J. S., Non-long terminal repeat (non-LTR) retrotransposons: Mechanisms, recent developments, and unanswered questions. Mob. DNA 1, 15 (2010).
    2. Payer L. M., Burns K. H., Transposable elements in human genetic disease. Nat. Rev. Genet. 20, 760–772 (2019).
    3. Mita P., Boeke J. D., How retrotransposons shape genome regulation. Curr. Opin. Genet. Dev. 37, 90–100 (2016).
    4. International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
    5. Flasch D. A., Macia Á., Sánchez L., Ljungman M., Heras S. R., García-Pérez J. L., Wilson T. E., Moran J. V., Genome-wide de novo L1 retrotransposition connects endonuclease activity with replication. Cell 177, 837–851.e28 (2019).
