Nature. 2024 Feb;626(7997):194-206.
doi: 10.1038/s41586-023-06947-z. Epub 2023 Dec 14.

Structures, functions and adaptations of the human LINE-1 ORF2 protein

Eric T Baldwin et al. Nature. 2024 Feb.

Abstract

The LINE-1 (L1) retrotransposon is an ancient genetic parasite that has written around one-third of the human genome through a 'copy and paste' mechanism catalysed by its multifunctional enzyme, open reading frame 2 protein (ORF2p) [1]. ORF2p reverse transcriptase (RT) and endonuclease activities have been implicated in the pathophysiology of cancer [2,3], autoimmunity [4,5] and ageing [6,7], making ORF2p a potential therapeutic target. However, a lack of structural and mechanistic knowledge has hampered efforts to rationally exploit it. We report structures of the human ORF2p 'core' (residues 238-1061, including the RT domain) by X-ray crystallography and cryo-electron microscopy in several conformational states. Our analyses identified two previously undescribed folded domains, extensive contacts to RNA templates and associated adaptations that contribute to unique aspects of the L1 replication cycle. Computed integrative structural models of full-length ORF2p show a dynamic closed-ring conformation that appears to open during retrotransposition. We characterize ORF2p RT inhibition and reveal its underlying structural basis. Imaging and biochemistry show that non-canonical cytosolic ORF2p RT activity can produce RNA:DNA hybrids, activating innate immune signalling through cGAS/STING and resulting in interferon production [6-8]. In contrast to retroviral RTs, L1 RT is efficiently primed by short RNAs and hairpins, which probably explains cytosolic priming. Other biochemical activities including processivity, DNA-directed polymerization, non-templated base addition and template switching together allow us to propose a revised L1 insertion model. Finally, our evolutionary analysis demonstrates structural conservation between ORF2p and other RNA- and DNA-dependent polymerases. We therefore provide key mechanistic insights into L1 polymerization and insertion, shed light on the evolutionary history of L1 and enable rational drug development targeting L1.

Conflict of interest statement

M.S.T., B.D.G., M.G., K.H.B. and E.A. hold equity in and have received consulting fees from ROME Therapeutics. J.L. holds equity in ROME Therapeutics. D.H. and E.T. have received consulting fees from ROME Therapeutics. Research conducted at Proteros Biostructures and Charles River Laboratory was contracted by ROME Therapeutics. Research for this project in the Götte laboratory was sponsored by ROME Therapeutics. M.S.T. has received consulting fees from Tessera Therapeutics. K.H.B. declares relationships with Alamar Biosciences, Genscript, Oncolinea/PrimeFour Therapeutics, Scaffold Therapeutics, Tessera Therapeutics and Transposon Therapeutics.

Figures

Fig. 1. Pathogenic replication cycle of L1 and the 2.1 Å resolution crystal structure of human ORF2p core in a ternary complex.
a, The 6 kb human L1 element contains an internal 5′ untranslated region (UTR) promoter, two proteins ORF1p and ORF2p in a bicistronic arrangement separated by 63 nt and a short 3′ UTR. b, Replication cycle of L1, a streamlined self-copying DNA parasite. Derepression of genomic L1s results in Pol II transcription and export of the L1 RNA, which is translated to form an RNP complex containing one copy of ORF2p, a multifunctional enzyme, and many copies of ORF1p, a homotrimeric chaperone involved in nuclear entry that can form phase-separated granules. Canonically, in the nucleus, ORF2p integrates a new copy of the L1 RNA into the genome in a mechanism termed TPRT, in which cleavage by the L1 EN liberates a genomic DNA (gDNA) 3′-OH used to prime reverse transcription of the L1 RNA, followed by insertion by poorly understood mechanisms (‘Discussion’, Fig. 6). Non-canonical outcomes contribute to pathology: failed insertions and aberrant EN activity result in DNA damage and translocations, and aberrant cytosolic RT activity generates inflammatory RNA:DNA hybrids. Host proteins (not shown) are associated at every step and may repress L1 or function as essential cofactors. c, Sodium dodecyl sulfate polyacrylamide gel electrophoresis analysis of pure, monodisperse 97 kDa ORF2p core after size exclusion chromatography. d, Two new domains (tower and wrist) and three canonical RT subdomains (fingers, palm, thumb) coordinate with a hybrid duplex RNA template (purple) and DNA primer (cyan) and incoming dTTP nucleotide (yellow) for ORF2p core RT activity in the 2.1 Å resolution crystal structure in a ‘right-hand’ RT fold that is uniquely adapted. All five ORF2p core domains contact the template or primer, and numerous residues contact the incoming base; protein contacts are summarized in the inset schematic.
Fig. 2. Cryo-EM structures of ORF2p core in apo, ssRNA and RNA:DNA hybrid-bound states.
a, ORF2p is unstable in the absence of nucleic acids (Tm = 34.1 °C ± 0.35) but is significantly stabilized by the binding of ssRNA (Tm = 47.5 °C ± 0.32) and RNA:DNA heteroduplex (Tm = 50.2 °C ± 0.1) as determined by differential scanning fluorimetry. b, Density map of the 3.3 Å cryo-EM reconstruction of the ORF2p core in ternary complex with RNA template–DNA primer heteroduplex and dTTP, coloured by proximity to modelled domains with fit atomic model (inset left), which shows clear density for primer, template and dTTP base for addition. Deviation of RNA template (inset right) in the ssRNA cryo-EM structure (purple) from the heteroduplex (grey, backbone RMSD of 3.76 Å). c, Structural schematic of the contacts between the PIP box (inset left) and baseplate (inset right) subdomains of the ORF2p tower with the canonical RT subdomains of palm and fingers. d, Denaturing gel RT assay with ORF2p core (wild type; WT) or tower deletions (∆302–363, ∆302–389) shows similar RT activity with and without the tower and tower lock. Data are representative of three (a) and two (d) independent experiments.
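The stabilization reported in panel a can be restated as a thermal shift (ΔTm) relative to the apo protein. The snippet below is a minimal sketch of that arithmetic using only the three Tm values quoted in the caption; the function name `delta_tm` is illustrative, and the ± values are left out because the caption does not say how they were derived.

```python
# Melting temperatures (°C) quoted in the Fig. 2a caption
tm = {"apo": 34.1, "ssRNA": 47.5, "RNA:DNA hybrid": 50.2}


def delta_tm(tm_bound, tm_apo):
    """Thermal shift on nucleic acid binding, rounded to 0.1 °C."""
    return round(tm_bound - tm_apo, 1)


# Shift of each bound state relative to the apo protein
shifts = {
    name: delta_tm(value, tm["apo"])
    for name, value in tm.items()
    if name != "apo"
}

for name, shift in shifts.items():
    print(f"{name}: +{shift} °C")
```

Under these numbers, ssRNA binding raises the Tm by 13.4 °C and the RNA:DNA heteroduplex by 16.1 °C.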
Fig. 3. L1 biochemical activities, priming and cytoplasmic reverse transcription of L1.
a, Denaturing gel ORF2p RT assay. ORF2p core was an efficient DNA polymerase on all template–primer combinations; RNA priming on an RNA template was reduced but remained significant, with time-dependent full template-length (FTL) reaction products. NTA (+) and template jumping/switching (##) larger products were clearer on longer exposure (Extended Data Figs. 3–5 and Supplementary Figs. 3 and 4). b, ORF2p core (33 nM) single dATP incorporation kinetics with RNA or DNA template and 20 nt DNA primer. c, Extension of very short (5–10 nt) primers, pre-annealed to DNA or RNA templates, by ORF2p and HIV-1 RT; n = 4 (DNA), n = 3 (RNA) independent samples over two experiments. d, ORF2p RT assay showing efficient elongation of an RNA hairpin to FTL; HIV-1 RT showed minimal elongation. e, ORF2p efficiently extended a uridylated Alu-derived RNA hairpin. Ribonucleoside triphosphate incorporation was strongly selected against. f, Immunofluorescence of HeLa cells transfected for 24 h with WT or mutant L1 constructs (ORFeus-Hs) stained for RNA:DNA hybrids with catalytically inactive RNase H1 (dRNH1) and ORF2p (Flag). Cytosolic RNA:DNA hybrids colocalized with ORF2p, depended on RT activity, were ablated by 50 µM d4T and did not depend on EN activity, ruling out a nuclear origin. Hybrids were most prominent in L1 granules but were still present when ORF1p was removed (ORF2 only, monocistronic). g, Top left, ORF1p induction by 1 µM decitabine in THP1 monocytes. Concomitantly, interferon (IFN) production increased (secreted luciferase reporter, top right; lum., luminescence), further augmented by knockout of TREX1, a nuclease that degrades L1 cDNA. Bottom: treatment of these cells with 10 µM cGAS inhibitor G140 or 50 µM d4T RTI reduced baseline and decitabine-induced IFN production; 10 µM POC d4T, a more efficiently triphosphorylated d4T prodrug, reduced IFN further. For IFN, n = 4 biologically independent samples over two experiments. Scale bars, 10 μm. All error bars indicate s.d.
Fig. 4. Inhibition and structure of full-length ORF2p.
a, The ORF2p core was inhibited by NRTIs but not allosteric NNRTI HIV inhibitors in vitro according to homogeneous time-resolved fluorescence assay (n = 3 wells). b, 3TC inhibition in gel-based RT assay of full-length ORF2p WT (FADD) or HIV-like (FMDD). Although both were efficient RTs, 3TC more potently inhibited HIV-like FMDD than WT ORF2p. c, Structural basis for poor L1 inhibition by AZT. Crystal structure of AZT triphosphate bound to HIV-1 RT (PDB 5I42) versus model of AZT triphosphate bound to L1 ORF2p. A clash between the 3′-azido and ORF2p F605 backbone NH is highlighted. Dashed lines indicate salt bridges rigidifying the ORF2p pocket. d, Comparison of the HIV-1 RT NNRTI-binding region with ORF2p. Left, HIV-1 RT in the NNRTI-unbound conformation (PDB 7LRI). Residues involved in NNRTI-resistance are highlighted; space occupied by HIV-1-bound nevirapine is shadowed (PDB 4PUO). Right, equivalent region in L1 ORF2p. The long α-helix corresponds to residues 572–588 in ORF2p. Residues analogous to those in HIV-1 RT are labelled. e, Quantification of single-nucleotide incorporation RT assay showing that purified ORF2p core and full-length ORF2p are similarly active in incorporation of dC or 3TC nucleotides. f,g, Integrative modelling of the full-length ORF2p using Integrative Modeling Platform software, combining data from AlphaFold, molecular dynamics simulations, cryo-EM and cross-linking mass spectrometry generated an ensemble of conformational states. f, Negative stain transmission electron microscopy validation: class averages were postprocessed and matched to projection images of ORF2p models. g, Localization densities represent the structural flexibility of EN, tower, wrist and CTD domains in the ensemble of full-length ORF2p models. Representative full-length ORF2p models from the validated ensemble highlight concerted movements of EN, tower and CTD relative to fingers, palm and thumb, together allowing ORF2p to adopt open and closed states. 
Data in a, b and e are representative of two independent experiments and shown as mean ± s.d.
Fig. 5. Structural evolutionary analysis of ORF2p.
a, Structural Shannon entropy (‘structural entropy’) in ORF2p, measured from 57 L1 sequences from diverse vertebrates and plants and smoothed by averaging a 130-residue (approximately 10% of protein length) sliding window was lowest in the ancestral palm domain and highest in the C-terminal domain. b, Structural entropy correlates strongly with retrotransposition (retroT, ****P < 10−15, two-tailed t-test), comparing with retroT measurements from 417 consecutive scanning trialanine mutants of ORF2p. c, Mapping retroT and structural entropy onto the structure of ORF2p highlighted the overall concordance, as well as a notable discordance in the helix clamp around residue Y823 (inset). d, Structural perplexity, an information-theoretic measurement of the structural distance between two proteins, relative to ORF2p RT of a curated set of 50 proteins calculated using Plexy (Supplementary Methods). e, Normalized structural perplexity between full-length ORF2p and all proteins in the curated set, represented using multidimensional scaling such that the relative pairwise Euclidean distances were preserved (Supplementary Methods). For RT and RT-like proteins, the polypeptide with polymerase activity is used; for other proteins, the entire biological assembly is used. Dashed red lines represent the first and second standard deviations of the two-dimensional distance from full-length ORF2p. 2D, two-dimensional.
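Panel a describes a per-position Shannon entropy smoothed with a sliding window. The sketch below illustrates only that general idea: it assumes each aligned position has been reduced to one discrete structural-state label per homologue (a hypothetical encoding; the paper's actual definition is in its Supplementary Methods), and it uses a centred moving average that is truncated at the sequence ends. The toy alignment and the window of 3 are illustrative; the caption uses 57 sequences and a 130-residue window.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (bits) of the symbols observed at one aligned position."""
    counts = Counter(states)
    total = len(states)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sliding_mean(values, window):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical toy data: one string per position, one character per homologue
columns = ["HHHH", "HHEE", "HECT", "CCCC"]
per_position = [shannon_entropy(col) for col in columns]
smoothed = sliding_mean(per_position, window=3)
```

A fully conserved position scores 0 bits and a position with four equally frequent states scores 2 bits, so low smoothed values mark conserved regions such as the palm domain described in the caption.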
Fig. 6. Revised L1 insertion model.
a, ORF2p bound to target DNA as TPRT begins, drawn schematically with linear target DNA for clarity as in the models below. b, ORF2p in complex during first strand synthesis. It seems more likely that ORF2p bends the target DNA around the highly positively charged ‘back’ face of the polymerase (Extended Data Fig. 9); it can then pass through the PCNA ring clamp, which binds to the PIP box and recruits RNase H2 (ref. ). c, Revised insertion model. Activities supporting steps 4, 5, 7 and 8 are demonstrated here.
1. ORF2p EN cuts target DNA, liberating a gDNA 3′-OH.
2. TPRT: the T-rich gDNA primer is passed into the RT active site, where it base pairs with the poly(A) tail of the bound template, and the 3′-OH is extended.
3. First strand synthesis generates a large (6 kb) cDNA loop; RNase H2, recruited by ORF2p–PCNA, can begin.
4. NTA, in which extra bases are added to the 3′ cDNA end beyond the 5′ end of the RNA template, may occur.
5. Template jumping or switching to the exposed single-stranded gDNA may follow, potentially facilitated by microhomology from NTA nucleotides and the 5′ cap. This would also release 5′ phosphate-bound EN to ‘rock and roll’ to carry out:
6. The second EN (staggered) cut, which liberates the 3′-OH used to prime second strand synthesis; a stagger from the first cut of approximately 12–18 bp results in characteristic target site duplications (TSDs).
7. Strand transfer and priming of second strand synthesis.
8. Second strand synthesis using the 6 kb L1 cDNA as template. RNase H2 activity may also occur here.
9. Ligation and end repair, resulting in a completed approximately 6 kb insertion flanked by TSDs.
The second EN cleavage may sometimes occur in the absence of a template jump. b, © 2023 JHUAAM. Illustration: Jennifer E. Fairman.
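The link between the staggered second cut and the target site duplication can be made concrete with a toy string model. This is a deliberately simplified sketch, not the paper's model: it treats the target as a single top-strand string, ignores strand polarity, the poly(A) tail, 5′ truncation and end repair, and all names (`insert_with_tsd`, `first_cut`, `stagger`) are hypothetical. It shows only that the bases lying between two staggered nicks end up on both sides of the insertion.

```python
def insert_with_tsd(target, first_cut, stagger, insert):
    """Return (product, tsd) for an insertion between two staggered nicks.

    target    -- top-strand target sequence (5'->3')
    first_cut -- index of the first nick, as a position in `target`
    stagger   -- distance in bases between the two nicks
    insert    -- sequence of the new element copy
    """
    if stagger < 0 or first_cut < 0 or first_cut + stagger > len(target):
        raise ValueError("cut positions fall outside the target")
    # Bases between the two nicks are copied on both sides by fill-in repair
    tsd = target[first_cut:first_cut + stagger]
    product = target[:first_cut + stagger] + insert + target[first_cut:]
    return product, tsd


# Hypothetical toy target and insert (lower case marks the inserted element)
product, tsd = insert_with_tsd("AAACCCGGGTTT", first_cut=3, stagger=3, insert="xxx")
print(product)  # AAACCCxxxCCCGGGTTT
print(tsd)      # CCC
```

With the approximately 12–18 bp stagger quoted in the caption, the same logic yields a 12–18 bp duplication flanking the element.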
Extended Data Fig. 1. Purification and crystal structure of ORF2p core.
a, Size exclusion chromatography (SEC) of recombinant ORF2p core (left, Superdex 200 increase 10/300 GL column, Cytiva) shows a homogeneous and Gaussian peak corresponding to the expected retention time of a ~100 kDa monomer. SDS-PAGE analysis of peak fractions (right) shows the ORF2p core peak is >99% pure with contaminants and uncleaved MBP-ORF2p core removed in the void volume; a trace amount of uncleaved MBP-ORF2p remains in the preparation. b, In an ELISA-based reverse transcriptase assay (Roche), ORF2p core shows increased activity after SEC relative to heparin chromatography alone against an oligo(A) template. c, Comparison of ORF2p core crystal structure with AlphaFold model used for molecular replacement shows remarkable similarity, with a final root-mean-square deviation (RMSD) of 0.946 Å from the search model. ORF2p core comprises 46 secondary structural elements divided between 10 beta strands and 36 helices and is resolved from residues 251–1061 with gaps from 304–388, 799–803, 851–871, 905–912, and 923–927. d, 2Fo-Fc electron density map of the ORF2p core crystal with built model at a threshold of 2σ shows clear side chain density for important residues near the active site. The highlighted “gatekeeper” residue F605 sterically selects against ribonucleotides by clashing with the 2’-OH, providing a rationale for ORF2p’s low RNA synthesis activity. e, Detailed view of key contacts between the primer and template and residues of the fingers (K541, K545, Q552), palm (F566, I567, P568, G569, M570, Q571, G660, P665) and wrist (Y878, K1047, G1048, I1050, S1051).
Extended Data Fig. 2. Comparison of cryo-EM maps and models.
a, Final cryo-EM maps of ORF2p apo (left), bound to ssRNA (middle) or template:primer hybrid (right) colored by corresponding ORF2p region. There is an expected clear lack of density in the active site for the apo ORF2p map and in the primer-binding region for the ssRNA map. Consistent with apo ORF2p being unstable in vitro, it represents the lowest resolution reconstruction, and no corresponding atomic model was built, but rigid body fitting of the hybrid-bound atomic model fills the density. b, Coloring of the refined ORF2p structural model by RMSD from the ORF2p crystal structure reveals little difference in the thumb-fingers-palm-thumb subdomains (RMSD = 1.01 Å) but significant deviation of the wrist (RMSD = 4.01 Å). Superposition of the crystal and cryo-EM derived structures (inset) shows that a rotational motion of 4 Å and an upwards translation of 7.5 Å occur in the distal wrist; the palm-adjacent wrist helices are completely superposed, and both structures maintain the same template contacts. c, Comparison of the structure of apo HIV-1 RT (PDB: 1dlo) and rigid body fit apo ORF2p core. In apo HIV-1 RT, the enzyme is in an inactive conformation with the thumb occupying the active site (“thumb down”); in contrast, apo ORF2p closely resembles the active form of the enzyme with the “thumb up”. This “thumb up” form would not require a conformational change for accepting incoming template, unlike in HIV-1 RT. d, Orphan density from the ORF2p core-ssRNA map low-pass filtered to 4.5 Å. This density is consistent with the predicted location of the tower lock from AlphaFold (inset, top) and molecular dynamics simulations of the full tower domain cluster the tower lock near this density (bottom). This location is also consistent with the position of the R2Bm tower lock portion of the tower-like domain, which binds to the 3’UTR RNA (PDB 8gh6).
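The per-domain RMSD values in panel b follow the usual root-mean-square deviation over matched atoms. The sketch below assumes the two coordinate sets are already superposed and matched atom-for-atom (it performs no alignment step), and the coordinates are made-up numbers rather than atoms from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two matched, pre-superposed
    lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in Å: every atom is displaced by 1 Å along z
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))  # 1.0
```

Restricting the atom lists to one domain at a time gives domain-level values of the kind quoted for the wrist and the canonical RT subdomains.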
Extended Data Fig. 3. Design and characterization of the ORF2p tower domain deletions reveal it is not required for RT.
To test the role of the tower and tower lock in reverse transcription, ORF2p core constructs where the tower was deleted at two different points were designed. a, The maximal tower deletion construct (∆302–389) represents removal of the unresolved tower and the tower lock residues from both the EM and crystal structures as evidenced by mapping the deletion back onto the EM structure of ORF2. The shorter tower deletion ∆302–363 deletes the tower but preserves the lock. b, SDS-PAGE analysis of monodisperse ORF2p core tower deletion constructs shows relatively pure enzyme (>90%). c, Comparison of wildtype ORF2p core and full length versus ∆302–363 and ∆302–389 (cropped in Fig. 2d) shows similar RT activity between all constructs with little difference in efficiency of formation of 41 nt full length products over time; full length and ∆302–389 are slightly less specifically active, which may be due to batch effects, contaminants, or concentration estimation errors; 17 nM of purified ORF2p core constructs was reacted with 0.1 μM dNTP mixture and samples taken over time. Asterisk (*) 32P-labeled 5’-end of the primer. d, Comparison of wildtype and tower deletion ORF2p core constructs under longer reaction conditions with higher concentrations of enzyme and nucleotide shows all constructs form full length, NTA (+ and above), and template jumping/switching products (##), although the yield of these larger products correlates with specific activity; it appears that here and in panel (c), deletion of both the tower and lock (∆302–389) may selectively negatively impact template jumping/switching activity, although this may be attributable to either lock or the way in which it was deleted, and further investigation is warranted. These reactions are 1 h with 3-fold more enzyme (50 nM) and 10-fold more dNTPs (1 μM). Scanned gel images are cropped and corrected for distortion artifacts with contrast uniformly increased to facilitate the visualization of minor products.
Extended Data Fig. 4. Priming requirements and mismatch tolerance of ORF2p core.
a, Comparison of 10 nt vs 20 nt DNA primers reveals little difference in efficiency of formation of products, including larger template jumping/switching products (##). b, ORF2p performs DNA synthesis with 5–10 nt DNA primers, although 5 nt and, to a lesser extent 6 nt, are slightly less efficiently used. As seen consistently above, RNA templates are slightly less efficient than DNA. Higher concentrations of ORF2p core result in higher activity in all conditions and more template jumping/switching products. Scanned gel images are cropped and corrected for distortion artifacts with contrast uniformly increased to facilitate the visualization of minor products. (* indicates Cy5 label, all panels). c, RNA synthesis is strongly selected against, as indicated by nucleotide (dNTP or NTP) incorporation activity of LINE-1 RT on DNA or RNA using a DNA primer. Denaturing PAGE migration pattern of the reaction products generated after 5 min of dNTP or NTP incorporation along DNA and RNA templates using 20-nt primers. d, Priming activity of ORF2p and HIV-1 with one or two terminal mismatches; two enzyme preps of HIV-1 RT are compared to ORF2p, and additional unextended substrates are shown. L1 tolerates all terminal mismatches against an A template to some extent, as well as some penultimate mismatches; A:G is inefficient. In contrast, HIV-1 is less tolerant of A:A and A:G terminal and U:A and U:G penultimate mismatches; n = 1 (LINE-1) and n = 2 (HIV-1) points quantified from 2 independent experiments.
Extended Data Fig. 5. Comparative enzymology of ORF2p RT with HIV-1 and HERV-K.
a, Single nucleotide incorporation kinetic curves and parameters of dATP with 36-nt RNA or DNA template and 20-nt DNA primer with ORF2p core (33 nM), HIV-1 RT (4 nM) and HERV-K RT (12 nM). For each enzyme, Michaelis-Menten parameter kcat/KM is nearly identical on both templates (n = 3 (DNA template) and n = 4 (RNA template) independent samples over 2 independent experiments; data represented as mean ± SD). b, Comparison of HIV-1 RT and ORF2p in extension of very short (5–10 nt) primers, pre-annealed to DNA and RNA templates. ORF2p extends all DNA and RNA primer lengths, with somewhat reduced efficiency at 5 nt and, to a lesser extent, 6 nt. In contrast, HIV-1 does not extend the same DNA:DNA template:primer mixes of these lengths and does not extend 5 nt and has reduced activity with 6 nt DNA primers on RNA templates. ORF2p also makes NTA (+) and template jumping/switching (##) larger products; more visible on longer exposure. Notably, neither of these larger products are detectable with HIV-1 RT; quantification represents n = 3 (DNA) and n = 4 (RNA) samples from two independent experiments. c, Heparin trap processivity assay for ORF2p vs HIV-1 RT; heparin sulfate is a negatively charged sugar polymer that competes for nucleic acid binding sites. The indicated RNA or DNA primers and templates were pre-annealed and reactions were prepared and preincubated as indicated, then initiated with Mg2+ as a control, with heparin and Mg2+ together, or with a two-step “Trap control” procedure in which heparin and Mg2+ are added sequentially. Reactions are quenched after 5 seconds. At this very short time point, ORF2p produces full template length product (FTL, 3–9% of total signal in the lane in all conditions) and is unaffected by the heparin trap; in contrast, HIV-1 RT produces 0–3% FTL product without trap and no detectable FTL product with trap. 
When all products are quantified, HIV-1 extends 21–37% of primers, and this is roughly halved by the heparin trap; ORF2p extends ~10–18% of primers and is unaffected by the trap. In the trap control (TC) RNA template:DNA primer lanes, HIV-1 performs a small amount of residual RT, consistent with a distributive pattern of synthesis, whereas ORF2p is inhibited, bound to the heparin trap. These are all consistent with high processivity for ORF2p and low-processivity distributive pattern synthesis for HIV-1 RT. Asterisk (*) Cy5-5’-label on primer. n = 1 quantified sample shown, representative of two independent experiments.
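The percentages in panel c (fraction of primers extended, and fraction of lane signal at full template length) are ratios of band intensities within a lane. The following is a minimal sketch of that bookkeeping under stated assumptions: band intensities are already background-corrected, each band is keyed by its product length in nucleotides, and the intensities, the 20 nt primer and the 36 nt template are hypothetical example values, not data from the paper.

```python
def lane_fractions(band_signal, primer_len, template_len):
    """Return (fraction of primers extended, fraction at full template length).

    band_signal maps product length (nt) to background-corrected intensity.
    """
    total = sum(band_signal.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    # Any band longer than the primer counts as an extension product
    extended = sum(v for n, v in band_signal.items() if n > primer_len)
    # Only the band at exactly the template length counts as FTL
    ftl = sum(v for n, v in band_signal.items() if n == template_len)
    return extended / total, ftl / total


# Hypothetical lane (arbitrary units): unextended primer plus three products
lane = {20: 850.0, 21: 40.0, 25: 50.0, 36: 60.0}
extended_frac, ftl_frac = lane_fractions(lane, primer_len=20, template_len=36)
print(f"extended: {extended_frac:.0%}, FTL: {ftl_frac:.0%}")
```

Comparing these two fractions with and without the heparin trap is what distinguishes a processive enzyme (FTL unaffected by the trap) from a distributive one in the reasoning given in the caption.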
Extended Data Fig. 6. Cytoplasmic RT activity of ORF2p and activation of interferon.
a–c, Indirect immunofluorescence of cells transiently transfected with plasmids expressing the indicated L1 constructs and stained for RNA-DNA hybrids and ORF2p or ORF1p using two different hybrid detection reagents demonstrates cytosolic synthesis. Constructs all include C-terminal 3C-3xFlag tag on ORF2p and are synthetic ORFeus-Hs sequence except where L1RP is indicated (L1 retinitis pigmentosa locus, AF148856, pLD564). Cells were fixed in methanol and stained 24 h post transfection with the indicated constructs. Images are representative from 4 independent experiments. d4T RTI treatment is 50 µM, added at the time of transfection. RT- is L1 with D702Y mutant ORF2, EN- is L1 with double E43S + D145N mutant ORF2. a, ORF1p co-stain in HeLa cells with dRNH1 (catalytically inactive human RNase H1 fused to GFP). b, ORF2p (Flag) with dRNH1 co-stain in HeLa and U2-OS cells. c, ORF2p (Flag) with S9.6 co-stain in HeLa and U2-OS cells. d, Inhibition of interferon signaling in THP1 cells with cGAS inhibitor G140, with and without decitabine treatment; raw luciferase data are shown, n = 4 biologically independent samples from two independent experiments; all points shown. IC50s for G140 are 0.23-0.30 µM. e, Relative interferon production from titrations of d4T vs POC d4T prodrug [d4T bis(isopropoxycarbonyloxymethyl)phosphate] in TREX1 knockout THP1 cells treated for 5 days with 1 µM decitabine plus the indicated concentration of drug; normalized luciferase data from n = 4 biologically independent samples representative of two independent experiments; error bars are mean ± SD.
Extended Data Fig. 7. Inhibition of ORF2p core by NRTI and NNRTI reverse transcriptase inhibitors.
a, NRTIs are inhibitors of the RT activity of ORF2p core. Denaturing PAGE migration pattern (left) of RT reactions inhibited by NRTIs and their quantification (right) indicate NRTIs are potent (low µM IC50) inhibitors of ORF2p core. ORF2p core was preincubated with a template:primer containing a single site for the incorporation of a given nucleotide analogue. The primer/template sequence shown in panel a illustrates the case of single incorporation of 3TC, with a single G for incorporation; the incoming template sequence for entecavir and carbovir has a single C, and for d4T a single A, each at position labeled “N”. Reactions were incubated for one minute at 37 °C with a 100 nM dNTP mixture and increasing concentrations of listed inhibitors. IC50 index, fold = IC50 (drug, µM) ÷ [dNTP] (natural counterpart, µM, here 0.1 µM) and reflects the fold-excess of a required NRTI over its natural counterpart to give a 50% inhibition in DNA synthesis. b, Schematic of homogeneous time-resolved FRET RT (HTRF) assay. Fluorescein-labeled dNTPs (here, uracil-TP) are incorporated by ORF2p into a biotinylated primer, here shown against a poly(A) template. Detection is then achieved using FRET with a terbium cryptate labeled streptavidin, and the time-resolved technique and time-delayed emission from terbium cryptate reduces background from other fluorescent chemicals in the mix. In the presence of ORF2p RT inhibitors (RTIs), base incorporation is stopped and FRET signal is lost. For NNRTIs the indicated poly(A)-oligo(dT) template:primer is used; for NRTIs, a template:primer pair of RNA36:biotin-DNA25 is used. c, Quantification of HTRF screen shows HIV NNRTIs do not inhibit ORF2p, even at concentrations up to 1 mM for nevirapine. Upon binding of NNRTIs, such as nevirapine, the primer grip and the 94–102 segment shift which, together with movement of Y181 and Y188, open the NNRTI pocket. 
Accordingly, mutations of the 94–102 segment, Y181 and Y188 have all been implicated in resistance to multiple NNRTIs; n = 3 independent wells representative of two independent experiments. d, Inhibition assay in HeLa cells stably expressing a dual luciferase L1 retrotransposition reporter, normalized to cell viability using CellTiter-Glo reagent; n = 3 biologically independent wells representative of two experiments. e, 3TC analog in the context of the native FADD active site loop (left) and 3TC analog in the context of the mutant FMDD active site loop (right). Note the lack of van der Waals contacts between the Ala 701 side chain and the oxathiolane ring, including the sulfur atom, in the nucleotide in the native active site, contrasted with the favorable contact between the Met 701 and the oxathiolane ring in the nucleotide. Similar effects were shown previously in the YADD mutant of HIV (WT YMDD). f, Full length ORF2p and ORF2p core are compared in single nucleotide incorporation and inhibition experiments with the indicated nucleoside triphosphates and 3TC triphosphate; ‘dNTPx4’ is a mix of all four standard dNTPs. Full length ORF2p (purity insufficient to accurately determine concentration) produces similar reaction products and shows similar activity and inhibition to both partially-purified (Heparin) and fully-purified (after SEC) ORF2p core. This assay qualitatively reveals both incorporation and tolerance for some misincorporations of the polymerase. For example, in the ‘dC’ lane, containing only dCTP, the 26 nt band represents one C-G incorporation, and the 28 nt band is from subsequent C-A misincorporation followed by a C-G incorporation. Adding 3TC chain-terminates products and the strong bands at 26, 28, and 35 nt highlight the G-base positions in the template.
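The IC50 index defined in panel a is a simple ratio, and a worked example makes the units explicit. The sketch below implements only the formula stated in the caption, with the 0.1 µM dNTP concentration from the same caption as the default; the 2.0 µM IC50 passed in is a hypothetical value chosen for illustration, not a measurement from the paper.

```python
def ic50_index(ic50_um, dntp_um=0.1):
    """Fold-excess of an NRTI over its natural dNTP counterpart that gives
    50% inhibition: IC50 (µM) divided by [dNTP] (µM)."""
    if ic50_um < 0 or dntp_um <= 0:
        raise ValueError("IC50 must be >= 0 and [dNTP] must be > 0")
    return ic50_um / dntp_um


# Hypothetical inhibitor with an IC50 of 2.0 µM, assayed at 0.1 µM dNTP
fold = ic50_index(2.0)
print(f"IC50 index: {fold:.0f}-fold")
```

Because the index is normalized to the competing dNTP concentration, values are comparable across inhibitors only when the assays share the same dNTP concentration, as they do here.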
Extended Data Fig. 8. Comparison of ORF2p with other RTs.
a, Domain organization and sequence alignment of LINE-1 ORF2 (L1RP locus, GenBank AF148856) with other reverse transcriptase (RT) containing proteins: Bombyx mori R2Bm RT (PDB 8gh6, GenBank AAB59214), group IIC intron (PDB 6ar1, Uniprot E2GM63), non-LTR element HERV-K (PDB 7sr6, clone 10.9, GenBank AF080231), and retrovirus HIV-1 RT (PDB 4pqu, UniProt: P03366.3). Sequences were aligned structurally using ChimeraX software and via the conserved RT sequence blocks (0–7), with degree of sequence conservation and common structural features noted below. b, Comparison of the N-terminal extension and 5’ template contacts between (from left to right) LINE-1 ORF2p, HERV-K RT (PDB 7sr6), group IIC intron (PDB 6ar1) and R2Bm (PDB 8gh6). The ORF2p PIP box helix occupies the space of the HIV αA helix and a tower-like helix in R2Bm that is not a PIP box. The template makes extensive contacts with ORF2p and takes a distinct 5’ path upstream of the active site than in the other RTs, guided by adaptations in fingers (L535), palm (I642), and tower (Q338). c, Comparison of downstream primer-binding surfaces across the four RTs; primer contacts with thumb helix clamps (lime green) shown inset. The thumb in ORF2, R2Bm, and GSI-IIC is permuted relative to HERV-K (and HIV), with the primer-contacting helix clamp on helix #2, whereas it is on helix #1 in HERV-K. ORF2p wrist also contacts the template, the R2 linker makes a smaller set of contacts, and these bases are exposed in HERV-K and GSI-IIC. d, Models of ORF2p Core, HERV-K RT, and GSI-IIC RT aligned by palm superposition. The RT domains of GSI-IIC and ORF2p Core are more similar to each other than to HERV-K. The HERV-K linker and RNase H domains occupy a similar position to the ORF2p wrist, and GSI-IIC D domain is in a similar position to ORF2p CTD and R2 CCHC (see Supplementary Fig. 11).
Extended Data Fig. 9. ORF2p and R2Bm structures show opposing topologies of target DNA relative to the active site.
a, Comparison of ORF2p and R2Bm structures, oriented identically following palm superposition; the closed state (Class 15) ORF2p model is shown. In both structures, the active site is in back center (incoming dTTP is visible) and generated product would be ratcheted out of the enzyme by sequential base additions, pulling template RNA through as the product emerges towards the viewer out of the plane of the printed page. In R2Bm, resolved initiating TPRT, the C-terminal restriction-like endonuclease (RLE) holds the 5’ phosphate from the upstream target DNA in the ‘top right’, as viewed here from the product face, and the adjacent CCHC zinc knuckle melts the target strand from second strand, allowing the upstream target DNA to wrap around the positively charged ‘back’ face. In contrast, ORF2p has an N-terminal APE-like endonuclease (EN, from PDB 7n8s; primer strand with 3’-OH is transparent), located on the opposite wall of the polymerase groove relative to the position of RLE in R2. However, the CTD CCHC remains in the ‘top right’, positioning the CCHC zinc knuckle nearly identically in both enzymes. To summarize, R2 has both RLE and CCHC together, on the ‘top right’ of the active site, whereas in ORF2p, EN and CTD are on opposite sides of the active site, with EN on the ‘top left’ and CCHC on the ‘top right’, and in this configuration, the target DNA would traverse across the two domains. Indeed, because the primer (bound to downstream DNA) must similarly be passed into the active site, the apparent result of this is that the target DNA is reversed in ORF2 with respect to R2: the downstream DNA would most likely similarly bind the CTD CCHC zinc knuckle and wrap around the highly positively charged ‘back’ face of the enzyme, similar to the behavior of the upstream DNA in R2. A cartoon of this is drawn in Fig. 6b. However, other orientations are possible, and these models were resolved without EN-bound DNA. 
b, Calculated Coulombic potential mapped onto the model surfaces (ChimeraX) shows extensive positively charged surfaces on both R2Bm and ORF2p. In R2Bm, resolved starting TPRT, target DNA and the structured RNA bind to most of the positively charged (blue) surface, which includes specific domains that recognize the unique sequences and structures of the target ribosomal DNA and 3’ untranslated region (UTR) of the R2 RNA. On ORF2p, the ‘back’ face of the enzyme is extensively positively charged, and these surfaces are highly likely to be involved in binding both target DNA and template RNA. These charged residues are largely required for retrotransposition and may coordinate a similar path of the target DNA in ORF2p as in R2. The DNA clamp ring PCNA binds to the ORF2p PIP box during integration on the ‘back’ face (gray helix, arrows), and it appears that PCNA could be loaded on the target DNA if it were to wrap this positively charged ‘back’ surface (Fig. 6b). R2 does not have a PIP box and PCNA has no known role in R2 mobile element insertion.

References

    1. Kazazian HH, Jr, Moran JV. Mobile DNA in health and disease. N. Engl. J. Med. 2017;377:361–370. doi: 10.1056/NEJMra1510092.
    2. Rodriguez-Martin B, et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 2020;52:306–319. doi: 10.1038/s41588-019-0562-0.
    3. Taylor MS, et al. Ultrasensitive detection of circulating LINE-1 ORF1p as a specific multi-cancer biomarker. Cancer Discov. 2023;13:2532–2547.
    4. Rice GI, et al. Reverse-transcriptase inhibitors in the Aicardi-Goutieres syndrome. N. Engl. J. Med. 2018;379:2275–2277. doi: 10.1056/NEJMc1810983.
    5. Carter V, et al. High prevalence and disease correlation of autoantibodies against p40 encoded by long interspersed nuclear elements in systemic lupus erythematosus. Arthritis Rheumatol. 2020;72:89–99. doi: 10.1002/art.41054.