Nat Methods. 2024 Nov;21(11):2084-2093.
doi: 10.1038/s41592-024-02418-z. Epub 2024 Sep 23.

Effective genome editing with an enhanced ISDra2 TnpB system and deep learning-predicted ωRNAs


Kim Fabiano Marquart et al. Nat Methods. 2024 Nov.

Abstract

Transposon (IS200/IS605)-encoded TnpB proteins are predecessors of class 2 type V CRISPR effectors and have emerged as one of the most compact genome editors identified thus far. Here, we optimized the design of Deinococcus radiodurans (ISDra2) TnpB for application in mammalian cells (TnpBmax), leading to an average 4.4-fold improvement in editing. In addition, we developed variants mutated at position K76 that recognize alternative target-adjacent motifs (TAMs), expanding the targeting range of ISDra2 TnpB. We further generated an extensive dataset on TnpBmax editing efficiencies at 10,211 target sites. This enabled us to delineate rules for on-target and off-target editing and to devise a deep learning model, termed TnpB editing efficiency predictor (TEEP; https://www.tnpb.app), capable of predicting ISDra2 TnpB guiding RNA (ωRNA) activity with high performance (r > 0.8). Employing TEEP, we achieved editing efficiencies up to 75.3% in the murine liver and 65.9% in the murine brain after adeno-associated virus (AAV) vector delivery of TnpBmax. Overall, the set of tools presented in this study facilitates the application of TnpB as an ultracompact programmable endonuclease in research and therapeutics.
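The abstract reports model performance as a correlation coefficient (r > 0.8), and the Figure 4 caption identifies r as Pearson's correlation coefficient. As a minimal sketch of how such a value could be computed from paired predicted and measured editing efficiencies, the following Python example may help. The function name `pearson_r` and all numeric values are hypothetical illustrations, not data or code from the study.

```python
import math


def pearson_r(predicted, measured):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of products of deviations, and the two sums of squared deviations
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical predicted and measured indel frequencies (%), for illustration only
predicted = [10.0, 20.0, 30.0, 40.0]
measured = [12.0, 18.0, 33.0, 41.0]
r = pearson_r(predicted, measured)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 indicates that predictions and measurements rise together almost linearly; the threshold of 0.8 quoted in the abstract refers to the authors' own evaluation, not to this toy input.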


Conflict of interest statement

Competing interests:

K.F.M. and G.S. are co-inventors on a patent application filed by the University of Zurich relating to the work described in this paper. G.S. is an advisor to Prime Medicine Inc. The remaining authors declare no competing interests.

Figures

Extended Data Figure 1. Benchmarking of TnpB and Fanzor architectures in HEK293T cells.
(A) Schematic representation of experimental workflow and designs. NLS, nuclear localization sequence; BPNLS, bipartite NLS; SRAD, Serine-Arginine-Alanine-Aspartic acid; GS, Glycine-Serine; PuroR, Puromycin resistance; d, days; HTS, high-throughput sequencing; ᵃ, codon-optimization and design from Xiang et al. (11) and Saito et al. (12). (B-D) Benchmarking of different architectures of ISDra2, ISAam1 and ISYmu1 TnpBs. Number of analyzed endogenous targets: ISDra2 TnpB, N = 7; ISAam1 TnpB, N = 7; ISYmu1 TnpB, N = 8. Each dot represents the mean of n = 3 independent biological replicates; the black bar represents the mean of all target sites tested for the respective design. Means were compared by two-tailed t-test. (E) Benchmarking of SpuFz1-v2 Fanzor embedded in various designs tested at one endogenous locus (B2M). Each bar represents the mean ± s.d. of n = 3 independent biological replicates and a two-tailed t-test was used to calculate variance. Indel frequencies are shown in Datafile S1.
Extended Data Figure 2. High-throughput TAM determination assay (HT-TAMDA) of TnpBmax and variants thereof.
The log10 (rate constant) represents the mean of two replicates against two distinct target sequences.
Extended Data Figure 3. Direct intracortical injection of scAAV-TnpB-Dnmt1.
(A) Schematic representation of stereotactic scAAV injection. (B-C) TnpBmax-mediated editing at the Dnmt1 locus determined by deep amplicon sequencing in separated brain regions of mice treated with 5.0×10¹³ vg/kg scAAV. CTX, cortex; BS, brain stem; Hipp, hippocampus; Hypo, hypothalamus; MB, midbrain; OB, olfactory bulb; ST, striatum; TM, thalamus; CTRL, control. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 3 animals.
Extended Data Figure 4. Detailed protocol for ωRNA guide cloning.
Step 1: Digest and purify the ωRNA acceptor plasmid with BbsI. Step 2: Perform ligation or Golden-Gate-Assembly of phosphorylated and annealed oligonucleotides into the digested pωRNA-acceptor.
Figure 1. DNA cleavage and base editing in mammalian cells with enhanced TnpBmax.
(A) Schematic representation of the Deinococcus radiodurans ISDra2 locus, and TnpB engaged with target DNA. The transposon is flanked by the left-end (LE) and right-end (RE) elements and consists of the tnpA and tnpB genes. TnpB can be used as an RNA-guided DNA endonuclease by programming the ωRNA derived from the right end of the transposon to match a sequence 3' of the transposon adjacent motif (TAM). (B) Comparison of TnpB to CRISPR Class 2 RNA-guided endonucleases (RGENs). aa, amino acids; As, Acidibacillus sulfuroxidans; Cj, Campylobacter jejuni; Nme, Neisseria meningitidis; Sp, Streptococcus pyogenes. (C) Genome editing in the human embryonic kidney cell line (HEK293T) over the course of 9 days with three different ωRNAs and TnpB. Indels, insertions and deletions; d, day; dots represent the mean ± s.d. of n ≥ 3 independent biological replicates. (D) Benchmarking of different TnpB architectures (ARC1-13) in HEK293T cells on a target-matched library with N = 94 individual ωRNA-target pairs. Each data point represents the mean of n = 4 independent biological replicates. Means were compared by two-tailed t-test. Bar represents the mean of N = 94 independent target sites. NLS, nuclear localization sequence; BPNLS, bipartite NLS; SRAD, Serine-Arginine-Alanine-Aspartic acid; GS, Glycine-Serine. (E) Benchmarking (fold change) of ISDra2, ISAam1, and ISYmu1 TnpB designs on endogenous loci in HEK293T cells. Indel values (%) were normalized to the ARC-0 design of the respective TnpB. Each data point represents the mean of n = 3 independent biological replicates. Bar represents the mean ± s.d. of N = 7 (ISDra2 and ISAam1) or N = 8 (ISYmu1) target sites. ᵃ, codon-optimization and design from Xiang et al. (11). (F) Comparison of the indel frequencies of different RGENs in HEK293T cells.
TnpBmax (mean = 27.9 %, N = 94); ISAam1max (mean = 9.1 %, N = 98); ISYmu1max (mean = 10.1 %, N = 68); AsCas12fᵃ (CasMINI, mean = 5.6 %, N = 58); Nme2Cas9ᵃ (mean = 16.3 %, N = 82); CjCas9ᵃ (mean = 22.9 %, N = 67); SpCas9ᵃ (mean = 87.9 %, N = 91); ᵃ, data of CRISPR RGENs from Schmidheini et al., 2023 (8). N, number of individual target sites. Each dot represents the mean of n ≥ 3 independent biological replicates. Means were compared by two-tailed t-test. (G) Schematic representation of a nuclease-deficient TnpB(D191A) adenine base editor and adenine base editing at seven individual DNA target sites with TnpB(D191A)-TadA8e (C-ABE) or TadA8e-TnpB(D191A) (N-ABE). Adenine bases (A) in the DNA R-loop are converted to Inosine (I) by TadA8e fused to TnpB. Inosine is repaired to Guanine (G) within the cell. Substrate bases for the base editor are highlighted (bold lines).
Figure 2. Massively parallel target-matched library screen reveals principles for ωRNA guide design.
(A) Schematic representation of the target-matched ωRNA library screen in HEK293T cells. TAM, transposon adjacent motif; MM, mismatch; HDVr, hepatitis delta virus ribozyme; txn, transfection; d, days. (B) Per-position nucleotide representation of target sites performing above (Pattern A) or below (Pattern B) the average. (C) Editing efficiencies in Neuro-2a or HEK293T cells with pattern A or B synthetically integrated and transfected with the respective ωRNA. Bar represents the mean ± s.d. of n ≥ 2 independent biological replicates. (D-G) Position-dependent impact of single (1x) or double (2x) transition or transversion mismatches (MM) on ωRNA activity. Dots represent the mean of n = 3 independent biological replicates of N = 4 individual target sites. Boxes span the 25th to 75th percentiles, whiskers extend down to the minimum and up to the maximum value, and each individual value is plotted. The line in the box is plotted at the median. (H) Normalized TnpB-mediated indels (FC, fold change) in the DNA target with one-nucleotide deletion throughout the target region. n = 3 independent biological replicates. (I) Influence of ωRNA length (15-25 nt) on ωRNA activity relative to a 20 nt ωRNA. N, number of individual target sites; 15-nt (mean = 1.05, N = 9); 16-nt (mean = 0.95, N = 9); 17-nt (mean = 1.05, N = 8); 18-nt (mean = 1.04, N = 9); 19-nt (mean = 1.77, N = 8); 21-nt (mean = 0.91, N = 6); 22-nt (mean = 0.87, N = 7); 23-nt (mean = 1.02, N = 7); 24-nt (mean = 0.76, N = 7); 25-nt (mean = 1.05, N = 7). (J) Schematic overview of the GUIDE-seq workflow for TnpB off-target detection. dsODN, double-stranded oligodeoxynucleotide; DSB, double-strand break. (K) Sequences of off-target sites identified by GUIDE-seq. The top line presents the intended target sequence with cleaved sites below and mismatches to the on-target site highlighted in color. GUIDE-seq read counts are shown on the right.
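Panel K of the caption above describes off-target sites shown with their mismatches to the on-target sequence highlighted. As a rough illustration of that comparison, the Python sketch below lists the positions at which a candidate site differs from the intended target. The function name `mismatch_positions`, the 1-based numbering from the first base as written, and the example sequences are all assumptions made for illustration; they are not taken from the study.

```python
def mismatch_positions(on_target, candidate):
    """Return 1-based positions where candidate differs from on_target.

    Assumption: both sequences are given in the same orientation and have
    the same length; position 1 is the first base as written.
    """
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    pairs = zip(on_target.upper(), candidate.upper())
    return [i + 1 for i, (a, b) in enumerate(pairs) if a != b]


# Hypothetical sequences, for illustration only
intended = "ACGTACGT"
off_target = "ACGAACGA"
print(mismatch_positions(intended, off_target))
```

Counting the returned positions gives the total number of mismatches, which could then be tabulated next to read counts in the way the caption describes.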
Figure 3. Structure-guided rational engineering of TnpB to accept alternative TAMs.
(A) Molecular characterization of the 6-nucleotide TAM of TnpB via the high-throughput TAM detection assay. HTS, high-throughput-sequencing. (B) Cleavage rate (k) for two individual ωRNAs on 46 TAMs each. (C) TnpB activity on mismatched TAMs at seven individual target sites in HEK293T cells normalized to the activity on the 5’-TTGAT TAM. The non-canonical base in the TAM is shown in lowercase and highlighted in red. Each datapoint represents the average of n = 3 independent biological replicates. (D) Structural details of 5’-TTGAT TAM sequence recognition (from PDB 8EXA). Residues K76, Q80, and dG-3 are highlighted. (E) Logo plots of the top 10 TAM motifs derived from HT-TAMDA of TnpBmax and rationally engineered variants thereof. (F-G) Activity of TnpBmax and variants thereof tested on 5’-TTtAT and 5’-TTGAT TAMs in HEK293T cells. Bar represents the mean ± s.d. of n = 2 independent biological replicates. (H) TnpB-WT and TnpB-K76A activity on 11 individual target sites tested on 5’-TTGAT, 5’-TTtAT, 5’-TTcAT, and 5’-TTaAT TAMs. Values represent the mean of n = 2 independent biological replicates. (I) Sequences of off-target sites identified by GUIDE-seq. The top line presents the intended target sequence with cleaved sites below and mismatches to the on-target site highlighted in color. GUIDE-seq read counts are shown on the right.
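The captions state that the ωRNA is programmed to match a sequence 3' of the TAM (Figure 1A), that the canonical TAM is 5’-TTGAT, and that a 20 nt ωRNA serves as the reference length (Figure 2I). Under those stated points, a minimal Python sketch for listing candidate target sites on one strand might look as follows. The function name `find_target_sites`, the forward-strand-only scan, and the example sequence are assumptions for illustration; this is not the authors' pipeline and it does not predict editing efficiency.

```python
def find_target_sites(seq, tam="TTGAT", guide_len=20):
    """Return (tam_start, target) pairs for each TAM found on the given strand.

    The target is the guide_len bases immediately 3' of the TAM. Only the
    strand passed in is scanned; the reverse complement is not considered.
    """
    seq = seq.upper()
    tam = tam.upper()
    sites = []
    start = seq.find(tam)
    while start != -1:
        target_start = start + len(tam)
        target = seq[target_start:target_start + guide_len]
        # Skip TAMs too close to the end to leave a full-length target
        if len(target) == guide_len:
            sites.append((start, target))
        # Step by one base so overlapping TAM occurrences are also found
        start = seq.find(tam, start + 1)
    return sites


# Hypothetical sequence, for illustration only
example = "AAATTGAT" + "ACGTACGTACGTACGTACGT" + "CC"
for tam_start, target in find_target_sites(example):
    print(tam_start, target)
```

Passing `tam="TTTAT"` would list sites next to the alternative 5’-TTtAT motif that the caption reports for the K76A variant; whether a listed site is actually edited efficiently is a separate question that this scan does not address.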
Figure 4. Machine learning accurately predicts ωRNA activity.
(A) Comparison of 12 machine learning algorithms predicting TnpB editing efficiency. feat, Feature; seq, sequence; XGBoost, eXtreme Gradient Boosting; FNN, feedforward neural network; CNN, convolutional neural network; RNN, recurrent neural network. Values represent mean + s.d. of n = 5 runs. (B-C) Schematic representation of the two best-performing algorithms (CNN and RNN), hereafter termed TEEP. (D-E) Performance evaluation of TEEP-CNN and TEEP-RNN on sequences from the model training (test dataset). r, Pearson’s correlation coefficient; N, number of individual target sites. Datapoints represent the mean of n = 3 independent biological replicates. (F-G) Validation of TEEP-CNN and TEEP-RNN predictions on target-matched libraries integrated in HEK293T and Neuro-2a cells. (H-I) Performance evaluation of TEEP-CNN and TEEP-RNN on individual endogenous loci in HEK293T and Neuro-2a cells. (J) Correlation of TEEP-CNN and TEEP-RNN predictions on an external dataset by Nakagawa et al. (6). (K) Performance evaluation of TEEP-CNN and TEEP-RNN on ωRNAs tested with TnpB-K76A and 5’-TTtAT TAMs in HEK293T cells. N, number of individual target sites; r, Pearson’s correlation coefficient. Dots represent the mean of n = 3 independent biological replicates.
Figure 5. Programmable in vivo genome editing with TnpBmax.
(A) Schematic representation of the ISDra2 TnpB 3' end and the overlapping ωRNA. RuvC, nuclease domain; aa, amino acids; nt, nucleotides. (B) Identification of a minimal ωRNA by progressive trimming in HEK293T cells. Bar represents the mean ± s.d. of n = 4 independent biological replicates. (C-D) Comparison of ωRNAs with and without (w/o) hepatitis delta virus ribozyme (HDVr) on eleven individual target sites (TS) in HEK293T cells; n = 3 independent biological replicates. Boxes span the 25th to 75th percentiles, whiskers extend down to the minimum and up to the maximum value, and each individual value is plotted. The line in the box is plotted at the median. (E) TEEP predictions (left) and experimental values (right, Neuro-2a cells) for eight Dnmt1-targeting ωRNAs and four Pcsk9-targeting ωRNAs. The arrow indicates the ωRNAs picked for in vivo validation. Bar represents the mean ± s.d. of n ≥ 3 independent biological replicates (for experimental values). (F) Schematic representation of the single-stranded (ss) AAV9 and self-complementary (sc) AAV9 designs for in vivo use. AAV9, adeno-associated virus serotype 9; EFS, EF-1α short promoter; P3, liver-specific promoter; U6, Pol III-dependent promoter for ωRNA expression; NLS, nuclear localization sequence; HDVr, hepatitis delta virus ribozyme; WPRE, Woodchuck Hepatitis virus posttranscriptional regulatory element. Schematic representation of AAV injection routes in C57BL/6J newborn or adult mice. ICV, intracerebroventricular. (G-I) TnpB-mediated editing at the Dnmt1 locus determined by deep amplicon sequencing in separated brain regions of mice treated with 5.0×10¹² vg/kg (ssAAV) or 5.0×10¹³ vg/kg (ssAAV and scAAV). BS, brain stem; CTX, cortex; Hipp, hippocampus; Hypo, hypothalamus; MB, midbrain; OB, olfactory bulb; ST, striatum; TM, thalamus; CTRL, control. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 3 animals.
(J) Editing in newborn mice treated with either 1.0×10¹³ vg/kg or 5.0×10¹³ vg/kg of ssAAV9-TnpB-Pcsk9. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 5 animals for the 1.0×10¹³ vg/kg dose and n = 3 animals for the 5.0×10¹³ vg/kg dose and the control. (K) Editing efficiencies of TnpB delivered from dose-matched single-stranded and self-complementary AAV9 in adult mice. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 3 animals. (L) Relative Pcsk9 mRNA, PCSK9 protein, and low-density lipoprotein (LDL) levels in adult mice treated with 5.0×10¹³ vg/kg of scAAV9-TnpB-Pcsk9. Values were normalized to untreated control mice. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 3 animals (mRNA and protein levels) or only the mean (LDL) of n = 2 animals. (M) Editing efficiencies of TnpB delivered from self-complementary AAV9 in adult mice in heart, kidney, lung, and genital organs. Each dot represents data from one animal; bar represents the mean ± s.d. of n = 3 animals. (N) Schematic representation of the on- and off-target assessment via CAST-seq. CAST-seq exploits locus-specific decoy primers to improve the sensitivity in detecting off-target mediated translocations and chromosomal aberrations at the on-target site. (O) CAST-seq analysis of genomic DNA isolated from adult mice treated with scAAV9-TnpB-Pcsk9 (5.0×10¹³ vg/kg). Circos plot shows on-target rearrangements in green and off-target mediated translocations (OMT) in red (none present).

References

    1. Jinek M, Chylinski K, Fonfara I, Hauer MH, Doudna JA, Charpentier E. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. 2012;337:816–821. doi: 10.1126/science.1225829.
    2. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, Zhang F. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science. 2013;339:819–823. doi: 10.1126/science.1231143.
    3. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM. RNA-Guided Human Genome Engineering via Cas9. Science. 2013;339:823–826. doi: 10.1126/science.1232033.
    4. Altae-Tran H, Kannan S, Demircioglu FE, Oshiro R, Nety SP, McKay LJ, Dlakić M, Inskeep WP, Makarova KS, Macrae RK, Koonin EV, et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science. 2021;374:57–65. doi: 10.1126/science.abj6856.
    5. Karvelis T, Druteika G, Bigelyte G, Budre K, Zedaveinyte R, Silanskas A, Kazlauskas D, Venclovas Č, Siksnys V. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature. 2021;599:692–696. doi: 10.1038/s41586-021-04058-1.
