Nat Biotechnol. 2023 Apr;41(4):488-499.
doi: 10.1038/s41587-022-01494-w. Epub 2022 Oct 10.

Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome

Matthew G Durrant et al. Nat Biotechnol. 2023 Apr.

Abstract

Large serine recombinases (LSRs) are DNA integrases that facilitate the site-specific integration of mobile genetic elements into bacterial genomes. Only a few LSRs, such as Bxb1 and PhiC31, have been characterized to date, with limited efficiency as tools for DNA integration in human cells. In this study, we developed a computational approach to identify thousands of LSRs and their DNA attachment sites, expanding known LSR diversity by >100-fold and enabling the prediction of their insertion site specificities. We tested their recombination activity in human cells, classifying them as landing pad, genome-targeting or multi-targeting LSRs. Overall, we achieved up to seven-fold higher recombination than Bxb1 and genome integration efficiencies of 40-75% with cargo sizes over 7 kb. We also demonstrate virus-free, direct integration of plasmid or amplicon libraries for improved functional genomics applications. This systematic discovery of recombinases directly from microbial sequencing data provides a resource of over 60 LSRs experimentally characterized in human cells for large-payload genome insertion without exposed DNA double-stranded breaks.

Conflict of interest statement

M.G.D., J.T., A.F., M.H., M.C.B., L.B., A.S.B. and P.D.H. are inventors on intellectual property related to this work. P.D.H. is a cofounder of Spotlight Therapeutics and Moment Biosciences and serves on the boards of directors and scientific advisory boards and is a scientific advisory board member to Vial Health, Serotiny and Varda Space. A.S.B. serves on the scientific advisory boards of ArcBio and Caribou Biosciences. M.G.D., A.F., J.T., L.B., M.C.B., A.S.B. and P.D.H. acknowledge outside interest in Stylus Medicine. The remaining authors declare no competing interests.

Figures

Fig. 1. Systematic discovery and classification of LSRs and their target site specificities.
a, Schematic of the computational workflow for systematic identification of LSRs and inference of their attachment sites. The gene harboring the recombinase domain is shown as a red rectangle. b, Phylogenetic tree of representative LSR orthologs clustered at 50% identity, annotated according to the predicted target specificity of each LSR cluster. ‘Unique target gene clusters’ indicates the number of predicted target gene clusters; dots are scaled to indicate the number of unique LSR sequences found in each LSR cluster. c, Schematic of the technique to identify site-specific LSRs that target a single gene cluster. The typical domain architecture of a site-specific LSR is illustrated. d, Schematic of the technique to identify multi-targeting LSRs. In brief, if a single cluster of related LSRs (clustered at 90% identity) integrates into multiple diverse target gene clusters (clustered at 50% identity), the LSR cluster is considered multi-targeting. The typical domain architecture of a multi-targeting LSR is illustrated, commonly including a particular domain of unknown function, DUF4368. e, Example of an observed network of predicted site-specific LSRs found in our database. Each node indicates either an LSR cluster (red) or a target gene cluster (blue). Edges between nodes indicate that at least one member of the LSR cluster was found integrated into at least one member of the target gene cluster. f, Example of a hierarchical tree of diverse LSR sequences that target a set of closely related attB sequences. Numbers at the tips of the tree indicate the attB sequences in the alignment that are targeted by each LSR. The alignment of related attB sequences is shown at the bottom. g, Example of an observed network of predicted multi-targeting LSRs. h, Schematic of an alignment of diverse attB sequences that are targeted by a single multi-targeting LSR. Each target sequence is aligned with respect to the core TT dinucleotide. The sequence logo above the alignment indicates conservation across target sites, a proxy for the sequence specificity of this particular LSR. The alignment is colored according to the consensus.
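The cluster-level rule in panels c and d, and the consensus coloring in panel h, can be illustrated with a short sketch. It assumes LSR and target gene sequences have already been clustered (at 90% and 50% identity, respectively) and that target sites are already aligned on the core TT dinucleotide; the function names, cluster IDs and toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical input: (lsr_cluster_id, target_gene_cluster_id) pairs, one per
# observed integration. Clustering (90% LSR / 50% target identity) is assumed done.
def classify_lsr_clusters(edges):
    """Label each LSR cluster 'site-specific' (one target gene cluster) or
    'multi-targeting' (two or more target gene clusters)."""
    targets = defaultdict(set)
    for lsr, target in edges:
        targets[lsr].add(target)
    return {
        lsr: "site-specific" if len(tgts) == 1 else "multi-targeting"
        for lsr, tgts in targets.items()
    }

def consensus_profile(aligned_sites):
    """Per-column consensus base and its frequency for equal-length aligned
    sites; a simple stand-in for the sequence logo in panel h."""
    length = len(aligned_sites[0])
    if any(len(s) != length for s in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    profile = []
    for column in zip(*aligned_sites):
        base, count = Counter(column).most_common(1)[0]
        profile.append((base, count / len(column)))
    return profile

edges = [("LSR_A", "gene_1"), ("LSR_A", "gene_1"),
         ("LSR_B", "gene_2"), ("LSR_B", "gene_3"), ("LSR_B", "gene_4")]
print(classify_lsr_clusters(edges))
# {'LSR_A': 'site-specific', 'LSR_B': 'multi-targeting'}

sites = ["GATTCA", "GCTTCA", "AATTGA"]  # toy sites, TT core at positions 2-3
print("".join(b for b, _ in consensus_profile(sites)))  # GATTCA
```

In this toy input, LSR_A inserts only into gene_1 and is labeled site-specific, whereas LSR_B reaches three distinct target gene clusters and is labeled multi-targeting.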
Fig. 2. Development of efficient recombinases for landing pads.
a, Schematic of plasmid recombination assay. Cells are co-transfected with three plasmids, and, upon recombination, mCherry gains a promoter and is expressed. b, Plasmid recombination assay of predicted LSRs and att sites in HEK293FT cells, shown as corrected mCherry MFI. Error = s.d. (n = 3). P value was determined by one-tailed t-test. c, Example mCherry distributions for all three plasmids (LSR + attB + attP) compared to the attP-only negative control. d, Plasmid recombination assay between pairs of LSR + attP and attB in K562 cells (n = 1). e, Schematic of genomic landing pad assay. An EF-1α promoter, attB and LSR are integrated via lentivirus. Upon attP donor transfection and successful integration into the landing pad, mCherry is expressed, and the LSR and GFP are displaced and knocked out. f, Donor integration into polyclonal genomic landing pad (LP) K562 cell lines, measured after 5 days (n = 2 independently transduced and then electroporated replicates). g, Donor integration into clonal LP cells. Asterisks show significance for comparison with Bxb1 (P = 0.0012, one-way ANOVA, n = 3 clonal cell lines for Pa01 and n = 4 clonal cell lines for others at 1,000-ng dose, error = s.e.m.). h, Pa01 clonal LP line electroporated twice in rapid succession. i, Plasmid recombination assay for a new batch of LSRs selected for higher quality (Methods) in HEK293FT cells, shown as corrected mCherry MFI. Error = s.d. (n = 3 transfections). Controls are labeled in bold, and the previous batch is in italics. The dotted line indicates the positive control Bxb1. P value was determined by one-tailed t-test. j, Representative mCherry distributions for all three plasmids (LSR + attB + attP) compared to the attP-only negative control.
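Panels b and i report corrected mCherry MFI as mean ± s.d. over three transfections and compare candidates against the Bxb1 positive control. The sketch below shows that arithmetic under a loudly stated assumption: "corrected" is modeled as subtraction of the mean attP-only background, which may differ from the correction defined in the paper's Methods. All readings and function names are hypothetical.

```python
from statistics import mean, stdev

# ASSUMPTION: "corrected" MFI is modeled as raw MFI minus the mean of the
# attP-only negative control; the paper's actual correction is in its Methods.
def corrected_mfi(raw, background):
    bg = mean(background)
    return [x - bg for x in raw]

def summarize(values):
    """Mean and sample standard deviation (the caption's error = s.d.)."""
    return mean(values), stdev(values)

def fold_over_reference(candidate, reference):
    """Ratio of mean corrected MFI for a candidate LSR to a reference such as Bxb1."""
    return mean(candidate) / mean(reference)

# Hypothetical triplicate readings (n = 3 transfections).
attp_only = [100.0, 110.0, 90.0]
bxb1 = corrected_mfi([1100.0, 1200.0, 1000.0], attp_only)
candidate = corrected_mfi([5100.0, 5300.0, 4900.0], attp_only)

m, sd = summarize(candidate)
print(m, sd)                                    # 5000.0 200.0
print(fold_over_reference(candidate, bxb1))     # 5.0
```

A fold change computed this way depends on the background model, so the same raw data could give a different ratio under the paper's own correction.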
Fig. 3. PRAs and amplicon library installation with landing pad recombinases.
a, Schematic of mini PRA. A pool of reporter plasmids with varied synthetic enhancers containing TetO sites is integrated into the landing pad by the LSR, selected for using puromycin, and then reporter activation is induced using doxycycline, which causes rTetR-VP48 to bind TetO. Highly activated and lowly activated cells are magnetically separated; the enhancers are sequenced from gDNA in each cell population; and a ratio of reads is computed as a measurement of enhancer strength. b, Individual enhancer reporters with a varied number of TetO transcription factor binding sites were integrated into the AAVS1 safe harbor by HDR or into the landing pad using the Kp03 LSR. Flow cytometry measurements were taken 2 days after induction with doxycycline. Due to varied voltage settings on the cytometer, the x-axes are not comparable in absolute terms (n = 1 cell line replicate, and a second replicate is shown in Supplementary Fig. 3a). c, A small pooled library of synthetic enhancer reporters was integrated into the AAVS1 safe harbor by HDR or a clonal landing pad by the Kp03 LSR and measured by separation and sequencing (n = 2 integration replicates for HDR; n = 3 integration replicates for LSR; dots show the mean; error = s.d.). ρ is the Spearman correlation between the PRA measurement of enhancer strength and the number of TetO sites in the enhancer. For the LSR, pooled measurements (left y-axis, red circles) correlate with the percentage of citrine+ cells from individual reporter assays (right y-axis, black x, Pearson’s r = 0.94). d, Schematic for a cloning-free strategy to install libraries. A linear dsDNA library of elements containing the attP site is generated by PCR and directly delivered to landing pad cells. e, Schematic of an amplicon library generated by PCR from an attP-mCherry-pA template, where the reverse primer contains a 6×N barcode. f, Distribution of barcodes in the initial amplicon libraries (read depths 216–272×) and in gDNA extracted from cells 7 days after electroporation with 750 ng of amplicon (read depths 290–357×). dox, doxycycline.
Fig. 4. Discovery of specific human genome-targeting recombinases.
a, Predicted attB and attP sequences were searched against the human genome using BLAST. The attachment site with the best match to the human genome is denoted attA (acceptor), and the corresponding human target site is denoted attH (human). The cognate attachment site is denoted attD (donor). b, BLAST hits of attB and attP sites that are homologous to sequences in the human genome. All hits with E < 1 × 10⁻³ are shown. The 22 autosomal chromosomes are shown in numerical order from left to right in alternating colors. c, Alignments of the microbial attachment sites (attA) to the predicted human attachment sites (attH) for three candidates. The attachment site center is bolded, representing the portion of the native attP and attB that is identical. d, Detected integration loci, ranked according to the number of uniquely mapped reads. Blue points are previously reported integration sites for PhiC31, and red points indicate predicted integration sites for Sp56, Enc3 and Pf80. e, Reads at the top integration site. Reads that align in the forward direction are shown in red, and those aligning in the reverse direction are shown in blue, with a gray line connecting paired reads. f, Detected integration loci for Dn29. UMIs were incorporated into the donor plasmid. The top three integration sites and sites with only one detected UMI (‘rare’) are highlighted. Results of three biological replicates are shown. g, A target site motif for Dn29 calculated using the top 25 target sites in K562 cells. Example integration sites are shown below, including the top three integration sites and three sites with only one detected UMI (rare1, rare2 and rare3). Colored nucleotides match the most common nucleotide at that position in the top 25 sites. h, LSR integration specificity and efficiency. For wild-type cells (black), efficiency is a corrected percentage of mCherry+ cells 18 days after electroporation with an LSR and donor plasmid. For landing pad cells (green), efficiency is the mean percentage of mCherry+ cells across all clones (from Fig. 2g, right). To estimate specificity, UMI counts were used if available; otherwise, uniquely mapped read counts were used, and counts were merged across replicates.
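The filtering and counting rules in panels b, f and h can be sketched as follows. The read format (one (site, UMI) pair per uniquely mapped read, with None when no UMI is available), the hit dictionaries and the site labels are all hypothetical. The caption does not define the specificity metric, so the share of merged counts at the top-ranked site is used here purely as an assumed stand-in.

```python
from collections import Counter, defaultdict

E_VALUE_CUTOFF = 1e-3  # panel b keeps BLAST hits with E < 1 x 10^-3

def filter_hits(hits):
    """Keep BLAST hits (hypothetical dicts with an 'evalue' key) below the cutoff."""
    return [h for h in hits if h["evalue"] < E_VALUE_CUTOFF]

def merged_site_counts(replicates):
    """Sum per-site counts across replicates.

    Each replicate is a list of (site, umi) pairs, one per uniquely mapped read.
    Reads with a UMI are collapsed to distinct UMIs per site; reads with
    umi=None are counted individually (the uniquely-mapped-read fallback).
    """
    counts = Counter()
    for reads in replicates:
        umis_per_site = defaultdict(set)
        for site, umi in reads:
            if umi is None:
                counts[site] += 1
            else:
                umis_per_site[site].add(umi)
        for site, umis in umis_per_site.items():
            counts[site] += len(umis)
    return counts

def top_site_fraction(counts):
    """ASSUMED specificity proxy: fraction of all counts at the top-ranked site."""
    site, n = counts.most_common(1)[0]
    return site, n / sum(counts.values())

hits = [{"chrom": "chr1", "evalue": 2e-5}, {"chrom": "chr12", "evalue": 0.01}]
print([h["chrom"] for h in filter_hits(hits)])  # ['chr1']

rep1 = [("chr1:100", "AAC"), ("chr1:100", "AAC"), ("chr1:100", "GTT"), ("chr5:900", "CCA")]
rep2 = [("chr1:100", "TTG"), ("chr7:50", "GGA")]
counts = merged_site_counts([rep1, rep2])
print(top_site_fraction(counts))                        # ('chr1:100', 0.6)
print(sorted(s for s, n in counts.items() if n == 1))   # ['chr5:900', 'chr7:50']
```

Collapsing reads to distinct UMIs keeps PCR duplicates of one integration event (the repeated "AAC" read above) from inflating a site's count, which is why UMI counts are preferred when available.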
Fig. 5. Development of efficient multi-targeting recombinases.
a, Efficiency of the multi-targeting integrase Cp36 in K562 cells. Bxb1 paired with a Cp36 attD donor was used as a negative control. Fluorescence was measured out to 12 days after electroporation by flow cytometry. Lines show the mean (n = 2), with dashed lines used for negative controls. b, Integration site mapping assay results for Cp36. The top 500 loci across two experiments, one performed in HEK293FT cells and another performed in K562 cells, are shown. Roman numerals highlight the three top sites (i–iii) and one rare site (iv) and correspond to the bottom rows in c. c, Cp36 target site motifs and example target sequences. Precise integration sites and orientations were inferred at all loci, and nucleotide composition was calculated for the top 200 sites. Example integration sites specified in b are shown below the motifs, where nucleotides are highlighted with their respective colors if they match the consensus nucleotide. d, Efficiency of Cp36 and Super PiggyBac for stable delivery of mCherry donor plasmid in K562 cells. The 7.2-kb donor plasmid contains the Cp36 attD and the PiggyBac ITRs. Ec03 LSR was used as a negative control that lacks an attachment site on this donor plasmid (n = 2). e, Wild-type K562 or Cp36-dosed mCherry+ and puromycin-selected cells were transfected with 2,000 ng of a second fluorescent reporter (mTagBFP2) and analyzed by flow cytometry 13 days after electroporation (n = 2). The dash shows the negative control treated with BFP donor only. f, Flow cytometry 12 days after electroporation of both fluorescent donors and Cp36 plasmids into K562 cells. Negative control cells were transfected with the donors and pUC19 (n = 2 replicates shown as stacked bars for Cp36 condition).
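Panel c builds motifs only after each site's orientation has been inferred. A minimal sketch of that step, assuming equal-length site sequences already anchored on the crossover core and a hypothetical (sequence, strand, count) input format, reverse-complements minus-strand sites and then computes per-position base frequencies over the top 200 sites ranked by count.

```python
from collections import Counter

# Maps each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orient(seq, strand):
    """Put every site in one orientation: minus-strand sites are reverse-complemented."""
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq

def position_frequency_matrix(sites, top_n=200):
    """Per-position A/C/G/T frequencies over the top_n sites ranked by count.

    `sites` is a hypothetical list of (sequence, strand, count) tuples whose
    sequences are assumed to be equal-length and aligned at the crossover core.
    """
    ranked = sorted(sites, key=lambda s: s[2], reverse=True)[:top_n]
    oriented = [orient(seq, strand) for seq, strand, _ in ranked]
    pfm = []
    for column in zip(*oriented):
        tally = Counter(column)
        pfm.append({base: tally[base] / len(column) for base in "ACGT"})
    return pfm

def consensus(pfm):
    """Most frequent base at each position."""
    return "".join(max(col, key=col.get) for col in pfm)

sites = [("GGTTAC", "+", 50), ("GTAACC", "-", 30), ("GATTAC", "+", 5)]
pfm = position_frequency_matrix(sites)
print(consensus(pfm))                                     # GGTTAC
print(round(pfm[1]["G"], 3))                              # 0.667
print(position_frequency_matrix(sites, top_n=2)[1]["G"])  # 1.0
```

Without the orientation step, the minus-strand site "GTAACC" would be tallied as written and would blur the motif rather than reinforce it.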