Proc Natl Acad Sci U S A. 2021 Dec 7;118(49):e2112279118.
doi: 10.1073/pnas.2112279118.

Metagenomic discovery of CRISPR-associated transposons

James R Rybarski et al. Proc Natl Acad Sci U S A. 2021.

Abstract

CRISPR-associated Tn7 transposons (CASTs) co-opt cas genes for RNA-guided transposition. CASTs are exceedingly rare in genomic databases; recent surveys have reported Tn7-like transposons that co-opt Type I-F, I-B, and V-K CRISPR effectors. Here, we expand the diversity of reported CAST systems via a bioinformatic search of metagenomic databases. We discover architectures for all known CASTs, including arrangements of the Cascade effectors, target homing modalities, and minimal V-K systems. We also describe families of CASTs that have co-opted the Type I-C and Type IV CRISPR-Cas systems. Our search for non-Tn7 CASTs identifies putative candidates that include a nuclease-dead Cas12. These systems shed light on how CRISPR systems have coevolved with transposases and expand the programmable gene-editing toolkit.

Keywords: CAST; CRISPR RNA; bioinformatics; gene editing; transposition.

Conflict of interest statement

Competing interest statement: The authors are coinventors on patent applications filed based on this work.

Figures

Fig. 1.
CAST detection and classification. (A) A bioinformatic pipeline for the discovery of CASTs. Brown: transposase genes; blue: cas genes; dotted: ORFs; gray: gene neighborhoods. Neighborhoods satisfying initial search criteria are marked with a green check. Red “x” denotes a neighborhood that does not match the initial search criteria (e.g., no detected cas genes). (B) A summary of the stepwise filtering strategy to identify high-confidence Type I-F Tn7 CASTs. (C) The distribution of Tn7-associated CAST subtypes in the NCBI microbial genome and EMBL metagenomic databases.
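The initial search step in panel A (keep a gene neighborhood only if cas genes are detected near a transposase gene) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the `kind` labels, the 10-kb `window`, and all function names are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record for one gene on a contig."""
    name: str
    start: int
    end: int
    kind: str  # assumed labels: "transposase", "cas", or "orf"


def neighborhoods(genes, window=10_000):
    """Yield (seed, neighbors) for every transposase gene.

    Neighbors are the other genes that overlap the interval
    [seed.start - window, seed.end + window]. The window size is an
    illustrative assumption, not a value taken from the paper.
    """
    for seed in genes:
        if seed.kind != "transposase":
            continue
        lo, hi = seed.start - window, seed.end + window
        near = [g for g in genes
                if g is not seed and g.end >= lo and g.start <= hi]
        yield seed, near


def passes_initial_criteria(neighbors):
    """A neighborhood passes if at least one cas gene was detected in it."""
    return any(g.kind == "cas" for g in neighbors)


def candidate_seeds(genes, window=10_000):
    """Names of transposase genes whose neighborhood contains a cas gene."""
    return [seed.name
            for seed, near in neighborhoods(genes, window)
            if passes_initial_criteria(near)]


# Toy contig: the first transposase has a cas gene nearby, the second does not.
toy_contig = [
    Gene("tnsB", 1_000, 3_000, "transposase"),
    Gene("cas7", 5_000, 6_000, "cas"),
    Gene("tnsB_2", 50_000, 52_000, "transposase"),
    Gene("orf1", 53_000, 53_500, "orf"),
]
print(candidate_seeds(toy_contig))
```

In this toy example the second neighborhood corresponds to the red "x" case in the caption: a transposase gene is present, but no cas gene falls inside the window.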
Fig. 2.
A summary of Type I-F CASTs. (A) The gene architectures of Type I-F3a, I-F3b, and I-F3c systems. Unique gene architectures include tniQ-cas8 fusions, split cas8 and cas5, and dual cas7 systems. Purple: att site; blue: left and right transposon ends. Black diamonds: canonical direct repeats; gray diamonds: atypical direct repeats. Rectangles: spacers; purple rectangle: homing spacer. The arrow indicates the target site. The slanted gapped lines indicate elided cargo regions. (B) The distribution of att site genes in the NCBI and the metagenomic databases. (C, Top) The sequence of a CRISPR array with a short, atypical spacer (purple) that may assemble a mini Cascade. The red bases are those that differ from the consensus repeat sequence. (Bottom) A schematic of an atypical crRNA and its target DNA sequence. (D) Web logos of the PAM and right inverted repeat adjacent to each att site. The TnsB-binding site and the homing PAMs are conserved within subsystems.
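For readers unfamiliar with web logos such as those in panel D, the height of each column is the information content of that alignment position in bits. The sketch below computes this quantity for aligned DNA sequences as 2 minus the Shannon entropy, without the small-sample correction that logo tools usually apply. The function name and the toy sequences are assumptions for illustration and are not data from the paper.

```python
import math
from collections import Counter


def information_content(seqs):
    """Per-position information content (bits) of aligned DNA sequences.

    Uses IC = 2 - H, where H is the Shannon entropy of the observed base
    frequencies at that position. No small-sample correction is applied.
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    bits = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Hypothetical 2-nt motif: position 1 is invariant, position 2 is variable.
toy_motifs = ["CC", "CC", "CA", "CG"]
print(information_content(toy_motifs))
```

A fully conserved position reaches the maximum of 2 bits, while a position where all four bases are equally frequent contributes 0 bits.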
Fig. 3.
An analysis of Type I-B CASTs. (A, Left) The gene architectures of Type I-B systems. Systems can dispense with either the first or the second tniQ/tnsD, suggesting alternative targeting lifestyles. Type I-B4 systems have a unique architecture that most resembles Type V CASTs. Colored rectangles correspond to phylogenetic groups in B. (Right) The distribution of Type I-B subsystems in the metagenomic database. (B) A phylogenetic tree with TniQ/TnsD variants from Type I-B and I-F CASTs as well as from the canonical Tn7 and Tn5053 transposons. The values at branch points are bootstrap support percentages. (C, Top) The sequence of a Type I-B4 CRISPR array with a short, atypical spacer. (Bottom) A schematic of an atypical crRNA base paired with a target DNA sequence. The red bases are those that differ from the consensus repeat sequence. (D) Domain maps of TniQ/TnsD proteins. Regions homologous to the TniQ superfamily and the TnsD superfamily are indicated in pink and light green, respectively. The Type I-B4 system encodes the shortest TniQ variant.
Fig. 4.
New Tn7 CASTs from metagenomic databases. (A, Top) The gene architecture of a Type IV CAST. This system lacks a CRISPR array but encodes a homing spacer. The genes highlighted by colored rectangles correspond to genes in B. (Bottom) A schematic of a short, homing spacer base paired with its target DNA sequence. (B) Phylogenetic trees of Cas6 and Cas7 indicate that the Type IV CAST most closely resembles Type IV-A3 CRISPR-Cas systems. The values at branch points are bootstrap support percentages. (C, Top) The gene architecture of Type I-C systems. We did not detect any CRISPR arrays or atypical homing spacers. (Bottom) A phylogenetic tree of Cas8 confirms that this system is closely related to Type I-C Cascades. The values at branch points are bootstrap support percentages.
Fig. 5.
An analysis of Type V CASTs. (A) The gene architectures of Type V CASTs, including dual-insertion systems (Bottom two rows). The colored rectangles around genes correspond to alignments in D and E. (B) A schematic of interactions between the target site DNA, a homing crRNA, and a tracrRNA. (C) A web logo of PAM sequences found adjacent to spacer targets. (D) Aligned domain maps of truncated TnsC variants. Gray diagonal stripes indicate the TnsD-interacting region. Truncated TnsCs lack the TnsA- and TnsB-interacting domains but generally retain the ATPase domain and most of the TnsD-interacting domain. The shortest TnsC has also lost its ATPase domain. (E) Aligned domain maps of truncated TnsB variants. Type V CAST TnsB is shorter than Tn7 TnsB but contains the functionally annotated domains. In some dual TnsB systems, the first tnsB encodes the N-terminal region, and the second encodes the C-terminal portion.
Fig. 6.
A family of putative non-Tn7 CASTs. (A) The defining features of this family of systems are an Rpn family (PDDEXK domain-containing) nuclease/transposase near a nuclease-dead Cas12 or a Type I-E Cascade complex. The operon is enriched for nucleic acid–processing proteins. We also observed homing spacers (magenta, black arrows) and short inverted repeats (blue) in some systems. (B) Multiple sequence alignment of Rpn proteins with the putative transposases from these systems. Residues critical for DNA cleavage in the PDDEXK domain are highlighted in red. The D165A mutant in RpnA more than doubles recombination in vivo; this aspartic acid is highlighted in red below the transposase_31 domain. (C) A schematic of an atypical homing spacer and its DNA target. The PAM is highlighted. (D) Multiple sequence alignment of nuclease-active Cas12a and putative CAST Cas12 proteins. Putative CAST Cas12 proteins retain the conserved residues in the WED domain that are essential for crRNA processing but lack an aspartic residue in the RuvC domain that is essential for DNA cleavage.
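The comparison in panel D (does a candidate protein retain the residue found at a known catalytic position of a reference protein?) amounts to reading one column of a multiple sequence alignment. The following is a minimal sketch of that lookup. The function names, the toy alignment, and the reference position are hypothetical and do not reproduce any sequence or coordinate from the paper.

```python
def alignment_column(ref_aligned, ref_pos):
    """Map a 1-based residue position in the ungapped reference sequence
    to the 0-based column of the gapped alignment row."""
    seen = 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError("ref_pos exceeds the length of the reference sequence")


def residues_at(alignment, ref_name, ref_pos):
    """Return the residue each aligned sequence carries at the column that
    corresponds to position ref_pos of the reference sequence."""
    col = alignment_column(alignment[ref_name], ref_pos)
    return {name: seq[col] for name, seq in alignment.items()}


# Hypothetical toy alignment: the reference has an aspartate (D) at position 3.
toy_alignment = {
    "reference": "MK-DLA",
    "candidate_1": "MKQNLA",  # D replaced by N at the aligned column
    "candidate_2": "MR-DLA",  # D retained
}
print(residues_at(toy_alignment, "reference", 3))
```

Counting residues only on non-gap characters of the reference row keeps the lookup correct when other sequences carry insertions, as `candidate_1` does here.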
