[Preprint]. 2024 May 8:2024.05.08.593200.
doi: 10.1101/2024.05.08.593200.

De novo gene synthesis by an antiviral reverse transcriptase


Stephen Tang et al. bioRxiv. 2024.

Update in

  • De novo gene synthesis by an antiviral reverse transcriptase.
    Tang S, Conte V, Zhang DJ, Žedaveinytė R, Lampe GD, Wiegand T, Tang LC, Wang M, Walker MWG, George JT, Berchowitz LE, Jovanovic M, Sternberg SH. Science. 2024 Oct 4;386(6717):eadq0876. doi: 10.1126/science.adq0876. Epub 2024 Oct 4. PMID: 39116258.

Abstract

Bacteria defend themselves from viral infection using diverse immune systems, many of which sense and target foreign nucleic acids. Defense-associated reverse transcriptase (DRT) systems provide an intriguing counterpoint to this immune strategy by instead leveraging DNA synthesis, but the identities and functions of their DNA products remain largely unknown. Here we show that DRT2 systems execute an unprecedented immunity mechanism that involves de novo gene synthesis via rolling-circle reverse transcription of a non-coding RNA (ncRNA). Unbiased profiling of RT-associated RNA and DNA ligands in DRT2-expressing cells revealed that reverse transcription generates concatenated cDNA repeats through programmed template jumping on the ncRNA. The presence of phage then triggers second-strand cDNA synthesis, leading to the production of long double-stranded DNA. Remarkably, this DNA product is efficiently transcribed, generating messenger RNAs that encode a stop codon-less, never-ending ORF (neo) whose translation causes potent growth arrest. Phylogenetic analyses and screening of diverse DRT2 homologs further revealed broad conservation of rolling-circle reverse transcription and Neo protein function. Our work highlights an elegant expansion of genome coding potential through RNA-templated gene creation, and challenges conventional paradigms of genetic information encoded along the one-dimensional axis of genomic DNA.


Conflict of interest statement

Columbia University has filed a patent application related to this work. S.H.S. is a co-founder and scientific advisor to Dahlia Biosciences, a scientific advisor to CrisprBits and Prime Medicine, and an equity holder in Dahlia Biosciences and CrisprBits.

Figures

Fig. 1. Systematic discovery of DRT2 reverse transcription substrates and products in vivo.
(A) Schematic of RNA immunoprecipitation (RIP) and cDNA immunoprecipitation (cDIP) sequencing approaches to identify nucleic acid substrates of FLAG-tagged reverse transcriptase (RT) from KpnDRT2. The plasmid-encoded immune system is schematized top left. (B) MA plots showing the RT-mediated enrichment of RNA (top) and DNA (bottom) loci from RIP-seq and cDIP-seq experiments, relative to input controls. Each dot represents a transcript, and red dots denote transcripts with > 20-fold enrichment and false discovery rate (FDR) < 0.05. (C) dRNA-seq, Term-seq, RIP-seq, and cDIP-seq coverage tracks, from top to bottom, for either WT RT or a catalytically inactive RT mutant (YCAA). dRNA-seq and Term-seq enrich RNA 5′ and 3′ ends, respectively, whereas RIP-seq and cDIP-seq identify RT-associated RNA and DNA ligands. Red and pink denote top and bottom strands, respectively, and the DRT2 locus is shown at bottom; coordinates are numbered from the beginning of the K. pneumoniae-derived sequence on the expression plasmid. Data are normalized for sequencing depth and plotted as counts per million reads (CPM). (D) Predicted secondary structure of the KpnDRT2 ncRNA. The cDNA template region is colored in pink, and the gray dotted line denotes the direction of reverse transcription. (E) Coverage over the DRT2 ncRNA locus from total DNA sequencing of cells +/− T5 phage infection (left), and bar graph of cDNA counts for the same samples alongside the YCAA mutant (right). Red and pink denote top and bottom strands, respectively; data are mean ± s.d. (n = 3).
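The enrichment criterion in panel B (> 20-fold over input, FDR < 0.05) and the counts-per-million (CPM) normalization in panel C can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `LocusCounts` container, the pseudocount, and all read counts are hypothetical, and the FDR values are assumed to come from a separate differential-enrichment test that is not shown.

```python
from dataclasses import dataclass


@dataclass
class LocusCounts:
    """Read counts for one transcript or locus (hypothetical container)."""
    name: str
    ip_reads: int     # reads in the RIP-seq or cDIP-seq library
    input_reads: int  # reads in the matched input control
    fdr: float        # adjusted p-value, assumed precomputed by a separate test


def cpm(count: int, library_size: int) -> float:
    """Counts per million: scale a raw count by total sequencing depth."""
    return count * 1e6 / library_size


def enriched_loci(loci, ip_total, input_total,
                  min_fold=20.0, max_fdr=0.05, pseudocount=1.0):
    """Return (name, fold) for loci with > min_fold enrichment and FDR < max_fdr."""
    hits = []
    for locus in loci:
        # Depth-normalize both libraries; the pseudocount (an assumption)
        # keeps loci with zero input reads from dividing by zero.
        ip = cpm(locus.ip_reads, ip_total) + pseudocount
        background = cpm(locus.input_reads, input_total) + pseudocount
        fold = ip / background
        if fold > min_fold and locus.fdr < max_fdr:
            hits.append((locus.name, fold))
    return hits


loci = [
    LocusCounts("ncRNA", ip_reads=5000, input_reads=200, fdr=0.001),
    LocusCounts("geneB", ip_reads=300, input_reads=400, fdr=0.5),
    LocusCounts("geneC", ip_reads=5000, input_reads=10, fdr=0.2),
]
# Only "ncRNA" passes both thresholds; "geneC" is enriched but fails the FDR cutoff.
print(enriched_loci(loci, ip_total=1_000_000, input_total=2_000_000))
```

Requiring both a fold-change and an FDR cutoff, as the caption describes, discards loci like the hypothetical "geneC" whose large fold change rests on too few input reads to be statistically supported.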
Fig. 2. Rolling-circle reverse transcription generates a concatenated cDNA product.
(A) Schematic of DRT2 ncRNA secondary structure, with stem-loops (SL) numbered 1–8 and selected perturbations highlighted in red. SL1MUT, SL5MUT, and SL6MUT correspond to ncRNA mutants in which the SL bases were scrambled, resulting in the elimination of sequence motifs and secondary structure. SL2MUT abolishes base pairing within the SL2 stem. Sequences of all mutants are presented in Supplementary Table S3. (B) Plaque assay showing loss of phage defense activity for all SL mutants from A (left), and bar graph quantifying the reduction in efficiency of plating (EOP, right); data are mean ± s.d. (n = 3). (C) RIP-seq and cDIP-seq coverage tracks for the indicated SL mutants alongside input controls, revealing a range of defects in either RNA binding, cDNA synthesis/binding, or both. (D) Top: Schematic of terminal portions of cDIP-seq reads (light gray) failing to align to the cDNA reference, resulting in ‘soft clipping’ and exclusion from coverage plots. A donut plot reporting the proportion of cDNA-mapping reads with the indicated lengths of 3′-clipped sequences is shown at left for DRT2 WT cDIP-seq. Bottom: Mapping of 3′-soft-clipped sequences from cDIP-seq experiments back to the DRT2 locus, demonstrating that they derive from the cDNA 5′ end. SL2MUT exhibits an aberrant pattern relative to WT. (E) Schematic of sequencing reads that map across two concatenated cDNA repeats (top), and bar graph quantifying the abundance of junction-spanning reads from sequencing of total DNA in the indicated conditions (bottom). Red and pink denote top and bottom strands, respectively; data are mean ± s.d. (n = 3). (F) Schematic of long-read Nanopore sequencing workflow with DNA from phage-infected cells (top), and histogram of cDNA repeat length distribution for WT KpnDRT2 from Nanopore sequencing (bottom). (G) Inferred mechanism of rolling-circle reverse transcription (RCRT) mediated by sequence and structural features of SL2. 
After synthesis of 5′-TGT-3′ templated by ACA-1 at the end of one cDNA repeat (top), the nascent DNA strand dissociates from its template and reanneals with the complementary ACA-2 following SL2 melting (middle). Template jumping initiates a subsequent round of reverse transcription, with concatenation of one cDNA repeat to the next and incorporation of one additional base at the repeat junction, ultimately leading to long rolling-circle cDNA products (bottom).
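The junction logic in panels D to G can be illustrated with a toy model in Python. It assumes that each round of rolling-circle reverse transcription copies the same template region and adds one extra base at every repeat junction. The template sequence, the identity of the junction base, and the flank length used to define a junction-spanning read are all hypothetical; the template is written in DNA letters for simplicity, and reads from the opposite strand are ignored.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def simulate_rcrt(template: str, rounds: int, junction_base: str = "A") -> str:
    """Toy rolling-circle RT: copy the template region `rounds` times,
    adding one extra (hypothetical) base at every repeat-repeat junction."""
    repeat = reverse_complement(template)  # cDNA is complementary to its template
    return junction_base.join([repeat] * rounds)


def junction_kmer(repeat: str, junction_base: str = "A", flank: int = 3) -> str:
    """Sequence spanning the end of one repeat, the extra base, and the next start."""
    return repeat[-flank:] + junction_base + repeat[:flank]


def count_junction_reads(reads, kmer: str) -> int:
    """Number of short reads that span at least one repeat-repeat junction."""
    return sum(1 for read in reads if kmer in read)


def repeat_length_histogram(long_reads, repeat: str) -> Counter:
    """Tally how many whole repeats each long read contains."""
    return Counter(read.count(repeat) for read in long_reads)


template = "AACCG"  # hypothetical template region
repeat = reverse_complement(template)
product = simulate_rcrt(template, rounds=3)
print(product)
print(junction_kmer(repeat))
print(repeat_length_histogram([simulate_rcrt(template, n) for n in (1, 2, 2, 5)], repeat))
```

A read that contains the junction k-mer can only arise from a concatenated product, which is why junction-spanning reads serve as a readout for repeat concatenation, while a single-repeat product never yields one.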
Fig. 3. The concatenated cDNA product contains a never-ending ORF (neo).
(A) Bar graph quantifying RNA-seq reads that map across two concatenated cDNA repeats, for the indicated conditions. Red and pink denote top and bottom strands, respectively; data are mean ± s.d. (n = 3). (B) Model showing the consecutive production of ncRNA (transcription), concatenated double-stranded cDNA (reverse transcription and second-strand synthesis), and concatenated RNA (transcription), all encoded by the DRT2 locus. Dashed lines indicate repeat–repeat junctions resulting from rolling-circle reverse transcription, and the inset (top left) shows the predicted promoter formed across the junction. (C) Bar graph quantifying relative concatenated RNA abundance in a phage infection time course experiment using RT-qPCR with repeat junction primers (top), and Northern blot of concatenated RNA using a junction-spanning probe (bottom). RT-qPCR data are normalized to WT uninfected cells (t = 0); data are mean ± s.d. (n = 3). (D) Putative open reading frame (ORF) encoded by the concatenated RNA. The start of cDNA synthesis and putative start of translation are indicated (pink and blue arrows, respectively), and the repeat–repeat junction is denoted with a dashed line. (E) Schematic of the cDNA template region (pink), with the putative start codon and experimentally tested mutations indicated. (F) Plaque assay showing that phage defense activity is eliminated by a single-bp substitution that introduces an in-frame stop codon, but is only modestly affected by synonymous or missense mutations. EV, empty vector. (G) Bar graph quantifying phage defense activity for insertions of the indicated length within SL3, SL4, or SL5. Reduction in EOP is calculated relative to an EV control; data are mean ± s.d. (n = 3). Only mutants whose insertion lengths are multiples of 3 bp retain phage defense activity.
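Panels F and G follow from reading-frame arithmetic: if one repeat unit (including the junction base) is a multiple of 3 bp long, every repeat is read in the same frame, so a unit with no in-frame stop codon yields an ORF without an end; any other length shifts the frame at each junction and can expose a stop codon. The Python sketch below uses a hypothetical 9-bp unit, not the KpnDRT2 sequence, and ignores the start codon and the insertion sites, so it illustrates only the frame logic.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, start: int = 0):
    """Index of the first in-frame stop codon at or after `start`, or None."""
    # Stopping the range at len(seq) - 2 means only complete codons are checked.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def insert_bases(unit: str, position: int, bases: str) -> str:
    """Insert `bases` into one repeat unit, loosely mimicking the SL insertion mutants."""
    return unit[:position] + bases + unit[position:]


def concatenate(unit: str, copies: int) -> str:
    """Tandem copies of one repeat unit (junction base assumed folded into `unit`)."""
    return unit * copies


UNIT = "GCTAAGCTG"  # hypothetical 9-bp unit with no stop codon in frame 0

for extra in ["", "C", "CC", "GCC"]:
    mutant = insert_bases(UNIT, len(UNIT), extra)
    orf = concatenate(mutant, 4)
    # insertion length, unit length mod 3, position of first stop (None = no stop)
    print(len(extra), len(mutant) % 3, first_stop(orf))

# A single-bp substitution (AAG -> TAG) creates an in-frame stop, as in panel F.
print(first_stop("GCTTAGCTG"))
```

With this toy unit, the 0- and 3-bp insertions report no stop codon, whereas the 1- and 2-bp insertions reach a TAA in the second and third repeats, respectively. A real sequence could tolerate a frameshift if the shifted frames also lacked stops, so the sketch shows why frame-preserving insertions are required, not that every other insertion must fail.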
Fig. 4. Neo proteins induce programmed cellular dormancy.
(A) Schematic of experimental approach to detect Neo in phage-infected cells by liquid chromatography with tandem mass spectrometry (LC-MS/MS). (B) Bar graph quantifying Neo protein abundance in cells under the indicated conditions. Data are mean ± s.d. (n = 3). (C) Abundance of RT and Neo proteins relative to the E. coli proteome in phage-infected cells expressing WT DRT2. (D) Differential protein abundance in T5-infected cells expressing DRT2 WT or YCAA. Phage proteins are colored in brown, and ArfA and RMF are colored in red and labeled. All other differentially abundant proteins (fold change > 2 and FDR < 0.05) are colored in dark blue. (E) Schematic of the alternative ribosome rescue pathway mediated by ArfA, which would release Neo proteins from ribosomes stalled on non-stop neo mRNAs without targeting them for degradation (right), unlike the tmRNA pathway (left). (F) Growth curves of strains transformed with empty vector (EV) or the WT DRT2 system, +/− T5 phage at the indicated multiplicity of infection (MOI). Shaded regions indicate the standard deviation across independent biological replicates (n = 3). (G) Schematic of cloning and inducible expression strategy to monitor the physiological effects of Neo polypeptides of variable repeat length. (H) Growth curves of strains transformed with WT or scrambled Neo sequences of the indicated repeat lengths, alongside an empty vector (EV) control. The dashed line indicates the point of induction with arabinose and theophylline. Shaded regions indicate the standard deviation across independent biological replicates (n = 3).
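Growth curves such as those in panels F and H are commonly summarized as a specific growth rate plus a mean ± s.d. band across replicates. A minimal Python sketch follows. It assumes the growth rate is the least-squares slope of ln(OD600) over a chosen time window; the caption does not say how curves were fit, so the fitting method, the window, and the example values are all assumptions.

```python
import math
import statistics


def growth_rate(times_h, od600):
    """Specific growth rate (per hour): least-squares slope of ln(OD600) vs time."""
    ys = [math.log(od) for od in od600]
    t_mean = statistics.mean(times_h)
    y_mean = statistics.mean(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, ys))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return num / den


def mean_sd_band(replicates):
    """Per-timepoint (mean, sample s.d.) across biological replicates.

    `replicates` is a list of equal-length OD series, one per replicate;
    the s.d. defines the shaded region around the mean curve.
    """
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*replicates)]


# Hypothetical exponential-phase window: OD600 doubling roughly every hour.
times = [0, 1, 2, 3]
ods = [0.05 * math.exp(0.7 * t) for t in times]
print(round(growth_rate(times, ods), 3))  # 0.7
print(mean_sd_band([[1, 2], [3, 4], [5, 6]]))
```

Growth arrest after Neo induction would appear as a slope near zero for windows taken after the induction point, whereas EV controls keep a positive slope.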
Fig. 5. Concatenated neo genes and programmed dormancy are a broadly conserved phage defense mechanism.
(A) Schematic for the automated detection of putative Neo proteins in homologous DRT2 operons. (B) Phylogenetic tree of KpnDRT2 homologs, with outer rings showing the widespread presence of RT-associated ncRNAs and putative Neo proteins. Homologs selected for experimental testing are indicated with pink circles. (C) Multiple sequence alignment (MSA) and secondary structure prediction of Neo proteins identified in B. A single Neo repeat is shown for all homologs; shading indicates amino acid conservation. (D) AlphaFold prediction of a 3-repeat Neo polypeptide, showing the sites of proline mutagenesis tested in E. Prolines were inserted C-terminal to the indicated residues within each of 3 concatenated repeats. (E) Growth curves of strains transformed with 3-repeat Neo constructs containing the indicated proline insertions, alongside an empty vector (EV) control. The dashed line indicates the point of induction with arabinose and theophylline. Shaded regions indicate the standard deviation across independent biological replicates (n = 3). (F) Heat map showing the distribution of neo cDNA repeat lengths in cells expressing the indicated DRT2 homologs. Data are plotted as log10(CPM) from Nanopore sequencing of total DNA. (G) Heat map showing the growth rates of cells expressing Neo homologs with the indicated repeat lengths. Growth rates are normalized to an EV control and represent the mean of independent biological replicates (n = 3). Empty cells with X indicate Neo expression constructs that could not be successfully cloned, presumably due to toxicity. (H) Model for the antiphage defense mechanism of DRT2 systems. RT enzymes bind the scaffold portion of associated ncRNAs and produce concatenated cDNA products via rolling-circle reverse transcription (RCRT). Phage infection triggers second-strand synthesis, yielding a dsDNA molecule that is transcribed into never-ending ORF (neo) mRNAs. 
Neo translation exploits a ribosome rescue pathway to produce Neo proteins that potently arrest cell growth, protecting the larger bacterial population from the spread of phage.
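Panels F and G are heat maps with one row per DRT2 homolog. A hedged Python sketch of how the values in such rows could be computed: per-repeat-length read counts converted to log10(CPM), and mean growth rates divided by the mean EV rate. The pseudocount that keeps zero-count lengths finite, the construct names, and all numbers are assumptions for illustration.

```python
import math
import statistics
from collections import Counter


def log10_cpm_by_length(repeat_counts: Counter, total_reads: int, pseudocount: float = 1.0):
    """One heat-map row: repeat length -> log10(CPM + pseudocount)."""
    return {
        length: math.log10(n * 1e6 / total_reads + pseudocount)
        for length, n in sorted(repeat_counts.items())
    }


def normalized_growth(rates_by_construct, ev_rates):
    """Mean growth rate of each construct divided by the mean empty-vector rate."""
    ev_mean = statistics.mean(ev_rates)
    return {
        name: statistics.mean(rates) / ev_mean
        for name, rates in rates_by_construct.items()
    }


# Hypothetical homolog: reads containing 1, 2, or 3 concatenated repeats.
row = log10_cpm_by_length(Counter({1: 99, 2: 9, 3: 0}), total_reads=1_000_000)
print(row)

# Hypothetical replicate growth rates (per hour) for two Neo constructs and EV.
print(normalized_growth(
    {"neo_3x": [0.2, 0.2, 0.2], "scrambled_3x": [0.8, 0.8, 0.8]},
    ev_rates=[0.8, 0.8, 0.8],
))
```

Normalizing to EV puts homologs measured on different days on a common scale, with 1.0 meaning no growth defect and values near 0 indicating arrest.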


