Science. 2024 Oct 4;386(6717):eadq0876.

doi: 10.1126/science.adq0876. Epub 2024 Oct 4.

De novo gene synthesis by an antiviral reverse transcriptase

Affiliations

¹ Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
² Department of Biological Sciences, Columbia University, New York, NY, USA.
³ Department of Genetics and Development, Columbia University, New York, NY, USA.
⁴ Taub Institute for Research on Alzheimer's and the Aging Brain, New York, NY, USA.

^# Contributed equally.

PMID: 39116258
PMCID: PMC11758365
DOI: 10.1126/science.adq0876

Stephen Tang et al. Science. 2024.

Abstract

Defense-associated reverse transcriptase (DRT) systems perform DNA synthesis to protect bacteria against viral infection, but the identities and functions of their DNA products remain largely unknown. We show that DRT2 systems encode an unprecedented immune pathway that involves de novo gene synthesis through rolling circle reverse transcription of a noncoding RNA (ncRNA). Programmed template jumping on the ncRNA generates a concatemeric cDNA, which becomes double-stranded upon viral infection. This DNA product constitutes a protein-coding, nearly endless open reading frame (neo) gene whose expression leads to potent cell growth arrest, restricting the viral infection. Our work highlights an elegant expansion of genome coding potential through RNA-templated gene creation and challenges conventional paradigms of genetic information encoded along the one-dimensional axis of genomic DNA.

Conflict of interest statement

Competing interests:

Columbia University has filed patent applications related to this work, for which S.H.S., S.T., D.J.Z., G.D.L., T.W., and J.T.G. are inventors. S.H.S. is a cofounder and scientific advisor to Dahlia Biosciences, a scientific advisor to CrisprBits and Prime Medicine, and an equity holder in Dahlia Biosciences and CrisprBits.

Figures

**Fig. 1. Systematic discovery of DRT2 reverse transcription substrates and products in vivo.**
(A) Schematic of RNA immunoprecipitation (RIP) and cDNA immunoprecipitation (cDIP) sequencing approaches to identify nucleic acid substrates of FLAG-tagged reverse transcriptase (RT) from *Kpn*DRT2. The plasmid-encoded immune system is schematized top left. (B) MA plots showing the RT-mediated enrichment of (top) RNA and (bottom) DNA loci from RIP-seq and cDIP-seq experiments, relative to input controls. Each dot indicates a transcript, and red dots indicate transcripts with >20-fold enrichment and false discovery rate (FDR) < 0.05. (C) dRNA-seq, Term-seq, RIP-seq, and cDIP-seq coverage tracks (top to bottom, respectively) for either WT RT or a catalytically inactive RT mutant (YCAA). dRNA-seq and Term-seq enrich RNA 5′ and 3′ ends, respectively, whereas RIP-seq and cDIP-seq identify RT-associated RNA and DNA ligands. Red and pink indicate top and bottom strands, respectively, and the *Kpn*DRT2 locus is shown at bottom; coordinates are numbered from the beginning of the *K. pneumoniae*–derived sequence on the expression plasmid. Data are normalized for sequencing depth and plotted as counts per million reads (CPM). (D) Predicted secondary structure of the *Kpn*DRT2 ncRNA. The cDNA template region is colored in pink, and the gray dotted line indicates the direction of reverse transcription. (E) (Left) Coverage over the *Kpn*DRT2 ncRNA locus from total DNA sequencing of cells ± T5 phage infection. (Right) Bar graph of cDNA counts for the same samples alongside the YCAA mutant. Red and pink indicate top and bottom strands, respectively; data are mean ± SD (n = 3 biological replicates).
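As a rough illustration of how the MA plots in (B) can be read, the sketch below computes MA coordinates (M = log2 fold enrichment of the immunoprecipitated sample over input, A = mean log2 abundance) and applies the thresholds given in the caption (>20-fold enrichment, FDR < 0.05). The transcript names, CPM values, FDR values, and the pseudocount are hypothetical placeholders rather than data from the study, and the authors' actual analysis pipeline may differ.

```python
import math

FOLD_CUTOFF = 20.0  # from the caption: >20-fold enrichment
FDR_CUTOFF = 0.05   # from the caption: FDR < 0.05
PSEUDOCOUNT = 1.0   # assumption: added to avoid taking the log of zero

def ma_point(ip_cpm, input_cpm, pseudo=PSEUDOCOUNT):
    """Return (A, M) for one transcript from IP and input CPM values."""
    ip = math.log2(ip_cpm + pseudo)
    inp = math.log2(input_cpm + pseudo)
    return (ip + inp) / 2.0, ip - inp

def is_enriched(ip_cpm, input_cpm, fdr):
    """True if the transcript passes both the fold and the FDR threshold."""
    _, m = ma_point(ip_cpm, input_cpm)
    return m > math.log2(FOLD_CUTOFF) and fdr < FDR_CUTOFF

# Hypothetical records: (name, IP CPM, input CPM, FDR)
toy = [
    ("ncRNA_like", 4095.0, 63.0, 0.001),  # 64-fold, significant
    ("background", 127.0, 63.0, 0.5),     # 2-fold, not significant
    ("noisy", 4095.0, 63.0, 0.2),         # 64-fold, but fails the FDR cutoff
]
enriched = [name for name, ip, inp, fdr in toy if is_enriched(ip, inp, fdr)]
print(enriched)  # ['ncRNA_like']
```

Only the record that clears both thresholds would be drawn as a red dot under this reading of the caption.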

**Fig. 2. RCRT generates a concatemeric cDNA product.**
(A) Schematic of *Kpn*DRT2 ncRNA secondary structure, with stem loops (SLs) numbered 1 to 8 and selected perturbations highlighted in red. SL1^MUT, SL5^MUT, and SL6^MUT correspond to ncRNA mutants in which the SL bases were scrambled, resulting in the elimination of sequence motifs and secondary structure. SL2^MUT abolishes base pairing within the SL2 stem by mutating the right side of the stem to its complement. Sequences of all mutants are presented in table S3. (B) (Left) Plaque assay showing loss of phage defense activity for all SL mutants from (A). (Right) Bar graph quantifying the reduction in efficiency of plating (EOP) relative to an empty vector (EV) control. Data are mean ± SD (n = 3 technical replicates). (C) RIP-seq and cDIP-seq coverage tracks for the indicated SL mutants alongside input controls, revealing a range of defects in either RNA binding, cDNA synthesis and binding, or both. (D) (Top) Schematic of terminal portions of cDIP-seq reads (light gray) failing to align to the cDNA reference, resulting in soft clipping and exclusion from coverage plots. A donut plot reporting the proportion of cDNA-mapping reads with the indicated lengths of 3′-clipped sequences is shown at left for *Kpn*DRT2 WT cDIP-seq. (Bottom) Mapping of 3′-soft-clipped sequences from cDIP-seq experiments back to the *Kpn*DRT2 locus, demonstrating that they derive from the cDNA 5′ end. SL2^MUT exhibits an aberrant pattern relative to that of wild type. The consistent ~30-nt length of the remapped sequences represents the expected overhang from alignment of 150-nt sequencing reads to a ~120-nt cDNA locus. For (C) and (D), coordinates are numbered from the beginning of the *K. pneumoniae*–derived sequence on the expression plasmid. (E) (Top) Schematic of sequencing reads that map across the cDNA repeat–repeat junction. (Bottom) Bar graph quantifying the abundance of junction-spanning reads from sequencing of total DNA in the indicated conditions. Red and pink indicate top and bottom strands, respectively; data are mean ± SD (n = 3 biological replicates). (F) (Top) Schematic of long-read Nanopore sequencing workflow with DNA from phage-infected cells expressing WT *Kpn*DRT2. (Bottom) Nanopore read coverage over a reference sequence containing concatenated repeats of the *Kpn*DRT2 cDNA sequence. For (C), (E), and (F), data are normalized for sequencing depth and plotted as CPM. (G) Inferred mechanism of RCRT mediated by sequence and structural features of SL2. After (Top) synthesis of 5′-TGT-3′ templated by ACA-1 at the end of one cDNA repeat, (middle) the nascent DNA strand dissociates from its template and reanneals with the complementary ACA-2 after SL2 melting. (Bottom) Template jumping initiates a subsequent round of reverse transcription, with concatenation of one cDNA repeat to the next and incorporation of one additional base at the repeat junction, ultimately leading to long concatemeric cDNA (ccDNA) products.
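The soft-clipping argument in (D) and the junction-spanning reads in (E) can be made concrete with a toy model. The sketch below assumes a 120-nt repeat with a randomly generated (hypothetical) sequence standing in for the full repeating period, including any junction base, and a 150-nt read that begins exactly at a repeat start. It is a simplification for illustration, not the alignment procedure used in the study.

```python
import random

READ_LEN = 150  # read length stated in the caption
UNIT_LEN = 120  # approximate cDNA repeat length stated in the caption

# Hypothetical repeat sequence; a fixed seed keeps the toy example reproducible.
rng = random.Random(0)
unit = "".join(rng.choice("ACGT") for _ in range(UNIT_LEN))
ccdna = unit * 3  # toy concatemer of three tandem repeats

def clip_against_single_repeat(read, reference):
    """Split a read that starts at reference position 0 into the part that
    fits on the single-repeat reference and the 3' overhang (soft clip)."""
    return read[:len(reference)], read[len(reference):]

read = ccdna[:READ_LEN]            # read beginning at a repeat start
aligned, clipped = clip_against_single_repeat(read, unit)
remap_pos = unit.find(clipped)     # where the overhang maps on the repeat
junction = unit[-10:] + unit[:10]  # 20-nt repeat-repeat junction probe
spans_junction = junction in read

print(len(clipped), remap_pos, spans_junction)  # 30 0 True
```

In this model the 30-nt overhang maps back to position 0 of the repeat, which matches the caption's observation that clipped sequences derive from the cDNA 5′ end, and the same read contains the repeat–repeat junction.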

**Fig. 3. The concatemeric cDNA product contains a nearly endless ORF (*neo*).**
(A) RNA-seq coverage over the *Kpn*DRT2 ncRNA locus from cells in the absence or presence of phage T5. (B) Model showing the consecutive production of ncRNA, concatemeric cDNA, and concatemeric RNA, all encoded by the *Kpn*DRT2 locus. Dashed lines indicate repeat–repeat junctions resulting from RCRT. (Inset) The consensus promoter formed across each junction. (C) (Top) Bar graph quantifying relative concatemeric RNA abundance in a phage infection time course experiment by using RT-qPCR with repeat junction primers. (Bottom) Northern blot of concatemeric RNA using a junction-spanning probe. RT-qPCR data are normalized to WT uninfected cells (t = 0); data are mean ± SD (n = 3 biological replicates). (D) Putative open reading frame (ORF) encoded by concatemeric RNA. The cDNA synthesis start site and predicted start codon are indicated (pink and blue arrows, respectively), and the predicted RBS is shaded in beige. A leucine codon spans the repeat–repeat junction. (E) Schematic of the cDNA template region (pink), with the predicted start codon and experimentally tested mutations indicated. (F) Plaque assay showing that phage defense activity is eliminated with a single–base pair substitution that introduces an in-frame stop codon but is only modestly affected by synonymous or missense mutations. EV, empty vector. (G) Bar graph quantifying phage defense activity for insertions within SL3, SL4, or SL5 of the indicated length. Reduction in EOP is calculated relative to an EV control; data are mean ± SD (n = 3 technical replicates). The only mutants that retain phage defense activity have insertion lengths of a multiple of 3 bp.
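The reading-frame logic behind (F) and (G) can be sketched with a toy repeat. When a repeat unit has a length divisible by 3 and no in-frame stop codon, tandem copies stay in the same frame and the ORF never terminates; an insertion whose length is not a multiple of 3 shifts the frame from one repeat to the next. The 12-nt sequence, the insertion position, and the inserted bases below are hypothetical and chosen only for illustration; they are not the *Kpn*DRT2 sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}

def first_stop(seq, frame=0):
    """Index of the first in-frame stop codon in seq, or None if there is none."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None

def is_endless(unit, n_repeats=6, frame=0):
    """True if n tandem copies of unit contain no in-frame stop codon."""
    return first_stop(unit * n_repeats, frame) is None

wt_unit = "ATGGCTCTGAAA"    # hypothetical repeat: ATG GCT CTG AAA
stop_unit = "ATGGCTCTGTAA"  # single-base substitution creating an in-frame TAA

def with_insertion(unit, k, pos=6):
    """Insert k cytosines at position pos (hypothetical insertion site)."""
    return unit[:pos] + "C" * k + unit[pos:]

results = {k: is_endless(with_insertion(wt_unit, k)) for k in range(4)}
print(is_endless(wt_unit), is_endless(stop_unit))  # True False
print(results)  # {0: True, 1: False, 2: False, 3: True}
```

In this toy case only the 0-nt and 3-nt insertions keep the concatemer stop-free, mirroring the pattern reported in (G), while the single-base stop mutant terminates within the first repeat, as in (F).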

**Fig. 4. Neo proteins induce programmed cellular dormancy.**
(A) Schematic of experimental approach to detect Neo in phage-infected cells by means of LC-MS/MS. (B) Bar graph quantifying Neo protein abundance from cells tested in the indicated conditions. Data are mean ± SD (n = 3 biological replicates). (C) Abundance of RT and Neo proteins relative to the *E. coli* proteome in phage-infected cells expressing WT *Kpn*DRT2. (D) Differential protein abundance in T5-infected cells expressing *Kpn*DRT2 WT or YCAA. Phage proteins are colored in brown, and ArfA and RMF are colored in red and labeled. All other differentially abundant proteins (fold change > 2 and FDR < 0.05) are colored in dark blue. (E) Schematic of alternative ribosome rescue pathway mediated by ArfA, which (right) would release Neo proteins from ribosomes stalled on nonstop *neo* mRNAs without targeting them for degradation, unlike (left) the tmRNA pathway. (F) Growth curves of strains transformed with EV or the WT *Kpn*DRT2 system, ± T5 phage at the indicated MOI. Shaded regions indicate the SD across independent biological replicates (n = 3). (G) Schematic of cloning and inducible expression strategy to monitor the physiological effects of Neo polypeptides of variable repeat length. (H) Growth curves of strains transformed with WT or scrambled Neo sequences of the indicated repeat lengths, alongside an EV control. The dashed line indicates the point of induction with arabinose (0.5%) and theophylline (0.5 mM). Shaded regions indicate the SD across independent biological replicates (n = 3).

**Fig. 5. Concatemeric neo genes and programmed dormancy are a broadly conserved phage defense mechanism.**
(A) Schematic for the automated detection of putative Neo proteins in homologous DRT2 operons. (B) Phylogenetic tree of DRT2 homologs, with outer rings showing the widespread presence of RT-associated ncRNAs and putative Neo proteins. Homologs selected for experimental testing are indicated with pink circles. (C) Multiple sequence alignment (MSA) and secondary structure prediction of Neo proteins identified in (B). A single Neo repeat is shown for all homologs; shading indicates amino acid conservation. (D) AlphaFold prediction of a three-repeat Neo polypeptide, showing the sites of proline mutagenesis tested in (E). Prolines were inserted C-terminal to the indicated residues within each of three concatenated repeats. (E) Growth curves of strains transformed with three-repeat Neo constructs containing the indicated proline insertions, alongside an EV control. The dashed line indicates the point of induction with arabinose (0.5%) and theophylline (0.5 mM). Shaded regions indicate the SD across independent biological replicates (n = 3). (F) Heat map showing the distribution of ccDNA repeat lengths in cells expressing the indicated DRT2 homologs. Data are plotted as log₁₀(CPM) from Nanopore sequencing of total DNA. (G) Heat map showing the growth rates of cells expressing Neo homologs with the indicated repeat lengths. Growth rates are normalized to an EV control and represent the mean of independent biological replicates (n = 3). Empty cells with X indicate Neo expression constructs that could not be successfully cloned, presumably because of toxicity. (H) Model for the antiphage defense mechanism of DRT2 systems. RT enzymes bind the scaffold portion of associated ncRNAs and constitutively produce concatemeric cDNA through RCRT. Phage infection triggers second-strand synthesis, yielding a dsDNA molecule that is transcribed into stop codon–less *neo* mRNA. Translation produces Neo proteins that potently arrest cell growth, protecting the larger bacterial population from the spread of phage.
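One plausible way to arrive at the quantity plotted in (F) is to bin long reads by the number of complete repeats they contain and express each bin as log₁₀(CPM). The sketch below is an assumption about how such a tabulation could be done, not the authors' method; the read lengths, the 120-nt repeat length, and the library size are hypothetical.

```python
import math
from collections import Counter

def log10_cpm_by_repeats(read_lengths, unit_len, total_reads):
    """Bin reads by the number of complete repeats they span and return
    {repeat_count: log10(counts per million)} for bins with >= 1 repeat."""
    counts = Counter(length // unit_len for length in read_lengths)
    return {
        n: math.log10(c * 1e6 / total_reads)
        for n, c in sorted(counts.items())
        if n >= 1
    }

# Hypothetical ccDNA-mapping read lengths (nt) from a library of 1,000 reads
toy_lengths = [120, 250, 250, 480]
table = log10_cpm_by_repeats(toy_lengths, unit_len=120, total_reads=1000)
print(sorted(table))  # [1, 2, 4]
```

Normalizing to the total library size rather than to the ccDNA-mapping reads keeps values comparable between homologs sequenced to different depths.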

Update of

De novo gene synthesis by an antiviral reverse transcriptase.
Tang S, Conte V, Zhang DJ, Žedaveinytė R, Lampe GD, Wiegand T, Tang LC, Wang M, Walker MWG, George JT, Berchowitz LE, Jovanovic M, Sternberg SH. bioRxiv [Preprint]. 2024 May 8:2024.05.08.593200. doi: 10.1101/2024.05.08.593200. Update in: Science. 2024 Oct 4;386(6717):eadq0876. doi: 10.1126/science.adq0876. PMID: 38766058. Free PMC article. Updated. Preprint.
