RNA Biol. 2019 Oct;16(10):1486-1493.
doi: 10.1080/15476286.2019.1639310. Epub 2019 Jul 11.

Multiple origins of reverse transcriptases linked to CRISPR-Cas systems

Nicolás Toro et al. RNA Biol. 2019 Oct.

Abstract

Prokaryotic genomes harbour a plethora of uncharacterized reverse transcriptases (RTs). RTs phylogenetically related to those encoded by group-II introns have been found associated with type III CRISPR-Cas systems, adjacent or fused at the C-terminus to Cas1. It is thought that these RTs may have a relevant function in the CRISPR immune response, mediating spacer acquisition from RNA molecules. The origin and relationships of these RTs and the ways in which the various protein domains evolved remain matters of debate. We carried out a large survey of annotated RTs in databases (198,760 sequences) and constructed a large dataset of unique representative sequences (9,141). The combined phylogenetic reconstruction and identification of the RTs and their various protein domains in the vicinity of CRISPR adaptation and effector modules revealed three different origins for these RTs, consistent with their emergence on multiple occasions: a larger group that has evolved from group-II intron RTs, and two minor lineages that may have arisen more recently from Retron/retron-like sequences and Abi-P2 RTs, the latter associated with type I-C systems. We also identified a particular group of RTs associated with CRISPR-cas loci in clade 12, fused C-terminally to an archaeo-eukaryotic primase (AEP), a protein domain (AE-Prim_S_like) forming a particular family within the AEP proper clade. Together, these data provide new insight into the evolution of CRISPR-Cas/RT systems.

Keywords: Abi; CRISPR; Cas; archaeo-eukaryotic primase; diversity-generating retroelement; group II intron; retron; reverse transcriptase.

Figures

Figure 1.
Compilation of RTs from databases and generation of the dataset. The procedure depicted yielded 9,141 predicted unique sequences representative of the current diversity of RTs in prokaryotes.
Figure 2.
Phylogeny of prokaryotic RTs. The unrooted tree was constructed with the FastTree program from an alignment of 9,141 unique predicted RT protein sequences. The corresponding RT protein sequences, accession numbers and species names are provided in Supplementary Table 1 and the tree newick file is provided as Supplementary File 1. The branches corresponding to group-II introns (GII), GII class F, Retron/retron-like, DGRs, CRISPR-Cas, G2L, Abi and UG RTs are indicated and highlighted with distinct colours. The numbers of the CRISPR-Cas encoded RT clades are indicated in brackets and the dots indicate the type of system with which they are associated: type III (black) or type I-C (blue). The red arrow indicates the branches corresponding to the putative RTs linked to type I-E CRISPR-Cas systems described by Silas et al. [16]. Relevant subtrees are provided in Figure 4 and Supplementary Figures 2–4.
Figure 3.
Architectures of genomic loci for the representative subtypes of CRISPR-Cas systems associated with RTs. Group-II intron-like RTs (ancient, clades 2–13; and recent, clade 1), Retron RT-like (clade 14) and Abi-P2 RT-like (clade 15). For each locus, the node number (Fusicatenibacter saccharivorans was not included in the 9,141 entries), species, respective nucleotide coordinates and CRISPR-Cas system subtype (derived from the respective effector genes) are indicated. Genes are shown roughly to scale; CRISPR arrays are indicated in brackets and are not to scale. The genes within each locus are denoted as in Supplementary Table 3. Homologous genes are colour-coded, with the exception of most of the ancillary genes, which are shown in white; unknown proteins are shown in grey.
Figure 4.
Phylogeny of CRISPR-Cas encoded RTs. The identified lineages of CRISPR-Cas RTs, three evolving from group-II introns, one from Retron/retron-like and one from Abi-P2 RTs, are shown. The CRISPR-Cas RTs and neighbouring group-II intron classes (F, D and E); G2L; Retron and Abi-P2 clades are depicted schematically, with collapsed branches (FastTree support ≥0.85). For the CRISPR-Cas RT clades, the most common RT domains or gene organizations are indicated. Prim_S indicates an archaeo-eukaryotic primase AE_Prim_S-like domain.
Figure 5.
Phylogeny of CRISPR-Cas encoded RT AE_Prim_S_like domains. The tree was constructed with FastTree, from an alignment of 62 protein sequences, including members of the AEP proper clade (AEP small_PriS proteins, NHEJ primases, Lef-1-like primases of baculoviruses and other related sequences), Prim-Pol clade (Z1568-like family, DR0530-like family, all3500-like family, bll5242-like family, ColE2 Rep-like family, RepE/RepS family), BT4734-like family, and the AE_Prim_S_like domain of 14 unique RT proteins with this architecture (NCBI database). All the clades except the all3500-like family (FastTree support 0.65) have a FastTree support ≥0.85. Other relevant FastTree support values are indicated. The corresponding tree newick file (Supplementary File 3) and the subalignment of the AE_Prim_S_like domain of 14 unique RT proteins with the three conserved motifs (Supplementary File 2) are provided in the Supplementary material.
Figure 6.
Phylogeny of RTs linked to CRISPR-Cas systems type I-C. The tree was constructed with FastTree, from an alignment including the Abi-P2 RTs from bacterial species of the order Pasteurellales included in the 9,141 entries: Basfia succiniciproducens and Haemophilus haemolyticus strain HK386. Other close relatives identified by Blast searches of the NCBI database also associated with type I-C systems were included. The RTs linked to type I-C systems correspond to the red branches. The host and environment of the isolates are indicated. The tree newick file is provided (Supplementary File 4).

References

    1. Baltimore D. RNA-dependent DNA polymerase in virions of RNA tumour viruses. Nature. 1970;226:1209–1211.
    2. Temin HM, Mizutani S. RNA-dependent DNA polymerase in virions of Rous sarcoma virus. Nature. 1970;226:1211–1213.
    3. Lampson BC, Sun J, Hsu MY, et al. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA. Science. 1989;243:1033–1038.
    4. Lim D, Maas WK. Reverse transcriptase-dependent synthesis of a covalently linked, branched DNA-RNA compound in E. coli B. Cell. 1989;56:891–904.
    5. Toro N, Jiménez-Zurdo JI, García-Rodríguez FM. Bacterial group II introns: not just splicing. FEMS Microbiol Rev. 2007;31(3):342–358.
