Comparative Study
Biol Direct. 2011 Jul 14:6:38.
doi: 10.1186/1745-6150-6-38.

Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems


Kira S Makarova et al. Biol Direct. 2011.

Abstract

Background: The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Comparative analysis of the Cas protein sequences and structures led to the classification of the CRISPR-Cas systems into three Types (I, II and III).

Results: A detailed comparison of the available sequences and structures of Cas proteins revealed several unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs) containing a distinct form of the RNA Recognition Motif (RRM) domain, which are major components of the CRISPR-Cas systems, were classified into three large groups, Cas5, Cas6 and Cas7. Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and distinct gene arrangements in the three types of CRISPR-Cas systems together provide for a simple scenario for origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes.

Conclusions: Because of the extreme diversity of CRISPR-Cas systems, in-depth sequence and structure comparisons continue to reveal unexpected homologous relationships among Cas proteins. Unification of Cas protein families previously considered unrelated improves the classification of CRISPR-Cas systems and enables a reconstruction of their evolution.

Open peer review: This article was reviewed by Malcolm White (nominated by Purificacion Lopez-Garcia), Frank Eisenhaber and Igor Zhulin. For the full reviews, see the Reviewers' Comments section.


Figures

Figure 1
Multiple alignment of Cas7 subfamilies and related families of RAMPs. The multiple sequence alignment includes the conserved blocks identified by HHpred (red box), secondary structure predictions and the secondary structure elements extracted from the crystal structure of the Cas7 from S. solfataricus [16]. Secondary structure predictions are shown as follows: 'H' indicates α-helix, 'E' indicates extended conformation (β-strand). The sequences are denoted by their GI numbers and species names. The G-rich loop region of RAMPs is shown by a blue box. The positions of the first and the last residues of the aligned region in the corresponding protein are indicated for each sequence. The numbers within the alignment represent poorly conserved inserts that are not shown. The coloring is based on the consensus shown underneath the alignment; 'h' indicates hydrophobic residues (WFYMLIVACTH), 'p' indicates polar residues (EDKRNQHTS), 's' indicates small residues (ACDGNPSTV).
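The residue classes named in the Figure 1 caption can be written down as a small lookup, which makes the overlaps between classes explicit (for example, T belongs to all three). The sketch below is illustrative only: the function names and the 0.8 consensus threshold are assumptions, since the caption does not say how the consensus line was computed.

```python
# Residue classes exactly as listed in the Figure 1 caption.
RESIDUE_CLASSES = {
    "h": set("WFYMLIVACTH"),  # hydrophobic
    "p": set("EDKRNQHTS"),    # polar
    "s": set("ACDGNPSTV"),    # small
}


def classes_of(residue):
    """Return the class letters ('h', 'p', 's') that a residue belongs to."""
    r = residue.upper()
    return {name for name, members in RESIDUE_CLASSES.items() if r in members}


def column_consensus(column, threshold=0.8):
    """Return the classes shared by at least `threshold` of the non-gap residues.

    `column` is one alignment column given as a string, with '-' or '.'
    marking gaps. The threshold is a hypothetical value chosen for
    illustration; it is not taken from the paper.
    """
    residues = [c for c in column if c not in "-."]
    if not residues:
        return set()
    shared = set()
    for name, members in RESIDUE_CLASSES.items():
        hits = sum(1 for c in residues if c.upper() in members)
        if hits / len(residues) >= threshold:
            shared.add(name)
    return shared
```

For instance, `column_consensus("LIVM-")` ignores the gap and reports only the hydrophobic class, because every remaining residue is in the 'h' set while only V is in the 's' set.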
Figure 2
The RRM fold of RAMPs and Cas2. The RRM fold domains of Cas2 and the three major RAMP groups proposed in the text are shown in cartoon representation with their N- and C-termini indicated. In Cas7, the insertions into the core of the RRM fold are shown in a darker shade. In the RAMPs with two RRM fold domains, these are respectively labeled as N(-terminal) and C(-terminal). The distinct C-terminal domains of Cas5 and Cas6f (Csy4) are also shown. In Cas6f, the glycine-rich loop, which is embedded in a beta-hairpin in contrast to the typical helix-strand element, is colored orange. Note the "horizontal" packing of the first helix of the core RRM fold against the four-stranded sheet, which is one of the characteristic structural features of the RAMPs (apparent in Cas7, Cas6, Cas6e and Cas5). The following PDB IDs were used to generate these representations: 2I0X (Cas2); 3PS0 (Cas7); 3I4H (Cas6); 1WJ9 (Cas6e/CasE); 3KG4 (Cas5); 2XLJ (Cas6f/Csy4).
Figure 3
Classification of the RAMPs. The tree-like scheme of RAMP relationships is based on the sequence similarity, structural features and neighborhood analysis described in the text, and should not be construed as a phylogenetic tree. Unresolved relationships are shown as multifurcations and tentative assignments are shown by broken lines. The catalytic activity of some of the RAMP proteins of the Cas5 and Cas7 groups involving the partially conserved histidines shown in the figure should be considered a tentative prediction.
Figure 4
Gene content similarity between type I-E and type III-A systems and structural organization of large subunits of different CRISPR-Cas systems of type I and III. A. Genes in the operons for I-E and III-A subtypes are shown by arrows with size roughly proportional to the size of the corresponding gene. Homologous genes are shown by arrows of the same color or hashing. RAMPs are shown by pink or pink hashing. Solid lines connect genes for which homology can be confidently demonstrated, and dashed lines connect genes for which homology is inferred tentatively. The Cascade complex subunits are shown by square brackets. Two previously published domain annotations are included for comparison. B. Domain organization of large subunits of different type I and III CRISPR-Cas systems. Domain size is roughly proportional to the corresponding sequence length. The letter "S" marks the regions that could be homologous to small subunits of the Cascade complex encoded as separate genes in Type III systems, the I-E subtype and some systems of the I-A subtype.
Figure 5
Structural organization of Cas9 protein families and their homologs. Homologous regions are shown by the same color. Distinct sequence motifs are denoted by the corresponding conserved amino acid residues above the respective domains (when the same conserved amino acid occurs in different motifs, one is marked by an asterisk to avoid confusion).
Figure 6
Unusual CRISPR-Cas systems. A. Type I-C-variants with GSU0054 (or GSU0053) signature gene. B. Type I-F-variant. C. Type III-variant.
Figure 7
Evolutionary scenario for the origin of CRISPR-Cas systems. Homologous genes are color-coded and identified by a family name (names follow the classification from [20]). Names in bold are proposed systematic names, including those proposed in this work; "legacy names" are in regular font. The signature genes for CRISPR-Cas types are shown within green boxes, and for subtypes within red boxes. The bold letters above the genes show major categories of Cas proteins: L, large CASCADE subunit; S, small CASCADE subunit; R, RAMP CASCADE subunit; RE, RAMP family RNase involved in crRNA processing (experimentally characterized nucleases shown by asterisks); T, transcriptional regulator. Genes coding for inactivated (putative) polymerases are indicated by crosses. Major evolutionary events are shown in the corresponding branches. Broken lines denote alternative evolutionary scenarios for the origin of RAMPs.

References

    1. Jansen R, Embden JD, Gaastra W, Schouls LM. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol. 2002;43(6):1565–1575. doi: 10.1046/j.1365-2958.2002.02839.x.
    2. Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV. A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res. 2002;30(2):482–496. doi: 10.1093/nar/30.2.482.
    3. Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol. 2005;60(2):174–182. doi: 10.1007/s00239-004-0046-3.
    4. Bolotin A, Quinquis B, Sorokin A, Ehrlich SD. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology. 2005;151(Pt 8):2551–2561.
    5. Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct. 2006;1:7. doi: 10.1186/1745-6150-1-7.
