Proc Natl Acad Sci U S A. 2018 Jun 5;115(23):E5307-E5316.
doi: 10.1073/pnas.1803440115. Epub 2018 May 21.

Systematic prediction of genes functionally linked to CRISPR-Cas systems by gene neighborhood analysis

Sergey A Shmakov et al. Proc Natl Acad Sci U S A.

Abstract

The CRISPR-Cas systems of bacterial and archaeal adaptive immunity consist of direct repeat arrays separated by unique spacers and multiple CRISPR-associated (cas) genes encoding proteins that mediate all stages of the CRISPR response. In addition to the relatively small set of core cas genes that are typically present in all CRISPR-Cas systems of a given (sub)type and are essential for the defense function, numerous genes occur in CRISPR-cas loci only sporadically. Some of these have been shown to perform various ancillary roles in the CRISPR response, but the functional relevance of most remains unknown. We developed a computational strategy for systematically detecting genes that are likely to be functionally linked to CRISPR-Cas. The approach is based on a "CRISPRicity" metric that measures the strength of CRISPR association for all protein-coding genes from sequenced bacterial and archaeal genomes. Uncharacterized genes with CRISPRicity values comparable to those of cas genes are considered candidate CRISPR-linked genes. We describe additional criteria to predict functional relevance for genes in the candidate set and identify 79 genes as strong candidates for functional association with CRISPR-Cas systems. A substantial majority of these CRISPR-linked genes reside in type III CRISPR-cas loci, which implies exceptional functional versatility of type III systems. Numerous candidate CRISPR-linked genes encode integral membrane proteins, suggestive of tight membrane association of CRISPR-Cas systems, whereas many others encode proteins implicated in various signal transduction pathways. These predictions provide ample material for improving annotation of CRISPR-cas loci and experimental characterization of previously unsuspected aspects of CRISPR-Cas system functionality.
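
The abstract does not give a formula for the CRISPRicity metric, so the following Python sketch is only an illustration of the general idea. It assumes, hypothetically, that the score of a gene cluster is the fraction of its genomic occurrences that fall inside CRISPR-cas loci, and that candidates are uncharacterized clusters scoring at or above a chosen cutoff. The function names, the input layout, and the threshold are all assumptions, not the authors' pipeline.

```python
from collections import Counter


def crispricity(occurrences):
    """Return an assumed CRISPR-association score per gene cluster.

    occurrences: iterable of (cluster_id, in_crispr_locus) pairs, one per
    gene occurrence in a genome. The score is the fraction of occurrences
    located in CRISPR-cas loci (an assumed definition, not from the abstract).
    """
    total = Counter()
    linked = Counter()
    for cluster_id, in_locus in occurrences:
        total[cluster_id] += 1
        if in_locus:
            linked[cluster_id] += 1
    return {cluster: linked[cluster] / total[cluster] for cluster in total}


def candidate_clusters(scores, known_cas, threshold):
    """List uncharacterized clusters whose score reaches the threshold.

    threshold is a hypothetical cutoff standing in for "comparable to
    those of cas genes".
    """
    return sorted(
        cluster
        for cluster, score in scores.items()
        if cluster not in known_cas and score >= threshold
    )


if __name__ == "__main__":
    toy = [
        ("cas1", True), ("cas1", True),
        ("x", True), ("x", False),
        ("y", False),
    ]
    scores = crispricity(toy)
    print(scores)
    print(candidate_clusters(scores, known_cas={"cas1"}, threshold=0.5))
```

In this toy input, the invented cluster "x" appears once inside and once outside a CRISPR-cas locus, so it scores 0.5 under the assumed definition and passes a 0.5 cutoff, while "y" does not.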

Keywords: CRISPR-Cas; computational genomics; gene neighborhoods; membrane proteins; signaling.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
The computational pipeline for the analysis of the CRISPR-linked gene space.
Fig. 2.
Protein clusters in CRISPR-cas neighborhoods. (A) Distribution of voxels in the CRISPRicity-abundance-distance space. Red circles: probability mass distribution for the union of cas and previously identified CRISPR-associated genes, voxels with CRISPR index I > 2; blue crosses: probability mass distribution for the union of cas and previously identified CRISPR-associated genes, voxels with CRISPR index I < 2; green diamonds: probability mass distribution for unknown genes, voxels with CRISPR index I > 2 (candidate CRISPR-linked genes); purple crosses: probability mass distribution for unknown genes, voxels with CRISPR index I < 2. (B) Breakdown of previously undetected CRISPR-linked protein clusters by the CRISPR-Cas types and subtypes.
Fig. 3.
Locus organization of type III CRISPR-Cas systems containing predicted CRISPR-linked genes encoding membrane proteins. (A) CorA, divalent cation membrane channel encoded in type III-B CRISPR-cas loci along with two distinct nucleases. (B) Membrane-associated CARF domain-containing proteins. (C) Uncharacterized membrane protein family in diverse type III loci. For each locus, species name, genome accession number, and the respective nucleotide coordinates and CRISPR-Cas system subtype are indicated. The genes in a representative locus are shown by block arrows, which show the transcription direction. The scale of an arrow is roughly proportional to the respective gene length. Homologous genes and domains are color-coded; empty arrows show predicted genes without detectable homologs. On the right, models of the membrane topology of the predicted CRISPR-linked membrane proteins are shown according to the TMHMM predictions. Hypothetical interactions of the identified CRISPR-linked proteins with CRISPR-Cas system components are also depicted (see Predicted CRISPR-Linked Proteins: Membrane Connections and Signal Transduction). The cas gene names follow the current nomenclature (11); for several core cas genes, an extension specifies the gene group (gr5, gr6, gr7, groups 5, 6, and 7 of the RAMP superfamily, respectively; gr8, large subunit of the effector complex; gr11, small subunit of the effector complex). Abbreviations and other gene names: CARF, CRISPR-associated Rossmann fold domain; COG5421, transposase of COG5421 family; DHH, DHH family nuclease; Lon, Lon family protease; NYN, NYN family nuclease; RT, reverse transcriptase; TM, transmembrane helix.
Fig. 4.
Locus organization of type I-E and type IV CRISPR-Cas systems containing predicted CRISPR-linked genes. (A) STAND family NTPases encoded in minimal type I-E loci. The clade of STAND NTPases associated with type I-E systems is shown on the right (complete tree is available at ftp://ftp.ncbi.nlm.nih.gov/pub/wolf/_suppl/CRISPRicity). Genomes in which the STAND NTPase gene is linked to a type I-E locus are shown in blue, and genomes in which there is no such link are shown in black. Colored branches denote three subfamilies (clusters) identified in this work. Support values greater than 70% are indicated for the respective branches. (B) CysH family PAPS reductases encoded in type IV-B loci. The clade of CysH family enzymes associated with type IV CRISPR-Cas systems is shown on the right (complete tree is available at ftp://ftp.ncbi.nlm.nih.gov/pub/wolf/_suppl/CRISPRicity). Genomes in which cysH-like genes are linked to type IV-B loci are shown in blue, and genomes in which there is no such link are shown in black. Colored branches denote three subfamilies (clusters) identified in this work. Support values greater than 70% are indicated for the respective branches. The designations are as in Fig. 3. CRISPR arrays are shown by gray boxes. Additional abbreviations and gene names: ADP-PRT, ADP phosphoribosyltransferase; FlhG, MinD-like ATPase involved in chromosome partitioning or flagellar assembly; HTH, helix-turn-helix DNA-binding domain; LRP, LRP family transcriptional regulator; N6-MTase, N6 adenosine methylase; NB_ARC, STAND NTPase fused to TPR-repeats (distinct from the predicted CRISPR-linked STAND NTPase); SSB, single-stranded DNA-binding protein.
Fig. 5.
Coevolution of predicted CRISPR-linked genes with signature cas genes. The panels show plots of pairwise distances between the predicted CRISPR-linked corA gene product, Cas10, and 16S rRNA, estimated from the respective phylogenetic trees. The Spearman rank correlation coefficient is indicated on each plot.
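
As an illustration of the statistic named in the Fig. 5 caption, the following Python sketch computes a Spearman rank correlation between two lists of pairwise distances by ranking each list (ties receive their average rank) and taking the Pearson correlation of the ranks. The distance values in the usage lines are invented for the example, and nothing here reflects how the authors extracted distances from their trees.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical pairwise distances for the same genome pairs in two trees.
    cora_distances = [0.10, 0.35, 0.20, 0.80]
    cas10_distances = [0.05, 0.40, 0.30, 0.90]
    print(spearman(cora_distances, cas10_distances))
```

A coefficient near 1 would indicate that genome pairs far apart in one tree tend to be far apart in the other, which is the kind of concordance the caption's coevolution comparison examines.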

References

    1. Sorek R, Lawrence CM, Wiedenheft B. CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem. 2013;82:237–266.
    2. Wright AV, Nuñez JK, Doudna JA. Biology and applications of CRISPR systems: Harnessing nature’s toolbox for genome engineering. Cell. 2016;164:29–44.
    3. Komor AC, Badran AH, Liu DR. CRISPR-based technologies for the manipulation of eukaryotic genomes. Cell. 2017;168:20–36.
    4. Mohanraju P, et al. Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems. Science. 2016;353:aad5147.
    5. Barrangou R, Horvath P. A decade of discovery: CRISPR functions and applications. Nat Microbiol. 2017;2:17092.
