2016 May 17;17:356.
doi: 10.1186/s12864-016-2627-0.

CRISPRDetect: A flexible algorithm to define CRISPR arrays

Ambarish Biswas et al. BMC Genomics.

Abstract

Background: CRISPR (clustered regularly interspaced short palindromic repeats) RNAs provide the specificity for noncoding RNA-guided adaptive immune defence systems in prokaryotes. CRISPR arrays consist of repeat sequences separated by specific spacer sequences. CRISPR arrays have previously been identified in a large proportion of prokaryotic genomes. However, currently available detection algorithms do not utilise recently discovered features regarding CRISPR loci.

Results: We have developed a new approach to automatically detect, predict and interactively refine CRISPR arrays. It is available as a web program and a command-line tool from bioanalysis.otago.ac.nz/CRISPRDetect. CRISPRDetect discovers putative arrays, extends each array by detecting additional variant repeats, corrects the direction of arrays, refines the repeat/spacer boundaries, and annotates different types of sequence variation (e.g. insertion/deletion) in near-identical repeats. Due to these features, CRISPRDetect has significant advantages when compared to existing identification tools. As well as providing further support for small, medium and large repeats, CRISPRDetect identified a class of arrays with 'extra-large' repeats in bacteria (repeats 44-50 nt). The CRISPRDetect output is integrated with other analysis tools. Notably, the predicted spacers can be directly utilised by CRISPRTarget to predict targets.
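To make the idea of "discovering putative arrays" concrete, the following is a minimal sketch of the simplest possible approach: scanning for an exact short sequence that recurs at spacer-sized intervals. It is not the CRISPRDetect algorithm. The function names, the default lengths and the toy sequence are all assumptions chosen for illustration, and the sketch deliberately omits the steps described above (variant repeats, direction correction, boundary refinement).

```python
from typing import List, Tuple


def find_repeat_arrays(seq: str, repeat_len: int = 8, min_spacer: int = 4,
                       max_spacer: int = 12,
                       min_repeats: int = 3) -> List[Tuple[str, List[int]]]:
    """Naive scan (illustrative only): report exact k-mers that recur at
    spacer-sized intervals at least `min_repeats` times.

    Returns a list of (repeat_sequence, 0-based start positions).
    """
    seq = seq.upper()
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        repeat = seq[i:i + repeat_len]
        positions = [i]
        cursor = i
        while True:
            # The next copy may start min_spacer..max_spacer bases after
            # the end of the current copy.
            lo = cursor + repeat_len + min_spacer
            hi = cursor + repeat_len + max_spacer
            # str.find treats `end` as exclusive, so add repeat_len to
            # allow a copy that starts exactly at `hi`.
            nxt = seq.find(repeat, lo, hi + repeat_len)
            if nxt == -1:
                break
            positions.append(nxt)
            cursor = nxt
        if len(positions) >= min_repeats:
            arrays.append((repeat, positions))
            i = positions[-1] + repeat_len  # continue after this array
        else:
            i += 1
    return arrays


def extract_spacers(seq: str, repeat_len: int, positions: List[int]) -> List[str]:
    """Return the sequences lying between consecutive repeat copies."""
    return [seq[a + repeat_len:b] for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Toy sequence (hypothetical): four copies of an 8-nt repeat
    # separated by three different spacers.
    toy = ("TTA" + "GTTCACTG" + "AAACCC" + "GTTCACTG" + "TTTGGGA"
           + "GTTCACTG" + "CCATAT" + "GTTCACTG" + "GG")
    for repeat, starts in find_repeat_arrays(toy):
        print(repeat, starts, extract_spacers(toy, len(repeat), starts))
```

Because this scan accepts the first window that happens to recur, it can place the repeat/spacer boundary a base or two off when flanking bases are shared, and it misses copies that carry a substitution or indel. Those are the cases that the extension and refinement steps listed above are intended to handle.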

Conclusion: CRISPRDetect enables more accurate detection of arrays and spacers, and its GFF output is suitable for inclusion in genome annotation pipelines and for visualisation. It has been used to analyse all complete bacterial and archaeal reference genomes.
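As an illustration of how GFF output might be consumed in an annotation pipeline, here is a minimal sketch that reads nine-column GFF3 text and reports feature lengths. The sample records, the sequence ID and the feature type names (`repeat_region`, `direct_repeat`, `binding_site`) are assumptions made up for this example; the actual CRISPRDetect output may use different types and attributes.

```python
from collections import namedtuple

Feature = namedtuple(
    "Feature",
    "seqid source type start end score strand phase attributes",
)


def parse_gff3(text):
    """Parse GFF3 text into Feature tuples, skipping comments and blanks."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = value.strip()
        features.append(Feature(cols[0], cols[1], cols[2], int(cols[3]),
                                int(cols[4]), cols[5], cols[6], cols[7], attrs))
    return features


def feature_lengths(features, feature_type):
    """GFF coordinates are 1-based and inclusive, so length = end - start + 1."""
    return [f.end - f.start + 1 for f in features if f.type == feature_type]


# Hypothetical records: one array containing two repeats and one spacer.
EXAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "\t".join(["seq1", "example", "repeat_region", "101", "190", ".", "-", ".", "ID=array1"]),
    "\t".join(["seq1", "example", "direct_repeat", "101", "129", ".", "-", ".", "Parent=array1"]),
    "\t".join(["seq1", "example", "binding_site", "130", "161", ".", "-", ".", "Parent=array1"]),
    "\t".join(["seq1", "example", "direct_repeat", "162", "190", ".", "-", ".", "Parent=array1"]),
])

if __name__ == "__main__":
    feats = parse_gff3(EXAMPLE_GFF)
    print("repeat lengths:", feature_lengths(feats, "direct_repeat"))
    print("spacer lengths:", feature_lengths(feats, "binding_site"))
```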

Keywords: Bioinformatics; CRISPR; Cas; Horizontal gene transfer; Phage resistance; Plasmids; Repeat elements; Small RNA targets; crRNA.

Figures

Fig. 1
The CRISPRDetect automated pipeline. The modules that make up the pipeline are shown. In some cases processes are repeated iteratively (iteration ‘0’ to i). See CRISPRDetect.pl for details. The interactive web implementation allows dynamic alteration of the parameters to suit the particular CRISPR array and genome.
Fig. 2
CRISPRDetect predictions for E. coli K-12 (text output). CRISPRDetect identifies two CRISPR arrays in a K-12 genome, corresponding to the well characterised CRISPR 2.1 and 2.3 loci. This genome is provided as one of the test sets at http://bioanalysis.otago.ac.nz/CRISPRDetect/. Both E. coli arrays are reverse-complemented in the CRISPRDetect prediction (based on matches to the reference repeat and other features used by CRISPRDirection). a CRISPR 2.1. The array section of the CRISPRDetect output is shown, showing base differences, e.g. TT mutations in the repeat toward the predicted 3’ end. b The full output is shown, and specific features are in bold. For CRISPR 2.3 the reference repeat match also permitted inclusion of the experimentally verified last base (G) in the repeat, although it varies in two of six repeats (the first and last, bold). The score is high (8.14) and its components are shown below. The directional analysis gives a ‘HIGH’ confidence for the reverse orientation as shown. The cas genes identified in the ‘.gbk’ file are listed, as are the signature genes for any family present (only I-E in this example). c CRISPRFinder prediction for E. coli CRISPR 2.3, obtained from CRISPRdb, for comparison.
Fig. 3
CRISPRDetect web output. An example of a predicted and automatically refined array from Cronobacter sakazakii ES15, which has 16 repeats, the last of which has degenerated. Options A-I are available for further interactive application of the selected processes to the selected array (Array 2 from this genome; Array 1 is hidden). The array is shown in a standard format with substitutions in the repeat sequence shown. An insertion in a repeat is indicated at the right. The quality score is high, 8.87 (>4.0; max 13), and the score would be detailed in the next lines (as in Fig. 2, not shown). A link to CRISPRBank and initial analysis is shown in the top right and indicates that this exact repeat is found in five genomes (Cronobacter species). The annotation file in GFF can be downloaded for visualisation or further analysis (e.g. Fig. 6).
Fig. 4
Number of CRISPR arrays predicted by three existing methods compared with CRISPRDetect. Arrays with three or more repeats and, for CRISPRDetect, a good quality score (>4.0) and a repeat of ≥23 bases were counted.
Fig. 5
Sizes of CRISPR array repeats and spacers. a Distribution of sizes of the representative repeats for each array; the percentage of each length is shown separately for bacteria (blue) and archaea (yellow). Four size ranges (small, medium, large and extra large) are indicated. b Distribution of the median spacer size for each array. In (a) and (b), CRISPR arrays with ‘good’ scores (≥4.0) and three or more repeats from one strain for each species from Genbank/genomes were counted. For the same analysis including all strains, see Additional file 1: Figure S3.
Fig. 6
CRISPRDetect results on a genome browser. General feature format (GFF) output visualised in a genome browser (Artemis) [60]. This region has an array followed by an operon that includes some CRISPR associated genes. The figure shows a section of the RefSeq annotated version of the Methanobrevibacter ruminantium genome [62]. The top line shows the annotation from the RefSeq file in GenBank (gbff) format. In the NCBI annotation pipeline the arrays are predicted by a combination of CRT and Piler-CR. These are annotated as ‘repeat_region’ features on the genome (light blue). The CRISPRDetect GFF output file has been added to this annotation. Each repeat and spacer is shown in the indicated orientation.
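The counting criteria stated for Fig. 4 (three or more repeats, quality score >4.0, repeat of ≥23 bases) and the 'extra-large' repeat range from the abstract (44-50 nt) can be expressed as a simple filter. The sketch below is only an illustration: the `ArrayPrediction` record, its field names and the sample values are hypothetical and do not reflect CRISPRDetect's actual output format.

```python
from dataclasses import dataclass


@dataclass
class ArrayPrediction:
    """Hypothetical summary record for one predicted array."""
    array_id: str
    n_repeats: int
    repeat_length: int  # length of the representative repeat, in nt
    score: float        # quality score


def passes_count_filter(a: ArrayPrediction, min_repeats: int = 3,
                        min_score: float = 4.0,
                        min_repeat_len: int = 23) -> bool:
    """Thresholds taken from the Fig. 4 caption: >=3 repeats,
    score > 4.0 and a repeat of >= 23 bases."""
    return (a.n_repeats >= min_repeats
            and a.score > min_score
            and a.repeat_length >= min_repeat_len)


def is_extra_large(a: ArrayPrediction) -> bool:
    """'Extra-large' repeats are described in the abstract as 44-50 nt."""
    return 44 <= a.repeat_length <= 50


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    predictions = [
        ArrayPrediction("array_A", n_repeats=16, repeat_length=29, score=8.9),
        ArrayPrediction("array_B", n_repeats=2, repeat_length=29, score=6.0),
        ArrayPrediction("array_C", n_repeats=5, repeat_length=21, score=5.0),
        ArrayPrediction("array_D", n_repeats=4, repeat_length=46, score=4.5),
    ]
    kept = [p for p in predictions if passes_count_filter(p)]
    print([p.array_id for p in kept])
    print([p.array_id for p in kept if is_extra_large(p)])
```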

References

    1. Richter C, Chang JT, Fineran PC. Function and Regulation of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR Associated (Cas) Systems. Viruses. 2012;4(10):2291–2311. doi: 10.3390/v4102291.
    2. Sorek R, Lawrence CM, Wiedenheft B. CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem. 2013;82:237–266. doi: 10.1146/annurev-biochem-072911-172315.
    3. Westra ER, Swarts DC, Staals RH, Jore MM, Brouns SJ, van der Oost J. The CRISPRs, They Are A-Changin’: How Prokaryotes Generate Adaptive Immunity. Annu Rev Genet. 2012;46:311–339. doi: 10.1146/annurev-genet-110711-155447.
    4. Samson JE, Magadan AH, Sabri M, Moineau S. Revenge of the phages: defeating bacterial defences. Nat Rev Microbiol. 2013;11(10):675–687. doi: 10.1038/nrmicro3096.
    5. Louwen R, Staals RH, Endtz HP, van Baarlen P, van der Oost J. The Role of CRISPR-Cas Systems in Virulence of Pathogenic Bacteria. Microbiol Mol Biol Rev. 2014;78(1):74–88. doi: 10.1128/MMBR.00039-13.
