BMC Bioinformatics. 2007 Jan 20;8:18.
doi: 10.1186/1471-2105-8-18.

PILER-CR: fast and accurate identification of CRISPR repeats

Robert C Edgar. BMC Bioinformatics. 2007.

Abstract

Background: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output.

Results: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes.

Conclusion: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at http://www.drive5.com/pilercr.

Figures

Figure 1
Structure of a CRISPR array. CRISPR repeats are perfectly (or almost perfectly) conserved short sequences, typically of length 20 to 40 bases, separated by unique sequences known as spacers. The spacer length in a given array is sometimes approximately conserved, varying by a few bases, and sometimes exactly conserved. The spacer length is typically similar to the repeat length.
Figure 2
Dot-plot of a CRISPR array against itself. Self-similarity plot ("dot-plot") of a genome against itself in a CRISPR array region. The main diagonal is shown as a dashed line. As the two axes represent the same sequence, local alignments (diagonal lines) are symmetrical about the main diagonal.
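The symmetry described in this caption can be reproduced on a toy sequence. The sketch below is not the alignment method used by PILER-CR; it substitutes exact k-mer matching for local alignment, and the function name `self_matches`, the toy sequence, and the word length `k` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Set, Tuple


def self_matches(seq: str, k: int) -> Set[Tuple[int, int]]:
    """Return off-diagonal dot-plot cells (i, j) where seq[i:i+k] == seq[j:j+k].

    Exact k-mer identity is a simplified stand-in for local alignment.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    matches = set()
    for locs in positions.values():
        for i in locs:
            for j in locs:
                if i != j:  # skip the main diagonal
                    matches.add((i, j))
    return matches


# Toy sequence: one 10-base repeat, a 5-base spacer, then the repeat again.
toy = "GATTACAGGC" + "TTATA" + "GATTACAGGC"
cells = self_matches(toy, k=8)
```

Because both axes are the same sequence, every cell `(i, j)` in `cells` has a mirror cell `(j, i)`, which is the symmetry about the main diagonal that the caption describes.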
Figure 3
Pile construction. When local alignments are projected onto the genome, "piles" are produced. A pile is a contiguous sequence of bases, each one of which has a hit to at least one other region in the genome. Bases that are not in a pile are unique sequence. Each local alignment connects two piles. In this figure, each hit has a different color so, for example, the purple hit connects the first and second pile.
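Pile construction as described in this caption amounts to merging the intervals covered by hits. The following is a minimal sketch under assumed conventions: hits are pairs of half-open intervals `(start, end)`, and the names `build_piles` and `hit_links` are hypothetical, not taken from the PILER-CR source.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]      # half-open [start, end) on the genome
Hit = Tuple[Interval, Interval]  # a local alignment joins two regions


def build_piles(hits: List[Hit]) -> List[Interval]:
    """Project every hit onto the genome and merge touching or overlapping intervals."""
    intervals = sorted(iv for hit in hits for iv in hit)
    piles: List[List[int]] = []
    for start, end in intervals:
        if piles and start <= piles[-1][1]:
            # Still inside (or directly adjacent to) the current pile: extend it.
            piles[-1][1] = max(piles[-1][1], end)
        else:
            piles.append([start, end])
    return [(s, e) for s, e in piles]


def hit_links(hits: List[Hit], piles: List[Interval]) -> List[Tuple[int, int]]:
    """Map each hit to the indices of the two piles it connects."""
    starts = [s for s, _ in piles]

    def pile_of(iv: Interval) -> int:
        return bisect_right(starts, iv[0]) - 1

    return [(pile_of(a), pile_of(b)) for a, b in hits]


# Three 30-base repeats separated by 30-base spacers, with all pairwise hits.
hits = [
    ((100, 130), (160, 190)),
    ((160, 190), (220, 250)),
    ((100, 130), (220, 250)),
]
piles = build_piles(hits)
links = hit_links(hits, piles)
```

Bases outside every interval in `piles` correspond to the unique sequence mentioned in the caption, and each entry of `links` records the pair of piles joined by one hit.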
Figure 4
A chain of hits meeting CRISPR criteria. CRISPR arrays are identified by following chains of hits. Starting with a given pile, each hit that connects this pile to another pile later in the genome is a potential link in the chain. All possible chains are explored, abandoning the search each time the chain violates the criteria used for CRISPR array recognition (see Table 2). These criteria include maximum and minimum repeat length, maximum and minimum spacer length, and measures of the variance in repeat and spacer lengths. Shorter links are explored before longer links, because a regularly spaced array can also be obtained by skipping every second, third... repeat. The figure shows the correct chain (arrows) for the example array from Fig. 3.
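The chain search in this caption can be sketched as a depth-first exploration that prunes any chain violating the length criteria. This is an illustrative sketch only: the threshold values in `Criteria` are placeholders rather than the values from Table 2, a simple max-minus-min spread stands in for the paper's variance measures, and `Criteria`, `chain_ok`, and `longest_chain` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pile = Tuple[int, int]  # half-open [start, end) on the genome


@dataclass
class Criteria:
    # Placeholder thresholds chosen for illustration, NOT the Table 2 values.
    min_repeat: int = 16
    max_repeat: int = 64
    min_spacer: int = 8
    max_spacer: int = 64
    max_repeat_spread: int = 5   # longest minus shortest repeat
    max_spacer_spread: int = 10  # longest minus shortest spacer


def chain_ok(chain: List[int], piles: List[Pile], c: Criteria) -> bool:
    """Check repeat and spacer lengths for a chain of pile indices."""
    repeats = [piles[i][1] - piles[i][0] for i in chain]
    spacers = [piles[b][0] - piles[a][1] for a, b in zip(chain, chain[1:])]
    if any(not (c.min_repeat <= r <= c.max_repeat) for r in repeats):
        return False
    if any(not (c.min_spacer <= s <= c.max_spacer) for s in spacers):
        return False
    if max(repeats) - min(repeats) > c.max_repeat_spread:
        return False
    if spacers and max(spacers) - min(spacers) > c.max_spacer_spread:
        return False
    return True


def longest_chain(start: int, links: Dict[int, List[int]],
                  piles: List[Pile], c: Criteria) -> List[int]:
    """Explore all chains from `start`, abandoning any that violate the criteria."""
    best = [start]

    def explore(chain: List[int]) -> None:
        nonlocal best
        if len(chain) > len(best):
            best = list(chain)
        # Shorter links first: try the nearest downstream pile before farther ones.
        for nxt in sorted(links.get(chain[-1], []), key=lambda j: piles[j][0]):
            chain.append(nxt)
            if chain_ok(chain, piles, c):
                explore(chain)
            chain.pop()

    if chain_ok([start], piles, c):
        explore([start])
    return best


# Four 30-base repeats with 30-base spacers; every earlier pile hits every later one.
piles = [(100, 130), (160, 190), (220, 250), (280, 310)]
links = {0: [1, 2, 3], 1: [2, 3], 2: [3]}
chain = longest_chain(0, links, piles, Criteria())
```

In this toy input, a chain that skips a repeat (for example pile 0 directly to pile 2) implies a 90-base spacer and is rejected by the placeholder `max_spacer`. With looser thresholds such skipped chains would also pass, which is why the nearest link is tried first.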
