Bioinformatics. 2020 Apr 1;36(7):2001-2008.
doi: 10.1093/bioinformatics/btz867.

CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing

Samuele Cancellieri et al. Bioinformatics.

Abstract

Motivation: Clustered regularly interspaced short palindromic repeats (CRISPR) technologies allow for facile genomic modification in a site-specific manner. A key step in this process is the in silico design of single guide RNAs to efficiently and specifically target a site of interest. To this end, it is necessary to enumerate all potential off-target sites within a given genome that could be inadvertently altered by nuclease-mediated cleavage. Currently available software for this task is limited by computational efficiency, variant support or annotation, and assessment of the functional impact of potential off-target effects.

Results: To overcome these limitations, we have developed CRISPRitz, a suite of software tools to support the design and analysis of CRISPR/CRISPR-associated (Cas) experiments. Using efficient data structures combined with parallel computation, we offer a rapid, reliable, and exhaustive search mechanism to enumerate a comprehensive list of putative off-target sites. As proof-of-principle, we performed a head-to-head comparison with other available tools on several datasets. This analysis highlighted the unique features and superior computational performance of CRISPRitz including support for genomic searching with DNA/RNA bulges and mismatches of arbitrary size as specified by the user as well as consideration of genetic variants (variant-aware). In addition, graphical reports are offered for coding and non-coding regions that annotate the potential impact of putative off-target sites that lie within regions of functional genomic annotation (e.g. insulator and chromatin accessible sites from the ENCyclopedia Of DNA Elements [ENCODE] project).

Availability and implementation: The software is freely available at https://github.com/pinellolab/CRISPRitz and https://github.com/InfOmics/CRISPRitz.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Overview of CRISPRitz. Starting from a reference genome and a set of genetic variants, the add-variants tool builds a new reference genome that incorporates population or personal variants (see Section 2.3). Searching with bulges in addition to mismatches requires first creating an index of the reference genome with the index-genome tool. This tool scans the genome and collects all candidate targets for any PAM sequence given as input. The output is a compressed representation of the candidate targets found on each chromosome (see Section 2.4). Targets and off-targets are found by the search tool, which takes as input the reference genome file or the previously created genome index with variants, a list of sgRNAs, the mismatch threshold (mandatory) and the bulge threshold (optional). To assess the functional impact of a CRISPR experiment on a genome, the annotate-results tool takes an input file of functional annotations in BED format and lists the number of guide matches that fall in exons, introns, promoters, CTCF and DNase I regions (see Section 2.6). Finally, generate-report (see Section 2.6) produces a graphical visualization, through radar charts and motif logos, of the behavior of any guide under a specific condition (i.e. number of mismatches and/or bulges).
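The annotation step described in this caption amounts to counting how many reported targets overlap each class of annotated region. The following is a minimal sketch of that idea, not the CRISPRitz implementation: the function names `parse_bed` and `count_hits_by_annotation` are hypothetical, the fourth BED column is assumed to carry the annotation label, and targets are assumed to use the same 0-based, half-open coordinates as BED.

```python
from collections import Counter


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples.

    BED coordinates are 0-based and half-open; the fourth column is
    assumed here to hold the annotation label (exon, promoter, ...).
    """
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, name = line.split()[:4]
        regions.append((chrom, int(start), int(end), name))
    return regions


def count_hits_by_annotation(targets, regions):
    """Count targets overlapping each annotation label.

    targets: iterable of (chrom, start, end) in BED-style coordinates.
    A target is counted once per label, even if it overlaps several
    regions carrying that label.
    """
    counts = Counter()
    for chrom, start, end in targets:
        labels = {
            name
            for c, s, e, name in regions
            if c == chrom and start < e and s < end  # half-open overlap test
        }
        counts.update(labels)
    return counts
```

The linear scan over regions keeps the sketch short; for genome-scale annotation files an interval tree or a sorted sweep would be the usual choice.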
Fig. 2.
The PAM search. Searching for the PAM NGG in the genome starts by matching the base A at position 0 in the genome with the root children T, G, C and A of the pattern matching machine. This example illustrates the first 11 transitions of the automaton, which correspond to the identification of three candidate targets (0, 5 and 6). For more details see Supplementary File S1 Section S2.
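The pattern matching machine in this caption can be illustrated with a small Aho-Corasick automaton (Aho and Corasick, 1975). The sketch below is an illustration under stated assumptions rather than the CRISPRitz code: the helper names are hypothetical, only a few IUPAC codes are included, the genome is assumed to be upper case, and the function reports the start position of each PAM occurrence on the forward strand only.

```python
from collections import deque
from itertools import product

# Subset of IUPAC nucleotide codes, enough for PAMs such as NGG or NRG.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "Y": "CT"}


def expand_pam(pam):
    """Expand an IUPAC PAM such as NGG into its concrete sequences."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in pam))]


def build_automaton(patterns):
    """Build goto table, failure links and outputs for the patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: failure links of depth-1 states stay at the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_pam_sites(genome, pam):
    """Return start positions of every PAM occurrence in one genome scan."""
    goto, fail, out = build_automaton(expand_pam(pam))
    state, hits = 0, []
    for i, ch in enumerate(genome):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append(i - len(pat) + 1)
    return hits
```

For example, `find_pam_sites("ATGGCAGGTCGG", "NGG")` scans the sequence once and reports the three NGG occurrences; each reported position would mark where a candidate target ends.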
Fig. 3.
The mismatch-aware guide matching strategy. Characters are encoded using four-bit notation, as illustrated in the table on the left. For a given genomic position, this step compares the characters of the guide with the characters of the genome using bitwise operations. In the example, the guide matching starts from index 24.
Fig. 4.
Example of guide matching by considering up to one mismatch and up to one RNA/DNA bulge. The search starts by visiting the left-most path of the tree, which represents the candidate sequence CT2. After the second mismatch (T versus C), the algorithm verifies that no bulges are allowed and stops the visit over the CT2 path. Back at the previous branch G, which in the example represents the first mismatch, it continues over the CT1 path. It verifies that the second mismatch cannot be considered a DNA bulge but can be considered an RNA bulge. It completes the CT1 path visit to the leaf, thus identifying CT1 as an off-target with one mismatch and one RNA bulge. Similarly, the algorithm jumps back to the previous branch A and reaches the CT3 leaf, thus identifying CT3 as an off-target with one mismatch and one DNA bulge.
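The budgeted search in this caption can be sketched as a recursive visit that, at each step, either pairs a guide base with a DNA base (spending a mismatch if they differ), skips a DNA base (a DNA bulge) or skips a guide base (an RNA bulge). The version below is a simplification: it aligns the guide against one candidate at a time instead of walking a shared tree of candidates, it allows bulges at any position, and the name `align` is hypothetical.

```python
from functools import lru_cache


def align(guide, target, max_mm, max_dna, max_rna):
    """Align guide to target within mismatch and bulge budgets.

    A DNA bulge is an extra base in the target that pairs with no guide
    base; an RNA bulge is an extra base in the guide. Returns the
    (mismatches, dna_bulges, rna_bulges) tuple with the fewest total
    edits, or None if the budgets cannot be met.
    """

    @lru_cache(maxsize=None)
    def visit(i, j, mm, dna, rna):
        if i == len(guide) and j == len(target):
            return (mm, dna, rna)  # both sequences fully consumed
        results = []
        if i < len(guide) and j < len(target):
            cost = 0 if guide[i] == target[j] else 1
            if mm + cost <= max_mm:  # match, or mismatch within budget
                results.append(visit(i + 1, j + 1, mm + cost, dna, rna))
        if j < len(target) and dna < max_dna:  # skip a target base
            results.append(visit(i, j + 1, mm, dna + 1, rna))
        if i < len(guide) and rna < max_rna:  # skip a guide base
            results.append(visit(i + 1, j, mm, dna, rna + 1))
        results = [r for r in results if r is not None]
        # Prefer the fewest total edits; break ties deterministically.
        return min(results, key=lambda r: (sum(r), r)) if results else None

    return visit(0, 0, 0, 0, 0)
```

A tree over the candidates, as in the figure, would let candidates that share a prefix also share the work done on that prefix; the per-candidate recursion above only shows how the three budgets prune the visit.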
Fig. 5.
Running time comparison between CRISPRitz, Cas-OFFinder, FlashFry and OFF-Spotter. (A) Performance by varying the number of analyzed guides with a mismatch threshold set to 5. (B) Running time to search for 1000 guides with an increasing mismatch threshold. (C) Performance by varying the number of analyzed guides with thresholds set to 5, 1 and 1 for the number of mismatches, DNA bulges and RNA bulges, respectively. (D) Running time (in log scale) for searching 1000 guides with an increasing number of mismatches and a fixed number of DNA and RNA bulges (1, 1).
Fig. 6.
Visual representation of CRISPRitz results. (A) and (B) show the behavior of two guides from the CCR5 set (hg19 reference genome, with up to 4 mismatches and no bulges). (A) and (B) were created by comparing results from the CCR5 dataset with the previously computed results based on the Gecko Library v2, as explained in Section 2.6. The area shown in the radar chart is constructed by joining the points on the ‘y’ axis of each annotation. This area gives the user an overview of the behavior of a single guide of the analyzed dataset as compared with a reference set of guides. A small area represents a guide with poor activity in terms of off-targets; conversely, a large area represents a guide with rich activity. (C) Bar plot showing the relative increase in the count of off-targets when accounting for genetic variants from the 1000 Genomes project.
