Nucleic Acids Res. 2023 Oct 27;51(19):10147-10161.
doi: 10.1093/nar/gkad736.

Quantum biological insights into CRISPR-Cas9 sgRNA efficiency from explainable-AI driven feature engineering

Jaclyn M Noshay et al. Nucleic Acids Res. 2023.

Abstract

CRISPR-Cas9 tools have transformed genetic manipulation capabilities in the laboratory. Empirical rules-of-thumb have been developed for only a narrow range of model organisms, and mechanistic underpinnings for sgRNA efficiency remain poorly understood. This work establishes a novel feature set and new public resource, produced with quantum chemical tensors, for interpreting and predicting sgRNA efficiency. Feature engineering for sgRNA efficiency is performed using an explainable-artificial intelligence model: iterative Random Forest (iRF). By encoding quantitative attributes of position-specific sequences for Escherichia coli sgRNAs, we identify important traits for sgRNA design in bacterial species. Additionally, we show that expanding positional encoding to quantum descriptors of base-pair, dimer, trimer, and tetramer sequences captures intricate interactions in local and neighboring nucleotides of the target DNA. These features highlight variation in CRISPR-Cas9 sgRNA dynamics between E. coli and H. sapiens genomes. These novel encodings of sgRNAs enhance our understanding of the elaborate quantum biological processes involved in CRISPR-Cas9 machinery.
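The position-specific encoding of single bases, dimers, trimers and tetramers described above can be sketched roughly as follows. This is only an illustration of the general idea, not the authors' pipeline: the function names (`placeholder_descriptor`, `encode_sgrna`, `build_matrix`) are hypothetical, and the descriptor used here (GC fraction of the k-mer) is a toy stand-in for the quantum chemical values, which are not given in this abstract.

```python
# Minimal sketch of position-specific k-mer encoding for sgRNA sequences.
# ASSUMPTION: placeholder_descriptor is a toy stand-in (GC fraction); it is
# NOT one of the quantum chemical descriptors used in the paper.

def placeholder_descriptor(kmer: str) -> float:
    """Return the fraction of G/C bases in the k-mer (illustrative only)."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def encode_sgrna(seq: str, ks=(1, 2, 3, 4)) -> dict:
    """Map one sgRNA to {feature_name: value}, one feature per (k, position)."""
    features = {}
    for k in ks:
        # Slide a window of width k along the sequence; positions are 1-based.
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            features[f"k{k}_pos{pos + 1}"] = placeholder_descriptor(kmer)
    return features


def build_matrix(guides: list):
    """Assemble a wide matrix: one row per sgRNA, one column per feature.

    Assumes all guides have the same length, so they share feature names.
    """
    rows = [encode_sgrna(g) for g in guides]
    columns = sorted(rows[0])
    matrix = [[row[c] for c in columns] for row in rows]
    return columns, matrix


if __name__ == "__main__":
    cols, mat = build_matrix(["ACGTACGTACGTACGTACGT", "GGGGCCCCAAAATTTTGGCC"])
    print(len(cols), "features per sgRNA;", len(mat), "sgRNAs")
```

With a 20-nucleotide guide, this yields 20 + 19 + 18 + 17 = 74 positional features per descriptor; a real feature set would repeat the encoding once for each descriptor of interest.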

Figures

Graphical Abstract
Figure 1.
Explainable-AI method for analysis of feature importance on prediction of sgRNA efficiency. Features are formatted to generate a wide matrix with rows representing each sgRNA, corresponding experimental cutting efficiency scores and columns for all feature values. This information matrix is analyzed with an iterative Random Forest (iRF) method.
Figure 2.
Identifying model variation based on feature input and assessing feature importance in E. coli. (A) Violin plot of R² values based on iRF model generation with isolated feature input (feature categories described in Table 1). (B) The top 50 features from the full feature matrix iRF model ranked by normalized feature importance score and color-coded by feature category. (C) Dot plot of features from full feature matrix iRF model showing the number of samples (sgRNAs) that were influenced by that feature (y-axis) versus the normalized importance of the feature (x-axis). Color temperature increases with the feature effect score (red, negative; blue, positive) and dot size is scaled by the normalized importance score. (D) Violin plot of R² values for the top 5, 10, 20, 50, 100, 200, 500 and 1000 features, based on full feature iRF model output. There is a plateau of information gained from including features with low importance scores.
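The ranking and top-N comparison described in panels B and D can be illustrated with a small sketch. It assumes that "normalized importance" means scaling the raw scores to sum to one, which this page does not confirm, and the helper names are hypothetical. The R² function is the standard coefficient of determination, 1 - SS_res / SS_tot.

```python
# Minimal sketch: rank features by importance, keep the top N, and score a
# model's predictions with R^2.
# ASSUMPTION: normalization is sum-to-one; the paper may use another scheme.

def normalize_importance(importance: dict) -> dict:
    """Scale raw importance scores so that they sum to 1."""
    total = sum(importance.values())
    return {name: score / total for name, score in importance.items()}


def top_features(importance: dict, n: int) -> list:
    """Return the names of the n most important features, highest first."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def r_squared(y_true: list, y_pred: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up importance scores, for illustration only.
    raw = {"k1_pos20": 6.0, "k3_pos12": 3.0, "k4_pos2": 1.0}
    print(top_features(normalize_importance(raw), 2))
```

Refitting a model on each top-N subset and comparing the resulting R² values is one way to look for the kind of plateau reported in panel D.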
Figure 3.
Explainable-AI interpretation through iRF output metrics and features’ directional influence on cutting efficiency. (A, B) The top 20 features from the full feature matrices ranked by normalized importance score and color-coded by the direction of the effect. Positive correlations with the cutting efficiency score are blue while anti-correlations with the cutting efficiency score are pink for E. coli (A) and H. sapiens (B). (C, D) sgRNA-DNA interaction highlighting quantum chemical features of top importance, their locations, and correlated associations with cutting efficiency scores in E. coli (C) and H. sapiens (D). The DNA strand is represented in gray (target sequence) and blue (target complementary sequence), the sgRNA is shown in yellow, and the PAM sequence is displayed with NGG stars. The feature effect direction is indicated with arrows: up (blue arrow) indicates a positively correlated relationship between the feature value and the cutting efficiency value. Feature bars indicate quantum properties (HL gap, purple; stacking interactions, green; H-bonding, blue) and the length of the bar indicates the k-mer size. Multi-colored bars indicate that the same k-mer at the same position has multiple features assessed as highly important. The E. coli (C) model shows extensive localization of important features, primarily bp, trimers and tetramers at positions 11–20. Hydrogen bonding has outlier importance at positions 1–5. Hydrogen bonding and stacking energy features are observed in both correlated and anti-correlated relationships with cutting efficiency (depending on their k-mer and position) while HL-gap consistently shows a positive relationship nearest the PAM sequence. The H. sapiens (D) model shows less feature localization, with many features overlapping in positions 5–15.
For features of high importance (hydrogen bonding, stacking energy, and HL-gap), the feature-specific directional effects span both positive and negative relationships with cutting efficiency, dependent on the feature length and position. Similarly to the E. coli model, bp, trimers and tetramers are the most predictive. The number of electrons is a top feature for the H. sapiens model that is not among the top features in E. coli.
