Brief Bioinform. 2024 Jan 22;25(2):bbae066.
doi: 10.1093/bib/bbae066.

CRISPRlnc: a machine learning method for lncRNA-specific single-guide RNA design of CRISPR/Cas9 system

Zitian Yang et al. Brief Bioinform. 2024.

Abstract

CRISPR/Cas9 is a promising RNA-guided genome editing technology that consists of a Cas9 nuclease and a single-guide RNA (sgRNA). A number of sgRNA prediction tools have been developed so far. However, they were usually designed for protein-coding genes, without considering that long non-coding RNA (lncRNA) genes may have different characteristics. In this study, we first evaluated the performance of a series of known sgRNA-designing tools on both coding and non-coding datasets. We also analyzed the factors underlying their varied performance on lncRNA-specific sgRNAs, including nucleic acid sequence, genomic location and editing mechanism preference. Furthermore, we introduce a support vector machine-based machine learning algorithm named CRISPRlnc, which models both the CRISPR knock-out (CRISPRko) and CRISPR inhibition (CRISPRi) mechanisms to predict the on-target activity of sgRNAs. CRISPRlnc combines paired-sgRNA design and off-target analysis to achieve one-stop design of CRISPR/Cas9 sgRNAs for non-coding genes. Performance comparison on multiple datasets showed that CRISPRlnc substantially outperformed existing methods for lncRNA-specific sgRNA design under both the CRISPRko and CRISPRi mechanisms. To maximize the availability of CRISPRlnc, we developed a web server (http://predict.crisprlnc.cc) and made the tool available for download on GitHub.

Keywords: CRISPR/Cas9; lncRNA; machine learning; sgRNA.

Figures

Figure 1
A complete workflow for the lncRNA-specific CRISPR/Cas9 sgRNA design.
Figure 2
Comparison of the scores that 11 software tools assign to effective and ineffective sgRNAs on three different datasets. The vertical axis of each subgraph indicates the normalized score of each tool. The scores of effective and ineffective sgRNAs in each dataset are shown as boxplots, with statistical significance assessed by t-test (*P-value < 0.05, **P-value < 0.01, ***P-value < 0.001, ****P-value < 0.0001).
Figure 3
Performance evaluation of known sgRNA-designing tools on three different datasets. (A) shows the Accuracy of all tools on the binary (effective versus ineffective) classification of sgRNAs. (B)–(D) show the ROC curves of each tool on the coding gene knock-out dataset, the non-coding gene knock-out dataset and the non-coding gene inhibition dataset, respectively.
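The two metrics named in this caption can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are assumptions made for illustration only. The area under the ROC curve is computed through its rank interpretation, the probability that a randomly chosen effective sgRNA is scored above a randomly chosen ineffective one.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of sgRNAs whose thresholded score matches the binary label.

    The 0.5 threshold is an assumption; each tool may need its own cut-off.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one effective and one ineffective sgRNA")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = effective sgRNA, 0 = ineffective sgRNA (made-up values).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(accuracy(toy_labels, toy_scores), roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of sgRNAs, which is acceptable for a sketch; a library routine would normally be used on full datasets.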
Figure 4
The difference in similarity between the predictions of known sgRNA-designing tools on three different datasets, with green indicating good similarity and red indicating poor similarity. (A) is the Spearman similarity on the three datasets, and numbers in the cells correspond to the Spearman similarity scores. (B) is the Kendall similarity on the three datasets, and the ‘score’ column is the Kendall similarity score between each prediction tool and the original classification of sgRNA in each dataset. The number marked after each row refers to the average Kendall similarity of the tool’s classification with other tools.
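As a rough illustration of the two rank-based similarity measures in this caption, the sketch below computes Spearman's rho and Kendall's tau between two score vectors. The caption does not say which Kendall variant was used; tau-b (with tie correction) is assumed here because binary sgRNA labels contain many ties. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

For example, `kendall_tau_b(tool_scores, true_labels)` would correspond to the ‘score’ column described above, under the assumption that the labels are coded as 0 and 1.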
Figure 5
Distinct preferences of sgRNAs for different sequence and structure features across the three datasets. (A) is a comparison of the genomic localization of sgRNAs in the three datasets. (B) is the probability of each base at each position of the sgRNA sequences in the three datasets. (C)–(F) are the GC content distributions of each region of the sgRNA sequences in the three datasets (t-test, *P-value < 0.05, **P-value < 0.01, ***P-value < 0.001, ****P-value < 0.0001). (G) shows the distribution of the minimum free energy required to unravel the secondary structure of sgRNAs in the three datasets (t-test, *P-value < 0.05, **P-value < 0.01, ***P-value < 0.001, ****P-value < 0.0001).
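The sequence features in panels (B) to (F) can be sketched as follows. This is an illustrative assumption about how such features might be computed, not the paper's feature extraction code. The caption does not define the sgRNA regions, so the region boundaries are passed in by the caller and the names used in the example are hypothetical.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def regional_gc(seq, regions):
    """GC content of each named region, given as (start, end) slices.

    The region boundaries are supplied by the caller; the source does not
    specify them.
    """
    return {name: gc_content(seq[start:end]) for name, (start, end) in regions.items()}


def position_base_frequencies(seqs):
    """Per-position frequency of A, C, G and T across equal-length sgRNAs."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        freqs.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return freqs


# Hypothetical split of a toy 8-nt sequence into two halves.
example = regional_gc("GGCCAATT", {"first_half": (0, 4), "second_half": (4, 8)})
print(example)
```

The minimum free energy in panel (G) needs an RNA folding tool and is left out of this sketch.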
Figure 6
10-fold cross-validation scores of four models (decision tree, random forest, logistic regression and SVM) under NonCoding_CRISPRko and NonCoding_CRISPRi training sets.
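The 10-fold cross-validation procedure behind this figure can be sketched in plain Python. The sketch is generic and rests on assumptions: the fold assignment, the seed, the accuracy metric and the `fit`/`predict` callables are all hypothetical, and a majority-class baseline stands in for the decision tree, random forest, logistic regression and SVM models named in the caption.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return the held-out accuracy of each of the k folds.

    `fit(train_x, train_y)` returns a model; `predict(model, x)` returns a label.
    """
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Placeholder model: always predicts the most common training label.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

Calling `cross_validate(X, y, fit_majority, predict_majority)` returns ten per-fold accuracies; swapping in a real classifier only requires a different `fit`/`predict` pair.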
Figure 7
Feature selection results and performance comparison of the two models. (A) is the feature importance on the two datasets; from all 27 features, we extracted the 16 best features in NonCoding_CRISPRko and the 18 best features in NonCoding_CRISPRi. (B) is the performance comparison of CRISPRlnc with other tools on the independent NonCoding_CRISPRko test dataset for both Accuracy and F1-score metrics. (C) is the performance comparison of CRISPRlnc with other tools on the independent NonCoding_CRISPRi test dataset for both Accuracy and F1-score metrics.
Figure 8
Overview of CRISPRlnc web version. (A) Services and downloads available on the website. (B) Examples of the website usage. (C) sgRNA design results based on CRISPRko mechanism. (D) sgRNA design results based on CRISPRi mechanism. (E) Statistics of sgRNA target results for lncRNA from Homo sapiens, Mus musculus and Danio rerio.
