[Preprint]. 2024 Jul 3:2024.07.01.601587.
doi: 10.1101/2024.07.01.601587.

CRISPR-GEM: A Novel Machine Learning Model for CRISPR Genetic Target Discovery and Evaluation

Josh P Graham et al. bioRxiv.

Abstract

CRISPR gene editing strategies are shaping cell therapies through precise and tunable control over gene expression. However, achieving reliable therapeutic effects with improved safety and efficacy requires informed target gene selection. This depends on a thorough understanding of the involvement of target genes in gene regulatory networks (GRNs) that regulate cell phenotype and function. Machine learning models have been previously used for GRN reconstruction using RNA-seq data, but current techniques are limited to single cell types and focus mainly on transcription factors. This restriction overlooks many potential CRISPR target genes, such as those encoding extracellular matrix components, growth factors, and signaling molecules, thus limiting the applicability of these models for CRISPR strategies. To address these limitations, we have developed CRISPR-GEM, a multi-layer perceptron (MLP)-based synthetic GRN constructed to accurately predict the downstream effects of CRISPR gene editing. First, input and output nodes are identified as differentially expressed genes between defined experimental and target cell/tissue types respectively. Then, MLP training learns regulatory relationships in a black-box approach allowing accurate prediction of output gene expression using only input gene expression. Finally, CRISPR-mimetic perturbations are made to each input gene individually and the resulting model predictions are compared to those for the target group to score and assess each input gene as a CRISPR candidate. The top scoring genes provided by CRISPR-GEM therefore best modulate experimental group GRNs to motivate transcriptomic shifts towards a target group phenotype. This machine learning model is the first of its kind for predicting optimal CRISPR target genes and serves as a powerful tool for enhanced CRISPR strategies across a range of cell therapies.
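The perturb-and-score step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the trained MLP, the multiplicative `factor` is an assumed way to mimic CRISPRa (factor above 1) or CRISPRi (factor below 1), and the score (negative mean squared distance between group-averaged predictions) is an assumed metric, since the abstract does not specify one.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Sequence[float]
Predictor = Callable[[Profile], List[float]]


def mean_prediction(predict: Predictor, samples: Sequence[Profile]) -> List[float]:
    """Average the predicted output-gene expression over a group of samples."""
    predictions = [predict(sample) for sample in samples]
    n_outputs = len(predictions[0])
    return [sum(p[j] for p in predictions) / len(predictions) for j in range(n_outputs)]


def score_candidates(
    predict: Predictor,
    experimental: Sequence[Profile],
    target: Sequence[Profile],
    factor: float,
) -> List[Tuple[int, float]]:
    """Rank input genes by how closely a single-gene perturbation moves the
    experimental group's predicted outputs toward the target group's.

    Assumption: the score is the negative mean squared distance between the
    group-averaged predictions (higher is better).
    """
    target_mean = mean_prediction(predict, target)
    n_inputs = len(experimental[0])
    scores = []
    for gene in range(n_inputs):
        # Scale one input gene in every experimental sample; leave the rest unchanged.
        perturbed = [
            [value * factor if i == gene else value for i, value in enumerate(sample)]
            for sample in experimental
        ]
        perturbed_mean = mean_prediction(predict, perturbed)
        distance = sum(
            (p - t) ** 2 for p, t in zip(perturbed_mean, target_mean)
        ) / len(target_mean)
        scores.append((gene, -distance))
    # Best candidates (smallest distance to the target prediction) come first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


def toy_predictor(inputs: Profile) -> List[float]:
    """Hypothetical stand-in for the trained MLP: three input genes, two output genes."""
    return [2.0 * inputs[0] + inputs[1], inputs[2]]


ranking = score_candidates(
    toy_predictor,
    experimental=[[1.0, 1.0, 1.0]],
    target=[[2.0, 1.0, 1.0]],
    factor=2.0,
)
print(ranking)
```

In this toy setup, doubling input gene 0 reproduces the target group's predicted outputs exactly, so it ranks first. A real application would replace `toy_predictor` with the trained network and use the experimental and target RNA-seq samples.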

Figures

Fig. 1.
Schematic of CRISPR-GEM. A) Constructing tissue-specific GRNs to predict gene expression relevant to the selected CRISPR therapy. DESeq2 is used to select plausible CRISPR candidates and to identify genes that distinguish the experimental and target cell/tissue types. The resulting genes are used as input and output nodes, respectively, to train an MLP that predicts the expression of output genes using only the expression of input genes. B) Assessing CRISPR strategies using the constructed GRN. RNA-seq data for the experimental group is perturbed by amplifying or suppressing the relative expression value of each gene individually. The perturbed experimental group dataset and an unmodified target group dataset are input into the trained MLP, and the model predictions for each are compared to assess how effectively the editing strategy drives the target group transcriptomic profile.
Fig. 2.
Model training results for CRISPR-GEM predictions of CRISPRa candidates to drive Treg differentiation. A) Treg cells display phenotypes distinct from iPSCs, as visualized in reduced dimension via UMAP projection. B) Model training predicted output gene expression with an MSE of 0.051 ± 0.002 and C) an r² of 0.952 ± 0.002 (p = 1.11 × 10⁻¹⁶), suggesting that the CRISPR-GEM neural network can predict over 94% of the transcriptomic variance throughout the iPSC-Treg transition. D) The strong linear trend between actual and predicted output gene expression values verifies model accuracy.
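For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from actual and predicted expression values. It assumes r² denotes the coefficient of determination (1 minus the ratio of residual to total sum of squares); the caption does not say how it was computed, and a squared Pearson correlation would be a different quantity. The numbers are made up for illustration and are not taken from the paper.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: the share of variance in `actual`
    that is accounted for by `predicted` (assumed definition of r²)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical, not from the study).
actual_expression = [1.0, 2.0, 3.0, 4.0]
predicted_expression = [1.1, 1.9, 3.2, 3.8]

mse = mean_squared_error(actual_expression, predicted_expression)
r2 = r_squared(actual_expression, predicted_expression)
print(f"MSE = {mse:.3f}, r2 = {r2:.3f}")
```

Under this definition, an r² of 0.95 means the residual sum of squares is 5% of the total variance in the actual values, which is the sense in which a model "predicts 95% of the variance".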
Fig. 3.
Assessment of model predictions for the optimal CRISPRa targets to drive Treg differentiation from iPSCs. A) STRING analysis of the top 50 ranked genes reveals a highly connected network of surface receptors and immunomodulatory genes predicted to strongly encourage iPSCs to commit to Treg differentiation. The encoded proteins are associated with many pathways including immunomodulation (red), growth and differentiation (green), ECM composition (orange), cell structure (yellow), signal transduction (dark blue), transcription factors (light blue), and metabolism (teal). B) All gene targets identified by CRISPR-GEM are highly upregulated in Treg cells; most were more confidently upregulated than FOXP3, the genetic signature specific to Treg cells. Expression of SOX2, a pluripotency factor, is also given as a reference for iPSC function.
Fig. 4.
Group selection and model training for MSC chondrogenic characterization. MSCs, fetal cartilage, and mature cartilage were selected as the experimental, intermediate, and target groups, respectively. A) UMAP analysis confirms the clustering of cartilage cells and MSCs, with fetal and mature cartilage subpopulations. B) The neural network was then trained with mean squared error (MSE) as the loss function. C) The resulting model was able to predict over 95% of the variance in the expression of chondrogenic genes, as demonstrated by r². D) The model resulted in a clear linear trend between actual and predicted expression values for each gene.
Fig. 5.
CRISPR-GEM results for CRISPRa-induced chondrogenesis. A) STRING co-expression matrix for top-scoring genes, demonstrating the importance of ECM catabolism and immune regulation in the chondrogenic process. The encoded proteins are associated with many pathways including immunomodulation (red), growth and differentiation (green), ECM composition (orange), cell structure (yellow), signal transduction (dark blue), transcription factors (light blue), and metabolism (teal). B) All top-scoring genes are significantly upregulated during the chondrogenic process. THY1 and NT5E, two MSC surface receptors, demonstrate the shift from progenitor cells to chondrocytes.
Fig. 6.
Group selection and model training for the reversal of OA using CRISPRi. Osteoarthritic and healthy cartilage tissues were selected as the experimental and target groups, respectively. A) UMAP clustering confirms the substantial variance between diseased and healthy cartilage. B) Neural network training resulted in a mean squared error (MSE) of 0.088 ± 0.0021 when predicting output gene expression, C) with an r² of 0.943 ± 0.001. D) The model resulted in a clear linear trend between actual and predicted expression values for each gene.
Fig. 7.
Visualizing the results for the top-ranking CRISPRi targets to prevent OA. A) STRING co-expression matrix for top-scoring genes shows a large network containing proteins involved in fibrosis, ECM catabolism, and immune regulation, implicating these processes in OA. The encoded proteins are associated with many pathways including immunomodulation (red), growth and differentiation (green), ECM composition (orange), cell structure (yellow), signal transduction (dark blue), transcription factors (light blue), and metabolism (teal). B) The top ten scoring genes given by CRISPR-GEM are all upregulated during OA progression. These results correlated positively with expression of MMP13, a key OA marker, but negatively with the pro-regenerative, anti-inflammatory SOX9.
