Nucleic Acids Res. 2025 Jun 20;53(12):gkaf595.
doi: 10.1093/nar/gkaf595.

SELECT: high-precision genome editing strategy via integration of CRISPR-Cas and DNA damage response for cross-species applications

Xiaohang Liu et al. Nucleic Acids Res.

Abstract

CRISPR-based methods enable genome modifications for diverse applications but often face challenges, such as inconsistent efficiencies, reduced performance in iterative modifications, and difficulties generating high-quality datasets for high-throughput genome engineering. Here, we present SELECT (SOS Enhanced programmabLE CRISPR-Cas ediTing), a novel strategy integrating the CRISPR-Cas system with the DNA damage response. By employing designed and optimized double-strand break induced promoters that are activated upon genome editing, SELECT enables a counter-selection process to eliminate unedited cells, ensuring high-fidelity editing. This approach achieves up to 100% efficiency for point mutations, iterative knockouts, and insertions. In high-throughput library editing, SELECT achieved up to 94.2% efficiency and preserved higher library diversity compared with conventional methods. Application of SELECT in flaviolin biosynthesis resulted in a 3.97-fold increase in production. Furthermore, integration with machine learning tools allowed rapid mapping of genotype-phenotype relationships. SELECT provides a versatile platform for precision genome engineering in Escherichia coli and Saccharomyces cerevisiae.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Graphical Abstract
Figure 1.
The SELECT strategy in E. coli. (A) Workflow of the SELECT method in E. coli. (I) To achieve precise genome editing, uniquely designed dual-gRNA plasmids were used to validate three types of editing strategies: single locus editing, genome deletion or insertion, and library genome editing. (II) The gRNA plasmid (or along with donor DNA, or a plasmid library) was transformed into E. coli harboring a counter-selection plasmid (carrying a counter-selectable marker under the control of an inducible promoter) and a Cas plasmid (encoding the Cas protein and the λ-Red recombination system). Details of the plasmids are provided in Supplementary Fig. 2. (III) Activation of the CRISPR–Cas system, including Cas9 or Cas12a protein, and gRNA1 targeting a specific genomic site, inducing DSBs and genome editing. (IV) DSBs trigger the DSB-induced promoter, driving the transcription of gRNA2 which targets the resistance gene on the counter-selection plasmid, removing the plasmid by the secondary CRISPR–Cas system. (V) Verification of editing using high-throughput sequencing and phenotypic assays to successfully purify edited cells. (B–D) Editing efficiencies of the SELECT method targeting the galK inactivation mutation, constructed with the LexA-derived promoter and three different counter-selection proteins: SacB* (S164T), NfsI, and CcdB* (L96P) for ErCas12a (B), SpCas9 (C), and AsCas12a (D). To improve the resolution of the SELECT strategy, the non-targeting plasmid was introduced at a 1:2 ratio during the editing process. EP1 serves as a control, and the inducer concentration was optimized. Suc, sucrose; MTZ, metronidazole; and aTc, anhydrotetracycline, which are used as lethal inducers for three distinct counter-selection markers. n = 3 for each curve. Error bars represent the mean ± SD. (E–G) Editing efficiencies and CFU with four different DSB-induced promoters using the CcdB protein for inactivating the galK gene in the CRISPR–Cas systems. 
(E) ErCas12a, (F) SpCas9, and (G) AsCas12a. Non-targeting plasmid was introduced at a 1:2 ratio during the editing process. Empty vector EP1 serves as a control. n = 3 for each curve. Error bars represent the mean ± SD.
Figure 2.
Long fragment deletions using the SELECT strategy. (A) Schematic diagram of long fragment deletions with the SELECT method. An 80 bp ssDNA was used to facilitate large fragment deletions in the genome, with 40 bp homologous to the upstream sequences (5′ HM) and 40 bp homologous to the downstream sequences (3′ HM) of the deletion region. (B and C) Editing efficiency of the large-fragment deletions for 200 bp (B) and 1000 bp (C) at the intergenic SS9 site. Deletion verification was performed in the ErCas12a, SpCas9, and AsCas12a systems using the SELECT method with CcdB_L96P and the LexA-derived promoter. Empty vector, EP1, lacking the CcdB_L96P counter-selection marker, was used as a control. Editing efficiency was determined by PCR screening of randomly selected colonies. n = 3 for each experiment. Error bars represent mean values ± SD. (D) Pyruvate to acetyl-CoA metabolic pathway in E. coli MG1655. The four major by-products of this pathway and six key genes are annotated: ldhA (961 bp), pflB (1681 bp), poxB (2281 bp), adhE (2641 bp), ackA (3302 bp), and pta (3302 bp). These genes were knocked out individually and were also deleted sequentially over iterative rounds of editing. (E and F) Deletion efficiencies of these six branch pathway genes in the ErCas12a (E) and SpCas9 (F) systems. The SELECT method includes CcdB_L96P and the LexA-derived promoter. EP1 was used as a control. Editing efficiency was determined by PCR screening of randomly selected colonies. The gRNAs used to target the ackA and pta genes are distinct, while the homologous recombination templates are the same. The length of each knocked-out gene is indicated at the top of the figure. n = 3 for each experiment. Error bars represent mean values ± SD. (G and H) Deletion efficiencies of these six branch pathway genes from the sequential iterations conducted in the ErCas12a-rha (G) and SpCas9-rha (H) systems. EP1 was used as a control. The SELECT method includes CcdB_L96P and the LexA-derived promoter.
The gRNAs used to target the ackA and pta genes are distinct, while the homologous recombination templates are the same. n = 3 for each experiment. Error bars represent mean values ± SD.
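
The donor layout described in panel (A) of Figure 2 (40 bp upstream homology joined directly to 40 bp downstream homology) can be illustrated with a short sketch. This is not the authors' design tool: the function name `design_deletion_donor`, the 0-based coordinates, and the randomly generated toy sequence are assumptions made for illustration, and the sketch ignores strand choice and any other oligo design constraints.

```python
import random


def design_deletion_donor(genome: str, del_start: int, del_len: int, arm: int = 40) -> str:
    """Join `arm` bases upstream and `arm` bases downstream of a deletion.

    `del_start` is the 0-based index of the first deleted base (hypothetical
    convention for this sketch).
    """
    del_end = del_start + del_len
    if del_start < arm or del_end + arm > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = genome[del_start - arm:del_start]    # 5' homology arm
    downstream = genome[del_end:del_end + arm]      # 3' homology arm
    return upstream + downstream


# Toy sequence standing in for a genomic region (not real E. coli sequence).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))

start, length = 500, 200                            # a 200 bp deletion, as in panel (B)
donor = design_deletion_donor(genome, start, length)
edited = genome[:start] + genome[start + length:]   # expected sequence after deletion
```

After a successful deletion, the 80-base donor should appear as one contiguous stretch spanning the new junction in `edited`, which is also the junction that colony PCR would be checking.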
Figure 3.
Multiplex gene editing using the SELECT strategy. (A) Schematic diagram of the mutant library design, illustrating the selection of mutation sites for the galK and lacZ genes and the design of the editing cassettes. The homologous fragment introduces the short sequences containing a stop codon (TAA) and an SPM. The plasmid library was transformed into E. coli, and editing was performed. The gRNA directs a CRISPR cut near the targeted codon to facilitate HDR for codon swapping. The editing results were analyzed on MacConkey agar plates containing galactose or lactose; strains with nonsense mutations appear white. (B and C) Editing efficiencies of the mutation libraries targeting inactivation of the galK (B) or lacZ (C) genes in the ErCas12a system using the SELECT method with CcdB_L96P and the LexA-derived promoter. For 200 μl of competent cells, electroporation was performed using a 0.2 cm BIO-RAD Gene Pulser Electroporation cuvette with 100 ng of DNA. The gRNA library targeting the galK gene contained 10 different gRNAs, and that targeting the lacZ gene contained 17 different gRNAs. The editing efficiency and CFU were relatively low when the library was applied to edit, and the cells were plated immediately after recovery (denoted as 0 h). However, a significant improvement was observed after 16 h of culture expansion, leading to the standardization of library editing to 16 h. +nt indicates that one-third of the total amount of non-targeting gRNA plasmid was mixed into the library editing. In the SELECT strategy, the empty vector EP1, lacking the CcdB_L96P counter-selection marker, was used as a control. n = 3 for each experiment. Error bars represent mean values ± SD. (D and E) Editing efficiencies for the galK gene (D) and the lacZ gene (E) targeting with a gradient reduction in non-targeting gRNA (ntgRNA) dosage. 
The ratios 1:10, 1:50, 1:100, and 1:1000 indicate the respective proportions of the ntgRNA plasmid to targeting gRNA plasmid added, with the targeting gRNA plasmid fixed at 100 ng of the galK and lacZ mutation libraries in the ErCas12a editing system using the SELECT method with CcdB_L96P and the LexA-derived promoter. EP1 was used as a control. n = 3 for each experiment. Error bars represent mean values ± SD. (F–I) Editing efficiency and proportion generated by the mixed library. The proportion of the galK gene mutant library was incrementally increased within the mixed galK and lacZ gene mutation libraries and edited in the ErCas12a system using the SELECT method with CcdB_L96P and the LexA-derived promoter. Green bars represent the galK gene editing efficiency obtained through red/white colony screening, while yellow bars indicate the lacZ editing efficiency measured by colony transfer. EP1 was used as a control (Ctrl.). n = 3 for each experiment. Error bars represent mean values ± SD.
Figure 4.
High-throughput screening of the mutant libraries using the SELECT strategy. (A) Workflow for editing and screening of RBS mutant libraries of the accABD genes. The process involves synthesizing and constructing a library fragment containing ‘N’ sequences, amplifying and recombining it into gRNA plasmids to create stable plasmid-based donor editing vectors, followed by transformation into E. coli for multi-site editing to generate different RBS mutations. The accABD genes are key genes in malonyl-CoA synthesis, and RppA converts malonyl-CoA into red-colored flaviolin, where higher intracellular malonyl-CoA levels increase flaviolin production and secretion. (B) Genome single-gene editing efficiency determined by NGS analysis. After editing accA, accB, and accD genes, plasmids and genomes were extracted. Primers with barcodes were used to amplify the edited regions for sequencing and classification. Using the SELECT method with CcdB_L96P and the LexA-derived promoter, and vector EP1 as control, editing efficiency from NGS was calculated as the sum of reads corresponding to the desired RBS mutations divided by the total number of reads. n = 3 for each experiment. Error bars represent mean values ± SD. Statistical significance is indicated in the figure (P < 0.001, two-tailed Student's t-test). (C–E) Tracking efficiency of genomic mutations using DNA barcodes from the plasmids with/without the SELECT method. Editing colonies of accA (C), accB (D), and accD (E) genes from the same plates were extracted and amplified. NGS was used to analyze plasmid and genomic mutations, with sequencing read counts plotted on the y-axis and x-axis, respectively, and fitted using two-dimensional linear regression. Sample size corresponds to the number of mutated variants in the library, excluding the wild type (WT). The blue dots indicate the editing using the empty vector EP1 as a control.
The red dots represent the editing achieved using the SELECT strategy with ErCas12a, CcdB_L96P, and the LexA-derived promoter. NGS read counts were normalized across different genes or batches. (F and G) OD520/OD600 ratios from 48-well plate screening across different batches. Genomically edited colonies were randomly selected for color screening in 48-well plates, with OD600 values measured for cell cultures and OD520 values for supernatants. Genome editing with EP1 as a control (F) and using the SELECT method with CcdB_L96P and the LexA-derived promoter (G). Labels such as B2 indicate the highest-yielding mutant strains identified from different batches of high-production strains after color screening under identical conditions within the same batch. (H) Combinatorial mutations in the RBS regions of the accABD genes enhanced flaviolin production. Strains carrying the rppA gene were cultivated in PM25 medium in shake flasks at 37°C and initial pH 7.0 for 96 h. The inset shows the supernatants of strains WT, 1 (D1 + A1 + B1) and 2 (D1 + A2 + B2). Strain 2 produced more flaviolin than strain 1 and the WT, and the supernatants are redder. WT represents the wild type without any mutations or modifications in the RBS regions. n = 3 for each experiment. Error bars represent mean values ± SD.
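
The efficiency definition given for panel (B) of Figure 4 (reads carrying a desired RBS mutation divided by all reads) amounts to a simple ratio. The sketch below is only an illustration of that ratio: the function name `editing_efficiency`, the variant labels, and the read counts are all made up and are not taken from the paper's data.

```python
def editing_efficiency(read_counts: dict[str, int], desired: set[str]) -> float:
    """Fraction of reads that match any of the desired variants."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to evaluate")
    edited = sum(n for variant, n in read_counts.items() if variant in desired)
    return edited / total


# Hypothetical read counts per classified variant at one locus.
counts = {"WT": 50, "RBS_variant_1": 300, "RBS_variant_2": 150}
efficiency = editing_efficiency(counts, desired={"RBS_variant_1", "RBS_variant_2"})
print(f"editing efficiency: {efficiency:.1%}")  # 450 of 500 reads -> 90.0%
```

Reads that match neither the wild type nor a desired variant (for example, sequencing errors or unintended edits) still count toward the denominator in this sketch, which is one reasonable reading of "total number of reads".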
Figure 5.
ML-driven optimization of flaviolin production using the SELECT strategy. (A) Flowchart of training and validating the ML model. The input dataset consisted of the DNA nucleotide sequences of the RBS mutation region and the corresponding OD520/OD600 measurement values. The algorithm accepts input values after feature extraction, including the one-hot-encoded 8 bp core variant sequences and the TIR values calculated from the full RBS sequences using the RBS Calculator, as well as any optional hyperparameters. The model was trained with 5-fold cross-validation on this dataset, and the output is a Random Forest model. The trained model accepts features from a test set to validate the accuracy. (B–D) Performance comparison of regression models evaluated using 5-fold cross-validation. The models were assessed using three metrics: (B) RMSE, (C) MAE, and (D) R². Models evaluated were Random Forest, Linear Regression, Support Vector Regression (SVR), XGBoost, and LightGBM. Error bars represent SDs across the folds. (E) Scatter plot illustrating the correlation between predicted and measured OD520/OD600 ratios in the test set. The prediction was generated using the ML model. Statistical evaluation showed a Pearson correlation coefficient R = 0.98, P-value < 10⁻⁸, and an RMSE of 0.02 for the test set, which indicated a strong agreement between predicted and actual values. The P-value demonstrates statistical significance. (F) Scatter plot showing residuals (measured minus predicted values) for both the training set (blue dots) and test set (red dots), plotted against predicted OD520/OD600 ratios. The curves at the top represent the marginal distribution of the predicted OD520/OD600 data, visualized as a kernel density estimation (KDE) plot to depict the probability density distribution of this variable. Similarly, the curves on the right show the marginal distribution of the residuals data, also visualized as a KDE plot to represent the probability density distribution of this variable.
The density plots were also drawn in blue (training set) and red (test set). The narrow and centered distributions of residuals indicate consistent predictive performance across datasets.
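
Two pieces of the Figure 5 pipeline are easy to make concrete: one-hot encoding of a short variant sequence, and the RMSE, MAE, R², and Pearson R metrics used to compare models. The sketch below uses only the Python standard library and does not reproduce the authors' Random Forest model or their features; the helper names `one_hot` and `regression_metrics`, the A/C/G/T column order, and the example numbers are assumptions for illustration.

```python
import math

BASES = "ACGT"  # assumed column order for the encoding


def one_hot(seq: str) -> list[int]:
    """Encode a DNA sequence as 4 binary values per base (A, C, G, T)."""
    vec: list[int] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def regression_metrics(measured: list[float], predicted: list[float]) -> dict[str, float]:
    """RMSE, MAE, R^2, and Pearson R for paired measured/predicted values."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]  # measured minus predicted
    ss_res = sum(r * r for r in residuals)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    ss_m = sum((m - mean_m) ** 2 for m in measured)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - ss_res / ss_m,
        "pearson_r": cov / math.sqrt(ss_m * ss_p),
    }


features = one_hot("AGGAGGTA")  # an 8 bp example sequence -> 32 binary features
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The second call shows why both R² and Pearson R are worth reporting: predictions that are offset by a constant correlate perfectly with the measurements (Pearson R of 1) while R² is negative, because R² penalizes the systematic error.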
Figure 6.
The workflow of the SELECT method in eukaryotic cells. (A) Workflow of the SELECT method in S. cerevisiae. (I) To ensure precise genome editing, the Cas9 plasmid was specifically designed to validate the efficiency of gene editing. (II) A Cas9 plasmid, harboring a gRNA1 targeting a specific genomic locus (e.g. ADE2), was introduced into S. cerevisiae. (III) Upon activation of the CRISPR–Cas9 system, the Cas9 protein, guided by gRNA1, induces site-specific DSBs at the target genomic locus, initiating the genome editing process. (IV) The DSBs generated by Cas protein cleavage act as a signal, triggering the activation of a series of checkpoint response promoters. This in turn drives transcription of the DSB-induced gRNA2, which targets the CAN1 gene in the yeast genome. (V) Verification of editing using phenotypic assays and Sanger sequencing to purify successfully edited cells. (B) The positions of the X boxes in the promoters of CRT1, HUG1, and RNR2. The arrows indicate the orientation of the X box. The editing efficiency of the ADE2 gene with different DSB-induced promoters in the CRISPR–Cas9 system. The ADE2 gene editing colonies grown on the plate without l-canavanine serve as the control. Xs means the strongly conserved X box in the promoters, Xw means the weakly conserved X box in the promoters. Each curve represents n = 3. Error bars indicate the mean ± SD. (C–F) Editing efficiencies for ADE2 gene targeting with a gradient reduction in ntgRNA dosage using the SELECT method with different promoters. The ratios 1:10, 1:50, 1:100, and 1:1000 indicate the respective proportions of ntgRNA plasmid to targeting gRNA plasmid added and using the SELECT method with CRT1p (C), HUG1p (D), RNR1p (E), and RNR2p (F) for editing. The ADE2 gene editing colonies grown on the plate without l-canavanine serve as the control. n = 3 for each experiment. Error bars represent mean values ± SD.
