Nat Biotechnol. 2021 Nov;39(11):1414-1425.

doi: 10.1038/s41587-021-00938-z. Epub 2021 Jun 28.

Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning

Luke W Koblan^#^{1,2,3}, Mandana Arbab^#^{1,2,3}, Max W Shen^#^{1,2,3,4}, Jeffrey A Hussmann^{5,6,7,8,9}, Andrew V Anzalone^{1,2,3}, Jordan L Doman^{1,2,3}, Gregory A Newby^{1,2,3}, Dian Yang^{5,7,8,9}, Beverly Mok^{1,2,3}, Joseph M Replogle^{5,7,10,11,8,9}, Albert Xu^{5,6,10,12}, Tyler A Sisley^{2}, Jonathan S Weissman^{13,14,15,16,17}, Britt Adamson^{18,19,20,21}, David R Liu^{22,23,24}

Affiliations

¹ Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
² Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA.
³ Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA.
⁴ Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA.
⁵ Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA.
⁶ Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, USA.
⁷ Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA, USA.
⁸ Whitehead Institute for Biomedical Research, Cambridge, MA, USA.
⁹ Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
¹⁰ Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA, USA.
¹¹ Tetrad Graduate Program, University of California, San Francisco, San Francisco, CA, USA.
¹² Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA.
¹³ Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA. weissman@wi.mit.edu.
¹⁴ Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA, USA. weissman@wi.mit.edu.
¹⁵ Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA, USA. weissman@wi.mit.edu.
¹⁶ Whitehead Institute for Biomedical Research, Cambridge, MA, USA. weissman@wi.mit.edu.
¹⁷ Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA. weissman@wi.mit.edu.
¹⁸ Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA. badamson@princeton.edu.
¹⁹ Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA, USA. badamson@princeton.edu.
²⁰ Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA. badamson@princeton.edu.
²¹ Department of Molecular Biology, Princeton University, Princeton, NJ, USA. badamson@princeton.edu.
²² Merkin Institute of Transformative Technologies in Healthcare, Broad Institute of Harvard and MIT, Cambridge, MA, USA. drliu@fas.harvard.edu.
²³ Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA. drliu@fas.harvard.edu.
²⁴ Howard Hughes Medical Institute, Harvard University, Cambridge, MA, USA. drliu@fas.harvard.edu.

^# Contributed equally.

PMID: 34183861
PMCID: PMC8985520
DOI: 10.1038/s41587-021-00938-z

Luke W Koblan et al. Nat Biotechnol. 2021 Nov.

Erratum in

Author Correction: Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning.
Koblan LW, Arbab M, Shen MW, Hussmann JA, Anzalone AV, Doman JL, Newby GA, Yang D, Mok B, Replogle JM, Xu A, Sisley TA, Weissman JS, Adamson B, Liu DR. Nat Biotechnol. 2023 Nov;41(11):1655. doi: 10.1038/s41587-023-02028-8. PMID: 37853259. No abstract available.

Abstract

Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE-single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.
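The final sentence of the abstract describes choosing, for each target site, the CGBE-sgRNA pair with the best predicted outcome. A minimal sketch of that selection step follows, assuming a hypothetical table of predicted purities. The site names, editor names, and values are invented for illustration and are not taken from the paper's models or data; only the >90% cutoff echoes the abstract.

```python
from typing import Dict, Optional, Tuple

# Hypothetical predicted C-to-G purities keyed by (target site, editor).
# Every name and number here is invented for illustration only.
PREDICTED_PURITY: Dict[Tuple[str, str], float] = {
    ("site_1", "editor_A"): 0.95,
    ("site_1", "editor_B"): 0.40,
    ("site_2", "editor_A"): 0.30,
    ("site_2", "editor_B"): 0.92,
    ("site_3", "editor_A"): 0.50,
    ("site_3", "editor_B"): 0.60,
}


def best_editor_per_site(
    pred: Dict[Tuple[str, str], float]
) -> Dict[str, Tuple[str, float]]:
    """For each site, keep the editor with the highest predicted purity."""
    best: Dict[str, Tuple[str, float]] = {}
    for (site, editor), purity in pred.items():
        if site not in best or purity > best[site][1]:
            best[site] = (editor, purity)
    return best


def sites_above(
    pred: Dict[Tuple[str, str], float],
    threshold: float,
    editor: Optional[str] = None,
) -> int:
    """Count sites whose predicted purity exceeds the threshold.

    With editor=None the best editor is chosen per site; otherwise only
    the single named editor is considered at every site.
    """
    if editor is None:
        return sum(1 for _, p in best_editor_per_site(pred).values() if p > threshold)
    return sum(1 for (_, e), p in pred.items() if e == editor and p > threshold)
```

In this toy table, per-site selection reaches the cutoff at more sites than either editor alone, which is the kind of comparison the abstract reports at scale.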

Figures

**Figure 1. Development of prototype C•G-to-G•C base editors.**
(a) Potential pathway for C•G-to-G•C conversion. (b) C•G-to-G•C editing outcomes in HEK293T cells for C-terminal fusions of DNA glycosylases to BE4B (AC, APOBEC1 cytidine deaminase–Cas9 nickase). (c) Different fusion protein architectures lead to different C•G-to-G•C editing properties in HEK293T cells at the HEK3 locus for the Apo-UdgX-Cas9n (AXC) architecture. Values and error bars reflect the mean and standard deviation of three biological replicates, shown as individual data points. HEK2=HEK site 2; HEK3=HEK site 3; HEK4=HEK site 4. C4, C6, and similar annotations indicate the in-window target nucleotides where the SpCas9 PAM is at positions 21–23.

**Figure 2. CRISPRi knockdown screen across 476 genes enriched for those with roles in DNA repair identifies candidate regulators of C•G-to-G•C editing.**
(a) Schematic of screen design. (b) Summary of base editing outcomes in BE4B (also AC) screen. Bottom left: all editing outcomes containing only point mutations present at ≥1% frequency for non-targeting CRISPRi guide RNAs. Line plots above the individual outcomes show the total editing frequency (black line) and the frequencies of each single base edit (C-to-T=red, C-to-G=brown, C-to-A=green, and G-to-C=blue lines) at each position. Line plots to the right show frequencies of outcomes for specific CRISPRi guide RNAs (blue: average of all non-targeting guide RNAs ± standard deviation across individual non-targeting guide RNAs; orange: top 2 most active *UNG* guide RNAs). Heatmaps show log₂ fold changes in outcome frequencies for top 2 *UNG* guide RNAs relative to non-targeting guide RNAs. (c) Log₂ fold changes in frequency of outcomes containing C-to-T or C-to-G edits for each CRISPRi guide compared to non-targeting guide RNAs. Upper left: comparison of changes in C-to-T editing between two biological replicates. Lower right: comparison of changes in C-to-G editing between replicates. Upper right: comparison of changes in C-to-G editing to changes in C-to-T editing in replicate 1. All guide RNAs with at least 500 recovered UMIs in each replicate are plotted. Blue dots: individual non-targeting guide RNAs, orange dots: *UNG* guide RNAs, green dots: *ASCC3* guide RNAs, red dots: *RFWD3* guide RNAs, grey dots: all other guide RNAs. (d) Effects of gene knockdown on relative C-to-G editing frequencies in BE4B screen. Each dot represents a gene, with the x-value representing the average of the two strongest log₂ fold changes in normalized C-to-G editing for guide RNAs targeting the gene from the average of all non-targeting guide RNAs, and the y-value representing a gene-level p-value summarizing the combined statistical significance of all guide RNAs targeting each gene (two-sided, uncorrected for multiple comparisons). Rep.=replicate.
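The quantities named in this caption (log₂ fold change relative to non-targeting guides, the 500-UMI filter, and the gene-level x-value in panel d) can be sketched as follows. This is an illustrative reading of the caption, not the authors' analysis code: the function names are hypothetical, and treating "two strongest" as the two largest absolute log₂ fold changes is an assumption.

```python
import math
from statistics import mean
from typing import Sequence

# Caption: guides are plotted only with at least 500 recovered UMIs per replicate.
MIN_UMIS = 500


def passes_umi_filter(umis_rep1: int, umis_rep2: int) -> bool:
    """True if the guide has enough recovered UMIs in both replicates."""
    return umis_rep1 >= MIN_UMIS and umis_rep2 >= MIN_UMIS


def log2_fold_change(freq_guide: float, freq_nontargeting_mean: float) -> float:
    """log2 ratio of an outcome's frequency under one CRISPRi guide
    to its mean frequency across non-targeting guides."""
    return math.log2(freq_guide / freq_nontargeting_mean)


def gene_level_effect(guide_lfcs: Sequence[float]) -> float:
    """Average of the two strongest log2 fold changes for one gene.

    ASSUMPTION: 'strongest' is taken to mean largest absolute value.
    """
    strongest = sorted(guide_lfcs, key=abs, reverse=True)[:2]
    return mean(strongest)
```

For example, a guide that raises an outcome's frequency from 1% to 4% gives a log₂ fold change of about 2, and a gene whose guides score -3.0, -2.0, and 0.5 gets a gene-level value of -2.5 under the stated assumption.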

**Figure 3. Effect of varying the cytidine deaminase and Cas9 components of CGBEs on C•G-to-G•C editing outcomes in HEK293T cells.**
(a) C•G-to-G•C editing outcomes for catalytically impaired, narrow-window cytidine deaminases show higher editing purity at HEK2 and *RNF2*. (b) C•G-to-G•C editing outcomes for high-fidelity Cas9 variants show altered editing windows and improved CGBE performance at some positions. “Cas9” represents the Cas9 D10A nickase variant of each Cas effector. Values and error bars reflect the mean and standard deviation of three biological replicates, shown as individual data points. HEK2=HEK site 2; HEK3=HEK site 3; HEK4=HEK site 4. C4, C6, and similar annotations indicate the in-window target nucleotides where the SpCas9 PAM is at positions 21–23.

**Figure 4. Novel engineered CGBEs with various DNA repair proteins, deaminases, Cas proteins, and architectures offer diverse editing performance on different target sites.**
(a) C•G-to-G•C editing performance of CGBEs at eight genomic loci in HEK293T cells. (b) Further characterization of C•G-to-G•C editing outcomes for 12 variants from (a) at various genomic loci in HEK293T cells. Values and error bars reflect the mean and standard deviation of three biological replicates. HEK2=HEK293T cells site 2; HEK3=HEK293T cells site 3; HEK4=HEK293T cells site 4. C nucleotide annotations indicate the target nucleotide positions in the protospacer, where the SpCas9 PAM is at positions 21–23. Editing efficiencies, product purities, and indel frequencies for constructs that were tested but not shown in this figure can be found in Supplementary Data 1.

**Figure 5. Target library characterization and machine learning modeling of 10 CGBE variants.**
(a) Overview of genome-integrated target library assay. Libraries of 12,000 or 4,000 pairs of sgRNAs and corresponding target sites are integrated into the genomes of mammalian cells using Tol2 transposase and treated with base editors. Edited cells are enriched by antibiotic selection, and library cassettes are amplified for high-throughput sequencing. (b) Base editing windows. Values are C•G-to-G•C editing efficiencies normalized to a maximum of 100. The protospacer is at positions 1–20, with the SpCas9 PAM at positions 21–23. All data are in mES cells except for eA3A-nCas9, which is in HEK293T cells. (c) C•G-to-G•C editing purity in the comprehensive context library in mES cells. Box plots indicate median and interquartile range, whiskers indicate extrema, and black dots indicate mean. Two-sided Welch’s t-test, *P ≤ 5.1×10^-9. (d) Heatmap of observed C•G-to-G•C purities by CGBE in target contexts from the comprehensive context library in mES cells. Black nucleotides indicate the cytosine for which purity is calculated. Target sites were sorted by outcome variance and manually selected. (e) Clustering of CGBEs based on measured C•G-to-G•C purity in core window cytosines across the comprehensive context library in mESCs. Values are Pearson correlation. (f) Purity of editing outcomes across core window nucleotides in the comprehensive context library, ranked by C•G-to-G•C purity, averaged across CGBEs in mESCs. Trend lines and shading show the rolling mean and standard deviation across 1% intervals. (g) Representative sequence motifs for editing efficiency and C•G-to-G•C purity from logistic regression models. The sign of each learned weight indicates a contribution above (positive sign) or below (negative sign) the mean activity. Logo opacity is proportional to the motif’s Pearson’s R on held-out sequence contexts. (h) Observed C•G-to-G•C purity across CGBEs in mESCs compared to CGBE-Hive predictions. Trend lines and shading show the rolling mean and standard deviation. (i) Sequence motifs for C•G-to-G•C editing yield.
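Panel (b) reports editing windows as efficiencies "normalized to a maximum of 100". A minimal sketch of that rescaling is shown below, assuming per-position efficiencies are held in a dictionary keyed by protospacer position. The function name and the example values are hypothetical and are not data from the paper.

```python
from typing import Dict


def normalize_window(efficiencies: Dict[int, float]) -> Dict[int, float]:
    """Rescale per-position editing efficiencies so the peak equals 100.

    Keys are protospacer positions (1-20); values are raw efficiencies
    in any consistent unit.
    """
    peak = max(efficiencies.values())
    if peak <= 0:
        raise ValueError("at least one position must have positive efficiency")
    return {pos: 100.0 * eff / peak for pos, eff in efficiencies.items()}


# Invented example: raw efficiencies (%) at three positions.
example = normalize_window({4: 10.0, 5: 20.0, 6: 40.0})
```

Rescaling in this way makes window shapes comparable between editors with very different absolute efficiencies, at the cost of hiding those absolute differences.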

**Figure 6. Target library characterization and machine learning modeling of CGBE variants.**
(a) Observed C-to-G purity by CGBE at SNVs predicted to have >80% C-to-G purity. Box plot indicates median and interquartile range, and whiskers indicate extrema. (b) Observed number of disease-related sgRNA-target pairs corrected at varying genotype precision and amino acid precision thresholds by various strategies for selecting CGBEs. See Supplementary Table 3. (c) Comparison of predicted versus observed correction yield of disease-related transversion SNVs in mES cells. Trend lines and shading show the rolling mean and standard deviation. (d) Comparison of predicted versus observed correction precision of disease-related transversion SNVs in mES cells. Trend lines and shading show the rolling mean and standard deviation. (e) Observed number of sgRNA-target pairs containing disease-related transversion SNVs corrected at various thresholds for genotype and amino acid precision. (f) Installation of disease-associated SNPs using CGBEs.

Comment in

Improving CRISPR tools by elucidating DNA repair.
Lim JM, Kim HH. Nat Biotechnol. 2021 Dec;39(12):1512-1514. doi: 10.1038/s41587-021-01149-2. PMID: 34873327. No abstract available.
