Genome Biol. 2021 Aug 19;22(1):235.

doi: 10.1186/s13059-021-02458-0.

Easy-Prime: a machine learning-based prime editor design tool

Yichao Li¹, Jingjing Chen²,³, Shengdar Q Tsai², Yong Cheng⁴,⁵

Affiliations

¹ Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN, USA. Yichao.Li@stjude.org.
² Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN, USA.
³ Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, Memphis, TN, USA.
⁴ Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN, USA. Yong.Cheng@stjude.org.
⁵ Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA. Yong.Cheng@stjude.org.

PMID: 34412673
PMCID: PMC8377858
DOI: 10.1186/s13059-021-02458-0

Yichao Li et al. Genome Biol. 2021.

Abstract

Prime editing is a revolutionary genome-editing technology that can make a wide range of precise edits in DNA. However, designing highly efficient prime editors (PEs) remains challenging. We develop Easy-Prime, a machine learning-based program trained with multiple published data sources. Easy-Prime captures both known and novel features, such as RNA folding structure, and optimizes feature combinations to improve editing efficiency. We provide optimized PE design for installation of 89.5% of 152,351 GWAS variants. Easy-Prime is available both as a command line tool and an interactive PE design server at: http://easy-prime.cc/ .

Keywords: Machine learning; Prime editor; pegRNA design.

Conflict of interest statement

S.Q.T. is a member of the scientific advisory board of Kromatid and Twelve Bio.

Figures

**Fig. 1**
Overview of Easy-Prime design and machine learning model evaluation. a Feature groups. (1) The Cas9 activity feature is predicted by the DeepSpCas9 score (purple box). (2) Oligo features (yellow box) are the GC content and sequence length of the PBS and RTT. (3) Target mutation features (cyan box) are whether the target mutation disrupts the PAM sequence, whether the ngRNA spacer sequence matches the edited protospacer sequence, and the numbers of mismatches, deletions, and insertions. (4) Position features (pink box) are the distance between the ngRNA and the sgRNA (ngRNA_pos), the distance between the target mutation and the sgRNA (Target_pos), and the number of nucleotides downstream of the desired edit (target_end_flank). (5) RNA folding features are the maximal pairing probability between each of the first 10 bp of the RTT and the scaffold sequence based on RNAplfold [29]. b A machine learning workflow for data preprocessing, feature extraction, and model training and evaluation. c, d Correlation scatter plots of the true PE efficiency (x-axis) and the predicted efficiency (y-axis). c Train-test-split evaluation for the PE2 model and nested cross-validation evaluation for the PE3 model. d An independent PE dataset used for third-party evaluation of the PE3 model. “R” is the Spearman correlation coefficient; “r” is the Pearson correlation coefficient
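
The oligo features in panel a (GC content and sequence length of the PBS and RTT) are simple to compute. A minimal sketch, assuming plain DNA strings as input; the function names `gc_content` and `oligo_features`, the dictionary keys, and the example sequences are hypothetical and are not taken from the Easy-Prime code base:

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def oligo_features(pbs: str, rtt: str) -> dict:
    """Collect GC content and length for the PBS and RTT of one pegRNA design."""
    return {
        "PBS_GC": gc_content(pbs),
        "PBS_length": len(pbs),
        "RTT_GC": gc_content(rtt),
        "RTT_length": len(rtt),
    }


# Made-up sequences, for illustration only.
features = oligo_features(pbs="GCATGGTGCACCT", rtt="TCTGCCATCAAAGC")
print(features)
```

Each candidate pegRNA would contribute one such feature dictionary, alongside the other feature groups listed in the caption.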

**Fig. 2**
Features associated with PE efficiency. a, b Feature importance plots of the XGBoost regression models. Feature rankings are based on the mean absolute SHAP value for the PE2 and PE3 models. RNA folding features are combined for simplified visualization. Target_end_flank: number of nucleotides from the target mutation to the end of the RTT sequence. Target_pos: distance between the target mutation and the sgRNA nick site. ngRNA_pos: distance between the ngRNA nick site and the sgRNA nick site. c Schematic view of the RNA-folding disruption score formulation. On the left, a pegRNA sequence consisting of an sgRNA (red), a scaffold sequence (orange), and an RTT sequence (green) is labeled with positions and nucleotides, such as 81G. The pairing probability between 81G and the first position in the RTT sequence is denoted as P(1,81). On the right is a heatmap of the pairing probability between each position in the scaffold and the 3′ extension sequence (i.e., RTT + PBS). P(1,81) is highlighted by a red dashed box. At bottom left, the formula to calculate D(i) is shown, where i represents the position in the 3′ extension. d Line plot showing the trend of correlations between the first 16 positions in the 3′ extension and the targeted editing frequency
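
Ranking features by mean absolute SHAP value, as in panels a and b, amounts to averaging |SHAP| per feature across samples and sorting the result. A minimal sketch in plain Python, assuming the per-sample SHAP values have already been computed elsewhere; `rank_by_mean_abs_shap` is a hypothetical helper and the numbers are invented for illustration, not results from the paper:

```python
def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows holds one row per sample, each with one SHAP value per feature.
    """
    n_samples = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Invented SHAP values for two samples and three features.
names = ["Target_pos", "ngRNA_pos", "Target_end_flank"]
shap_rows = [
    [0.5, -0.25, 0.0],
    [-1.5, 0.75, 0.5],
]
ranking = rank_by_mean_abs_shap(names, shap_rows)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.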

**Fig. 3**
Experimental validation of PE designs by Easy-Prime. a Barplot showing the observed editing efficiencies to install a positive control (HEK3 + 1TtoG, blue bar) and 7 blood variants predicted by Easy-Prime (pink bars). Replicates are represented by grey dots. b Barplot showing paired PE design comparison between Easy-Prime prediction (pink bars) and PrimeDesign recommendation (cyan bars)

**Fig. 4**
The web portal interface for Easy-Prime. a Screenshot of the Easy-Prime web portal (based on DASH [33]). Easy-Prime takes a file in vcf or fasta format as input. It searches and optimizes all individual sgRNA-PBS-RTT-ngRNA combinations and visualizes the gRNAs with the highest predicted efficiency for each input variant. b An interactive PE design visualization based on the ProteinPaint genome browser
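
The search over sgRNA-PBS-RTT-ngRNA combinations described in panel a can be pictured as scoring every combination and keeping the one with the highest predicted efficiency. A minimal sketch, assuming small candidate lists and a placeholder scoring function; `toy_score`, `best_design`, and all candidate labels are hypothetical stand-ins, and the arbitrary rule inside `toy_score` is not the Easy-Prime model:

```python
from itertools import product


def toy_score(sgrna, pbs_len, rtt_len, ngrna):
    """Placeholder for a predicted efficiency; NOT the Easy-Prime model."""
    # Arbitrary deterministic rule: prefer PBS length 13, RTT length 15, and ngRNA_B.
    bonus = 0.5 if ngrna == "ngRNA_B" else 0.0
    return -abs(pbs_len - 13) - abs(rtt_len - 15) + bonus


def best_design(sgrnas, pbs_lengths, rtt_lengths, ngrnas, score=toy_score):
    """Score every sgRNA-PBS-RTT-ngRNA combination and return the top one."""
    return max(
        product(sgrnas, pbs_lengths, rtt_lengths, ngrnas),
        key=lambda combo: score(*combo),
    )


top = best_design(
    sgrnas=["sgRNA_1"],
    pbs_lengths=[10, 13, 15],
    rtt_lengths=[12, 15, 20],
    ngrnas=["ngRNA_A", "ngRNA_B"],
)
print(top)
```

Passing the scoring function as a parameter keeps the enumeration separate from the model, so a trained predictor could be substituted for the toy rule without changing the search.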

References

1. Pickar-Oliver A, Gersbach CA. The next generation of CRISPR–Cas technologies and applications. Nat. Rev. Mol. Cell Biol. 2019;20(8):490–507. doi: 10.1038/s41580-019-0131-5.
2. Yin H, Xue W, Anderson DG. CRISPR-Cas: a tool for cancer research and therapeutics. Nat. Rev. Clin. Oncol. 2019;16(5):281–295. doi: 10.1038/s41571-019-0166-8.
3. High KA, Roncarolo MG. Gene Therapy. N Engl J Med. 2019;381(5):455–464. doi: 10.1056/NEJMra1706910.
4. Anzalone AV, Koblan LW, Liu DR. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 2020;38:824–844.
5. Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature. 2017;551(7681):464–471. doi: 10.1038/nature24644.
