BMC Bioinformatics. 2014 Jun 17;15:191.

doi: 10.1186/1471-2105-15-191.

Efficient design of meganucleases using a machine learning approach

Mikhail Zaslavskiy, Claudia Bertonati, Philippe Duchateau¹, Aymeric Duclert, George H Silva


PMID: 24934562
PMCID: PMC4065607
DOI: 10.1186/1471-2105-15-191


Mikhail Zaslavskiy et al. BMC Bioinformatics. 2014.


Affiliation

¹ Research and Development department, Cellectis, 8 rue de la Croix Jarry, Paris 75013, France. philippe.duchateau@cellectis.com.


Abstract

Background: Meganucleases are important tools for genome engineering, providing an efficient way to generate DNA double-strand breaks at specific loci of interest. Numerous experimental efforts, ranging from in vivo selection to in silico modeling, have been made to re-engineer meganucleases to target relevant DNA sequences.

Results: Here we present a novel in silico method for designing custom meganucleases based on machine learning. We compared it with existing in silico physical models and with high-throughput experimental screening. The machine learning model successfully predicted active meganucleases for 53 new DNA targets.

Conclusions: The new method performs competitively with state-of-the-art in silico physical models, with up to a fourfold increase in design success rate. Compared with high-throughput experimental screening, it reduces the number of screening experiments needed by a factor of more than 100 without affecting final performance.


Figures

**Figure 1**
**I-CreI/DNA binding interface. (A)** Natural I-CreI target site, with all positions indexed relative to the center of the site from -11 to 11. -11NNNN and -5NNN are the reverse complements of 11N4 and 5N3. **(B)** 3D structure of the I-CreI/DNA complex (PDB code: 1g9y). **(C)** I-CreI/DNA interaction map. Columns correspond to positions on the DNA; rows correspond to protein residues. Colors indicate the type of interaction between residues and nucleotides: dark green, backbone interactions; blue, water-mediated; red, base-specific. Residues N30-S40 and Q44-D75 are grouped to indicate that they contact the separate regions 11N4 and 5N3 of the DNA target.

**Figure 2**
**Cross-validation performance of various in silico methods.** **(Left)** %Top10: percentage of targets with at least one positive molecule among the 10 top-ranked molecules. **(Right)** AUC: AUC score (see Material and Methods). Mact: predictions based on module cleavage activities; Fx: FoldX score; Rt: Rosetta score; SeqMact: protein/target sequences + module cleavage activities; SeqMactFxStr: all features combined (sequences + module cleavage activities + FoldX scores and interactions). Error bars are estimated from 30 independent cross-validation experiments.
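The two metrics in Figure 2 can be illustrated with a short sketch. It assumes each test target comes with model scores for its candidate molecules plus binary experimental labels (1 = positive), and it takes AUC to be the ROC AUC computed per target and then averaged; the exact definition in Material and Methods may differ, and all function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: target name -> list of (predicted_score, label),
# where label is 1 for an experimentally active (positive) molecule, else 0.
Candidates = List[Tuple[float, int]]


def pct_top10(data: Dict[str, Candidates], k: int = 10) -> float:
    """Percentage of targets with at least one positive among the k top-ranked molecules."""
    hits = 0
    for candidates in data.values():
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        if any(label == 1 for _, label in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(data)


def roc_auc(candidates: Candidates) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in candidates if y == 1]
    neg = [s for s, y in candidates if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(data: Dict[str, Candidates]) -> float:
    """Average per-target AUC over targets that contain both classes."""
    aucs = [roc_auc(c) for c in data.values()
            if any(y == 1 for _, y in c) and any(y == 0 for _, y in c)]
    if not aucs:
        raise ValueError("no target has both positive and negative molecules")
    return sum(aucs) / len(aucs)


# Toy example with made-up scores (not data from the paper)
data = {
    "target_A": [(0.9, 1), (0.8, 0), (0.1, 0)],
    "target_B": [(0.7, 0), (0.6, 0), (0.2, 1)],
}
print(pct_top10(data, k=1))  # 50.0
print(mean_auc(data))        # 0.5
```

%Top10 only asks whether a positive appears near the top of the ranking, while AUC scores the whole ranking, which is why the two can disagree for the same model.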

**Figure 3**
**Performance of ML model as a function of training set composition. (Left)** Performance of the ML model as a function of training set size (i.e. the number of combinatorial libraries). The experimental settings are similar to those in Figure 2; each point corresponds to the cross-validation performance obtained when only a portion of the training data is used. **(Right)** Success rate as a function of the minimal distance between test and training targets. (1, 2, 3): distance in number of bases; (100%, 80%, 20%): proportion of the training set retained after removing targets that are too similar to targets in the test set. Distance subsampling: distance-based selection of targets; Uniform subsampling: random selection of a training set of equivalent size; r: drop (ratio) in performance score due to the distance-based selection of training targets.
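The distance-based versus uniform subsampling in the right panel of Figure 3 can be sketched as follows. This assumes targets are equal-length DNA strings, that distance means Hamming distance in bases, and that a training target is kept only if it is at least `min_dist` bases away from every test target; these semantics and all names are assumptions for illustration, not the authors' code.

```python
import random
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length DNA targets differ."""
    if len(a) != len(b):
        raise ValueError("targets must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_subsample(train: Sequence[str], test: Sequence[str], min_dist: int) -> List[str]:
    """Keep training targets whose distance to every test target is at least min_dist bases."""
    return [t for t in train if min(hamming(t, s) for s in test) >= min_dist]


def uniform_subsample(train: Sequence[str], size: int, seed: int = 0) -> List[str]:
    """Random training subset of the same size, used as the 'Uniform subsampling' control."""
    return random.Random(seed).sample(list(train), size)


# Toy 4-base targets (illustrative only)
train = ["AAAC", "AAGG", "TTTT", "AAAA"]
test = ["AAAA"]
kept = distance_subsample(train, test, min_dist=2)
print(kept)                    # ['AAGG', 'TTTT']
print(len(kept) / len(train))  # 0.5, the retained proportion of the training set
control = uniform_subsample(train, size=len(kept))
print(len(control))            # 2
```

Training both models on sets of equal size isolates the effect of removing similar targets from the effect of simply having less data.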

**Figure 4**
**Cross-validation performance of ML model as a function of interaction features. (Left)** %Top10: percentage of targets with at least one positive molecule among the 10 top-ranked molecules. The feature groups (SM-5, SM-11, SM-5_11, SM-M2M, SM-M2T, SM-Cross, SM-Intra and SeqMact) are described in the text. Error bars are estimated from 30 independent cross-validation experiments. **(Right)** Prediction of active mutants at least as specific as wild-type I-CreI. Top10: average number of active proteins at least as specific as I-CreI among the 10 top-ranked molecules; α: trade-off parameter between the predicted specificity and activity of candidate proteins; Seq: machine learning model trained on protein/target sequences; Fx: FoldX score.
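One way to picture the α trade-off in the right panel of Figure 4 is a ranking score that blends predicted specificity and predicted activity. The sketch below assumes a simple linear blend, score = α·specificity + (1 − α)·activity, with both predictions scaled to [0, 1]; the caption does not state the actual functional form, so the formula, names, and toy values are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, predicted_activity, predicted_specificity), both in [0, 1]
Candidate = Tuple[str, float, float]


def rank_by_tradeoff(candidates: List[Candidate], alpha: float, k: int = 10) -> List[str]:
    """Return the k top-ranked candidates under an ASSUMED linear trade-off:
    score = alpha * specificity + (1 - alpha) * activity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scored = sorted(candidates,
                    key=lambda c: alpha * c[2] + (1.0 - alpha) * c[1],
                    reverse=True)
    return [name for name, _, _ in scored[:k]]


# Toy candidates with made-up predictions
cands = [("m1", 0.9, 0.2), ("m2", 0.5, 0.8), ("m3", 0.7, 0.5)]
print(rank_by_tradeoff(cands, alpha=0.0))  # ['m1', 'm3', 'm2'] (activity only)
print(rank_by_tradeoff(cands, alpha=1.0))  # ['m2', 'm3', 'm1'] (specificity only)
```

Sweeping α from 0 to 1 moves the top-ranked set from the most active toward the most specific candidates, which is the kind of curve the panel plots.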

**Figure 5**
**Success rate of meganuclease design methods. (Left)** Experimental results on targets sampled from the ETS (extended target space). **(Right)** Experimental results on targets sampled from the RTS (restricted target space). SeqMact: machine learning predictions; SeqMact+: machine learning predictions with an additional I132V mutation; Comb: combinatorial libraries. GTAC: proportion of GTAC target variants with at least one positive mutant; ORIG: proportion of original (sampled) targets with at least one positive mutant; ORIGstrong: proportion of original (sampled) targets with at least one highly active mutant (normalized cleavage activity score above 0.8).
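The success rates in Figure 5 are proportions of targets with at least one mutant passing an activity cutoff. A minimal sketch, assuming per-target lists of normalized cleavage activities: the 0.8 cutoff for ORIGstrong comes from the caption, while the cutoff for a "positive" mutant is not given here, so `POSITIVE_CUTOFF` is a hypothetical placeholder, as are the toy values.

```python
from typing import Dict, List


def success_rate(activities: Dict[str, List[float]], threshold: float) -> float:
    """Fraction of targets with at least one mutant whose normalized
    cleavage activity is strictly above `threshold`."""
    if not activities:
        raise ValueError("no targets given")
    hits = sum(1 for scores in activities.values() if any(s > threshold for s in scores))
    return hits / len(activities)


# Toy normalized cleavage activities per original target (made-up values)
orig = {
    "t1": [0.95, 0.10],
    "t2": [0.40, 0.30],
    "t3": [0.00],
}
POSITIVE_CUTOFF = 0.2  # hypothetical; the caption does not state the cutoff for "positive"

print(success_rate(orig, POSITIVE_CUTOFF))  # ORIG-style rate: 2 of 3 targets
print(success_rate(orig, threshold=0.8))    # ORIGstrong-style rate: 1 of 3 targets
```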


Cited by

Synthetic biology in cell-based cancer immunotherapy.
Chakravarti D, Wong WW. Trends Biotechnol. 2015 Aug;33(8):449-61. doi: 10.1016/j.tibtech.2015.05.001. Epub 2015 Jun 16. PMID: 26088008. Review.

'Off-the-shelf' allogeneic CAR T cells: development and challenges.
Depil S, Duchateau P, Grupp SA, Mufti G, Poirot L. Nat Rev Drug Discov. 2020 Mar;19(3):185-199. doi: 10.1038/s41573-019-0051-2. Epub 2020 Jan 3. PMID: 31900462. Review.

CRISPR-Cas9 in basic and translational aspects of cancer therapy.
Samareh Salavatipour M, Poursalehi Z, Hosseini Rouzbahani N, Mohammadyar S, Vasei M. Bioimpacts. 2024;14(6):30087. doi: 10.34172/bi.2024.30087. Epub 2024 Mar 10. PMID: 39493894. Review.

Genome-Editing Technologies: Concept, Pros, and Cons of Various Genome-Editing Techniques and Bioethical Concerns for Clinical Application.
Khan SH. Mol Ther Nucleic Acids. 2019 Jun 7;16:326-334. doi: 10.1016/j.omtn.2019.02.027. Epub 2019 Apr 3. PMID: 30965277. Review.

Allogeneic CAR-T Therapy Technologies: Has the Promise Been Met?
Lonez C, Breman E. Cells. 2024 Jan 12;13(2):146. doi: 10.3390/cells13020146. PMID: 38247837. Review.


