BMC Bioinformatics. 2010 Nov 2;11:543.
doi: 10.1186/1471-2105-11-543.

Predicting success of oligomerized pool engineering (OPEN) for zinc finger target site sequences

Jeffry D Sander et al. BMC Bioinformatics. 2010.

Abstract

Background: Precise and efficient methods for gene targeting are critical for detailed functional analysis of genomes and regulatory networks and for potentially improving the efficacy and safety of gene therapies. Oligomerized Pool ENgineering (OPEN) is a recently developed method for engineering C2H2 zinc finger proteins (ZFPs) designed to bind specific DNA sequences with high affinity and specificity in vivo. Because generation of ZFPs using OPEN requires considerable effort, a computational method for identifying the sites in any given gene that are most likely to be successfully targeted by this method is desirable.

Results: Analysis of the base composition of experimentally validated ZFP target sites identified important constraints on the DNA sequence space that can be effectively targeted using OPEN. Using alternate encodings to represent ZFP target sites, we implemented Naïve Bayes and Support Vector Machine classifiers capable of distinguishing "active" targets, i.e., ZFP binding sites that can be targeted with a high rate of success, from those that are "inactive" or poor targets for ZFPs generated using current OPEN technologies. When evaluated using leave-one-out cross-validation on a dataset of 135 experimentally validated ZFP target sites, the best Naïve Bayes classifier, designated ZiFOpT, achieved overall accuracy of 87% and specificity+ of 90%, with an ROC AUC of 0.89. When challenged with a completely independent test set of 140 newly validated ZFP target sites, ZiFOpT performance was comparable in terms of overall accuracy (88%) and specificity+ (92%), but with reduced ROC AUC (0.77). Users can rank potentially active ZFP target sites using a confidence score derived from the posterior probability returned by ZiFOpT.
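The evaluation described above (a Naïve Bayes classifier scored by leave-one-out cross-validation, then used to rank candidate sites by posterior probability) can be illustrated with a minimal sketch. It is not the ZiFOpT implementation: the position-specific base encoding, the Laplace pseudocount `alpha`, the 0.5 decision threshold, and the reading of specificity+ as TP / (TP + FP) are all assumptions made here for illustration, and the eight 9-bp sequences are invented toy data, not entries from ZFTS135.

```python
import math

BASES = "ACGT"

def train(sites, labels, alpha=1.0):
    """Count bases per position and class; alpha is an assumed Laplace pseudocount."""
    n_pos = len(sites[0])
    model = {"alpha": alpha, "n": {}, "counts": {}, "total": len(sites)}
    for cls in (True, False):
        members = [s for s, y in zip(sites, labels) if y == cls]
        model["n"][cls] = len(members)
        model["counts"][cls] = [
            {b: sum(s[i] == b for s in members) for b in BASES}
            for i in range(n_pos)
        ]
    return model

def posterior_active(model, site):
    """Return P(active | site), treating positions as conditionally independent."""
    a = model["alpha"]
    log_score = {}
    for cls in (True, False):
        n = model["n"][cls]
        # Smoothed class prior, then one smoothed likelihood term per position.
        score = math.log((n + a) / (model["total"] + 2 * a))
        for i, base in enumerate(site):
            score += math.log((model["counts"][cls][i][base] + a) / (n + a * len(BASES)))
        log_score[cls] = score
    # Normalise in log space for numerical stability.
    top = max(log_score.values())
    weight = {cls: math.exp(v - top) for cls, v in log_score.items()}
    return weight[True] / (weight[True] + weight[False])

def leave_one_out(sites, labels):
    """Score each site with a model trained on all the other sites."""
    scores = []
    for k in range(len(sites)):
        model = train(sites[:k] + sites[k + 1:], labels[:k] + labels[k + 1:])
        scores.append(posterior_active(model, sites[k]))
    return scores

def accuracy_and_precision(scores, labels, threshold=0.5):
    """Overall accuracy and TP / (TP + FP) for the active class (assumed specificity+)."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision

# Invented toy data: GNN/TNN triplets, G-rich "active" and T-rich "inactive" sites.
TOY_SITES = ["GAGGCGGAA", "GGGGACGCC", "GCAGAGGGA", "GACGGCGAG",
             "TTTGTTTAT", "GTTTTGTCT", "TATTTTGTT", "TCTGTTTTA"]
TOY_LABELS = [True] * 4 + [False] * 4

loo_scores = leave_one_out(TOY_SITES, TOY_LABELS)
accuracy, precision = accuracy_and_precision(loo_scores, TOY_LABELS)

# Rank hypothetical candidate sites by posterior probability of being active.
full_model = train(TOY_SITES, TOY_LABELS)
candidates = ["GGGGAGGCG", "TTTTTTGTT", "GAGTTTGCG"]
ranked = sorted(candidates, key=lambda s: posterior_active(full_model, s), reverse=True)
print(accuracy, precision, ranked)
```

Sorting by the posterior rather than thresholding it mirrors the confidence-score ranking mentioned above: a user with many candidate sites in a gene can start experimental work with the highest-scoring ones.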

Conclusion: ZiFOpT, a machine learning classifier trained to identify DNA sequences amenable to targeting by OPEN-generated zinc finger arrays, can guide users to the target sites most likely to function successfully in vivo, substantially reducing the experimental effort required. ZiFOpT is freely available and incorporated in the Zinc Finger Targeter web server (http://bindr.gdcb.iastate.edu/ZiFiT).

Figures

Figure 1
Base composition differs in active versus inactive ZFP target sites. A) Total base counts for active and inactive ZFP target sites (from ZFTS135, a dataset of 135 experimentally validated 9-bp target sites, see Additional File 1 - Table S1) reveal that variation in the average frequency of each base differentiates active versus inactive target sites. The total number of G and T residues relative to A and C is inflated because currently available OPEN pools are designed to target GNN and TNN triplets. B) Positional base counts, i.e., average base counts for each position within target site triplets (1st, 2nd, 3rd), suggest that thymine bases negatively impact ZFP binding at all three positions. C) An iceLogo [50] generated from ZFTS135 illustrates the difference in percentage composition of nucleotides at each position, from 1 - 9 (5' to 3'), between the positive class and the entire dataset. For example, 78% of all sites in ZFTS135 have a G in position 1, whereas 88% of all active sites have a G at position 1, resulting in a difference of 10%. Positive difference values indicate that, on average, the indicated bases are favored at those positions in active sites; negative difference values indicate that the indicated bases are disfavored. These position-specific differences in percentage composition also support the conclusion that thymine bases tend to occur in inactive targets (i.e., they have large negative propensities).
Figure 2
Receiver Operating Characteristic (ROC) curves for Naïve Bayes and SVM classifiers.
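
The position-specific differences described for Figure 1, panel C, come down to simple arithmetic: the percentage of active sites with a given base at a position, minus the percentage of all sites with that base at that position (in the caption's example, 88% − 78% = 10% for G at position 1). The sketch below computes only this raw difference; it is not the iceLogo tool, the function names are hypothetical, and the sequences are invented toy data rather than ZFTS135.

```python
BASES = "ACGT"

def position_frequencies(sites):
    """Percentage of each base at each position across a list of equal-length sites."""
    freqs = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        freqs.append({b: 100.0 * column.count(b) / len(sites) for b in BASES})
    return freqs

def percentage_difference(active_sites, all_sites):
    """Active-class percentage minus whole-dataset percentage, per position and base."""
    active = position_frequencies(active_sites)
    reference = position_frequencies(all_sites)
    return [
        {b: active[i][b] - reference[i][b] for b in BASES}
        for i in range(len(reference))
    ]

# Invented toy data: four "active" and four "inactive" 9-bp sites.
ACTIVE = ["GAGGCGGAA", "GGGGACGCC", "GCAGAGGGA", "GACGGCGAG"]
INACTIVE = ["TTTGTTTAT", "GTTTTGTCT", "TATTTTGTT", "TCTGTTTTA"]
ALL = ACTIVE + INACTIVE

diff = percentage_difference(ACTIVE, ALL)
# Position 1: G is in 4/4 active sites (100%) and 5/8 of all sites (62.5%).
print(diff[0]["G"])  # 37.5
print(diff[0]["T"])  # -37.5
```

Positive values mark bases enriched in the active class at that position and negative values mark depleted ones; at every position the four differences sum to zero because both sets of percentages sum to 100.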

References

    1. Carroll D. Progress and prospects: zinc-finger nucleases as gene therapy agents. Gene Ther. 2008;15(22):1463–1468. doi: 10.1038/gt.2008.145.
    2. Cathomen T, Keith Joung J. Zinc-finger nucleases: the next generation emerges. Mol Ther. 2008;16(7):1200–1207. doi: 10.1038/mt.2008.114.
    3. Urnov FD, Rebar EJ, Holmes MC, Zhang HS, Gregory PD. Genome editing with engineered zinc finger nucleases. Nat Rev Genet. 2010;11(9):636–646. doi: 10.1038/nrg2842.
    4. Morton J, Davis MW, Jorgensen EM, Carroll D. Induction and repair of zinc-finger nuclease-targeted double-strand breaks in Caenorhabditis elegans somatic cells. Proc Natl Acad Sci USA. 2006;103(44):16370–16375. doi: 10.1073/pnas.0605633103.
    5. Santiago Y, Chan E, Liu PQ, Orlando S, Zhang L, Urnov FD, Holmes MC, Guschin D, Waite A, Miller JC, Rebar EJ, Gregory PD, Klug A, Collingwood TN. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. Proc Natl Acad Sci USA. 2008;105(15):5809–5814. doi: 10.1073/pnas.0800940105.