Nucleic Acids Res. 2004 Nov 23;32(20):e162.
doi: 10.1093/nar/gnh160.

Protein-mediated error correction for de novo DNA synthesis

Peter A Carr et al. Nucleic Acids Res.

Abstract

The availability of inexpensive, on-demand synthetic DNA has enabled numerous powerful applications in biotechnology, in turn driving considerable current interest in the de novo synthesis of increasingly long DNA constructs. DNA products as large as small viral genomes have already been synthesized from oligonucleotides. Despite such achievements, the cost and time required to generate such long constructs have so far prevented gene-length (and longer) DNA synthesis from becoming an everyday research tool in the way that PCR and DNA sequencing are. A critical barrier to low-cost, high-throughput de novo DNA synthesis is the frequency with which errors pervade the final product. Here, we employ a DNA mismatch-binding protein, MutS (from Thermus aquaticus), to remove failure products from synthetic genes. This method reduced errors by >15-fold relative to conventional gene synthesis techniques, yielding DNA with one error per 10 000 base pairs. The approach is general and scalable, and can be iterated multiple times for greater fidelity. Reductions in both cost and time are demonstrated for the synthesis of a 2.5 kb gene.


Figures

Figure 1
Influence of error rates on de novo DNA synthesis. (A) The purity of gene synthesis products (yield of error-free clones) decreases exponentially with the length of the product synthesized. Error rates shown are 1 in 600 bp (blue), typical of conventional gene synthesis approaches; 1 in 1400 bp (red) (13); and 1 in 10 000 bp (yellow), as reported here. (B) The number of clones that must be sequenced to have a high (95%) probability of obtaining at least one error-free clone. The same three error rates as in (A) are indicated. Calculations are described in Supplementary Table A.
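The curves in Figure 1 can be reproduced in spirit with a simple probabilistic argument. If errors arise independently at a fixed per-base rate r, a product of length L is error-free with probability f = (1 - r)^L, and the number of clones n needed for a 95% chance of obtaining at least one error-free clone is the smallest n with 1 - (1 - f)^n >= 0.95. The Python sketch below assumes this independent-error model; the authors' own calculation is given in Supplementary Table A and may differ in detail, and the function names are illustrative only.

```python
import math


def error_free_fraction(length_bp: int, bp_per_error: float) -> float:
    """Probability that a product of `length_bp` bases contains no errors.

    Assumes errors occur independently at a rate of 1 per `bp_per_error` bases.
    """
    per_base_error = 1.0 / bp_per_error
    return (1.0 - per_base_error) ** length_bp


def clones_to_sequence(length_bp: int, bp_per_error: float, confidence: float = 0.95) -> int:
    """Smallest number of clones n with P(at least one error-free clone) >= confidence."""
    f = error_free_fraction(length_bp, bp_per_error)
    if f >= 1.0:
        return 1
    if f <= 0.0:
        raise ValueError("no error-free clones expected at this length and error rate")
    # 1 - (1 - f)**n >= confidence  <=>  n >= log(1 - confidence) / log(1 - f)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - f))


if __name__ == "__main__":
    # The three error rates plotted in Figure 1, applied to a 2.5 kb gene.
    for bp_per_error in (600, 1400, 10_000):
        f = error_free_fraction(2500, bp_per_error)
        n = clones_to_sequence(2500, bp_per_error)
        print(f"1 in {bp_per_error:>6} bp: {f:.1%} error-free, sequence {n} clones")
```

The 2.5 kb length in the example is the gene size mentioned in the abstract; under this model the required sequencing effort drops steeply as the error rate falls.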
Figure 2
(A) Principal steps performed in the construction of synthetic genes using MutS protein for error reduction. The pie chart indicates the approximate time consumed by each step (in hours), with a red arrow indicating the order of operations. The most time-consuming steps in this process are often oligonucleotide synthesis and DNA sequencing (including plasmid production). The 24+ and 48+ hours indicated for these two steps are lower bounds, achievable only with immediate access to the appropriate equipment; if these steps are performed by outside providers, 3–5 days per step is typical. Box 1: gene segments are synthesized and amplified using conventional PCR protocols. The resulting products are dissociated and re-annealed so that errors are present as DNA heteroduplexes (mismatches). Box 2: MutS protein is mixed with this pool of molecules and binds to mismatches. The error-enriched (MutS-bound) fraction is resolved from the error-depleted fraction by electrophoresis. Box 3: the error-depleted segments are assembled into the desired gene and amplified by PCR prior to cloning. (B) Polyacrylamide gel electrophoresis of the DNA segments used to assemble the GFP gene construct (contrast enhanced). Lane 1: size standard (Kb DNA Ladder, Stratagene; from bottom, sizes are 250, 500, 750 and 1000 bp). Lanes 2–5: the four segments, each complexed with MutS. Lower bands are the error-depleted fractions; upper bands are the error-enriched (MutS-bound) fractions. Lanes 6–9: the same four segments with no MutS present. Some smearing of the DNA is consistently observed between the two bands in all lanes containing MutS, probably representing protein–DNA complexes that have dissociated.
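Why re-annealing followed by MutS binding (Boxes 1 and 2) depletes errors, and why the procedure can be repeated, can be illustrated with an idealized model. It assumes that (i) every strand re-anneals with a random partner from the pool, (ii) every error is unique, so an error-bearing strand always ends up in a mismatched duplex, and (iii) MutS removes each mismatched duplex with a fixed capture efficiency. The capture efficiency is a hypothetical parameter, not a value measured in the paper, and the model ignores the partial dissociation of protein–DNA complexes visible as smearing in panel B. Under these assumptions a correct strand is lost only when it pairs with an error-bearing partner that MutS captures.

```python
from typing import List


def deplete_once(q: float, capture_efficiency: float) -> float:
    """Return the error-bearing fraction of strands after one idealized round.

    q: fraction of strands carrying at least one error before the round.
    capture_efficiency: hypothetical probability that MutS captures (and so
        removes) a mismatched duplex; not a value reported in the paper.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= capture_efficiency <= 1.0):
        raise ValueError("q and capture_efficiency must lie in [0, 1]")
    e = capture_efficiency
    # Errors are assumed unique, so every error-bearing strand sits in a
    # mismatched duplex and survives only if MutS fails to capture it.
    bad_left = q * (1.0 - e)
    # A correct strand is in a perfect duplex with probability (1 - q); otherwise
    # it is paired with an error-bearing strand and shares that duplex's fate.
    good_left = (1.0 - q) * ((1.0 - q) + q * (1.0 - e))
    total = bad_left + good_left
    if total == 0.0:
        raise ValueError("no strands survive this round")
    return bad_left / total


def deplete(q: float, capture_efficiency: float, rounds: int) -> List[float]:
    """Error-bearing fraction before the first round and after each round.

    Assumes the surviving pool is re-amplified and re-annealed between rounds
    without introducing new errors (a simplification; PCR adds some errors).
    """
    history = [q]
    for _ in range(rounds):
        q = deplete_once(q, capture_efficiency)
        history.append(q)
    return history


if __name__ == "__main__":
    # Hypothetical starting point: half of all strands carry an error,
    # and MutS captures 90% of mismatched duplexes.
    for round_number, fraction in enumerate(deplete(0.5, 0.9, rounds=3)):
        print(f"after {round_number} round(s): {fraction:.4f} of strands carry an error")
```

In this model a capture efficiency of 1.0 removes every error-bearing strand in a single round, while any lower positive efficiency still lowers the error-bearing fraction with each additional round, which is the behaviour that makes iterating the procedure worthwhile.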
Figure 3
(A) Effect of error removal on GFP gene synthesis. Flow cytometry measurements of cells expressing GFP from synthetic genes whose quality was improved by error removal as shown in Figure 2. Horizontal axes indicate fluorescence intensity specific to this gene, while vertical axes indicate non-specific fluorescence at a different wavelength. Thus, cells that contain successfully synthesized GFP genes are expected to display at least a minimum level of fluorescence at 530 nm and substantially less fluorescence at 585 nm (the bounded region in the lower right of each graph). Higher contours (lighter plot color) indicate a greater density of cells at a given coordinate. Negative control: a non-fluorescent gene (Tet) expressed from the same vector; Error-enriched: GFP genes produced from MutS-bound DNA fragments; Standard: GFP genes produced by conventional gene synthesis, with no additional processing to remove errors; Error-depleted: GFP genes that have undergone one cycle of error removal; Depleted twice: GFP genes after two cycles of error removal; Positive control: a correct copy of the same GFP gene in the same vector. (B) Mean fluorescence intensity of each population of cells (50 000 per experiment) as a function of the proportion of fluorescent cells (those in the cut-off region indicated in panel A). Each application of the error-removal process improves the quality of the synthetic genes. (black circles): negative control; (yellow triangles): error-enriched; (black square): standard; (red diamonds): ‘untreated’ DNA subjected to the same manipulations shown in Figure 2, but without the application of MutS protein; (blue circle): DNA error-depleted once using MutS protein; (black triangles): the same GFP DNA employed for the positive control, but amplified by PCR and re-cloned; (purple square): depleted twice; (black diamonds): positive control. Values have been normalized to the mean intensity of the positive control (set at 1).
Color symbols indicate sets that were subjected to DNA sequencing; they correspond to the symbols shown in Figure 4.
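The two quantities plotted in panel B, the proportion of cells inside the gated region of panel A and the population's mean fluorescence normalized to the positive control, can be computed from per-cell measurements as sketched below. The gate shape and its thresholds (`green_min`, `max_orange_ratio`) are hypothetical stand-ins for the bounded region drawn in panel A, and averaging the 530 nm signal over all cells in the population is an assumption about how the mean was taken.

```python
from statistics import fmean
from typing import List, Tuple

# One flow-cytometry event: (intensity at 530 nm, intensity at 585 nm).
Event = Tuple[float, float]


def in_gfp_gate(event: Event, green_min: float, max_orange_ratio: float) -> bool:
    """Hypothetical lower-right gate: bright at 530 nm, comparatively dim at 585 nm."""
    green, orange = event
    return green >= green_min and orange <= max_orange_ratio * green


def summarize(
    events: List[Event],
    positive_control: List[Event],
    green_min: float = 100.0,
    max_orange_ratio: float = 0.5,
) -> Tuple[float, float]:
    """Return (fraction of gated cells, mean 530 nm intensity / positive-control mean)."""
    if not events or not positive_control:
        raise ValueError("both populations need at least one event")
    gated = sum(1 for e in events if in_gfp_gate(e, green_min, max_orange_ratio))
    fraction_gated = gated / len(events)
    control_mean = fmean([green for green, _ in positive_control])
    normalized_mean = fmean([green for green, _ in events]) / control_mean
    return fraction_gated, normalized_mean


if __name__ == "__main__":
    # Toy events for illustration only; each real experiment recorded 50 000 cells.
    sample = [(200.0, 20.0), (50.0, 5.0), (300.0, 250.0), (150.0, 10.0)]
    control = [(200.0, 10.0), (200.0, 10.0)]
    print(summarize(sample, control))
```

Normalizing by the positive-control mean places the correct GFP gene at 1, matching the convention stated for panel B.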
Figure 4
Positions of errors within the GFP DNA synthesis product. From bottom to top: the overlapping set of 38 oligonucleotides (thirty-six 50mers and two 5′-terminal 59mers) used to build the GFP gene and flanking sequences (arrowheads indicate the 3′-terminus of each molecule); the four intermediate assembly products used for the first round of error depletion; positions of errors present in the error-enriched (EE, yellow triangles), untreated (UN, red diamonds), error-depleted (ED1, blue circles), and twice depleted (ED2, purple squares) gene synthesis products. Per-base error rates for each of these sets are also indicated.
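A per-base error rate of the kind listed for each set in Figure 4 is conventionally the number of errors found divided by the total number of bases sequenced. A minimal sketch of that bookkeeping, and of the fold reduction between two sets, follows; the `SequencedSet` class is a hypothetical helper, and the counts in the example are invented for illustration and are not the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencedSet:
    label: str            # set name, e.g. "UN" or "ED1" as in Figure 4
    clones: int           # number of clones sequenced
    bases_per_clone: int  # length of the sequenced region per clone, in bp
    errors: int           # total errors found across all clones

    @property
    def bases_sequenced(self) -> int:
        return self.clones * self.bases_per_clone

    @property
    def error_rate(self) -> float:
        """Errors per sequenced base."""
        return self.errors / self.bases_sequenced

    @property
    def bp_per_error(self) -> float:
        """Average spacing between errors, i.e. '1 error per N bp'."""
        return float("inf") if self.errors == 0 else self.bases_sequenced / self.errors


def fold_reduction(before: SequencedSet, after: SequencedSet) -> float:
    """How many times lower the error rate of `after` is than that of `before`."""
    if after.errors == 0:
        return float("inf")
    return before.error_rate / after.error_rate


if __name__ == "__main__":
    # Invented counts for illustration only.
    untreated = SequencedSet("UN", clones=10, bases_per_clone=800, errors=12)
    depleted = SequencedSet("ED1", clones=10, bases_per_clone=800, errors=1)
    for s in (untreated, depleted):
        print(f"{s.label}: {s.error_rate:.2e} per base (1 per {s.bp_per_error:.0f} bp)")
    print(f"fold reduction: {fold_reduction(untreated, depleted):.1f}")
```

With small error counts such rates carry wide statistical uncertainty, so comparisons between sets are most meaningful when many bases have been sequenced.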

References

    1. Khorana,H.G. (1968) Nucleic acid synthesis in the study of the genetic code. In Nobel Lectures: Physiology or Medicine (1963–1970). Elsevier Science Ltd, Amsterdam, pp. 341–369.
    2. Agarwal,K.L., Buchi,H., Caruthers,M.H., Gupta,N., Khorana,H.G., Kleppe,K., Kumar,A., Ohtsuka,E., Rajbhandary,U.L., Van de Sande,J.H. et al. (1970) Total synthesis of the gene for an alanine transfer ribonucleic acid from yeast. Nature, 227, 27–34.
    3. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
    4. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
    5. Kleppe,K., Ohtsuka,E., Kleppe,R., Molineux,I. and Khorana,H.G. (1971) Studies on polynucleotides. XCVI. Repair replications of short synthetic DNA's as catalyzed by DNA polymerases. J. Mol. Biol., 56, 341–361.
