PLoS One. 2016 Aug 10;11(8):e0154765.
doi: 10.1371/journal.pone.0154765. eCollection 2016.

Directed Chemical Evolution with an Outsized Genetic Code

Casey J Krusemark et al. PLoS One.

Abstract

The first demonstration that macromolecules could be evolved in a test tube was reported twenty-five years ago. That breakthrough meant that billions of years of chance discovery and refinement could be compressed into a few weeks, and provided a powerful tool that now dominates all aspects of protein engineering. A challenge has been to extend this scientific advance into synthetic chemical space: to enable the directed evolution of abiotic molecules. The problem has been tackled in many ways. These include expanding the natural genetic code to include unnatural amino acids, engineering polyketide and polypeptide synthases to produce novel products, and tagging combinatorial chemistry libraries with DNA. Importantly, there is still no small-molecule analog of directed protein evolution, i.e. a substantiated approach for optimizing complex (≥ 10^9 diversity) populations of synthetic small molecules over successive generations. We present a key advance towards this goal: a tool for genetically-programmed synthesis of small-molecule libraries from large chemical alphabets. The approach accommodates alphabets that are one to two orders of magnitude larger than any in Nature, and facilitates evolution within the chemical spaces they create. This is critical for small molecules, which are built up from numerous and highly varied chemical fragments. We report a proof-of-concept chemical evolution experiment utilizing an outsized genetic code, and demonstrate that fitness traits can be passed from an initial small-molecule population through to the great-grandchildren of that population. The results establish the practical feasibility of engineering synthetic small molecules through accelerated evolution.

Conflict of interest statement

Competing Interests: The authors have read the journal's policy and the authors of this manuscript have the following competing interests: (1) Pehr Harbury is a founder and scientific advisory board member of DiCE Molecules LLC, a biotech company which is commercializing DNA programmed combinatorial chemistry; (2) Pat Brown is a founder and chief scientific officer of Impossible Foods, a biotech company which has licensed a DNA programmed combinatorial chemistry patent from Stanford University. Pehr Harbury's relationship with DiCE Molecules LLC and Pat Brown's relationship with Impossible Foods do not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1
A. Directed evolution of kinase substrates. An initial population of DNA genes was chemically translated into peptide-DNA conjugates using a DNA-programmed combinatorial library synthesis. The library of peptide-DNA conjugates was treated with protein kinase A, and phosphorylated conjugates were isolated. The genes associated with the phosphorylated conjugates were then amplified by the polymerase chain reaction, diversified by recombination, and used to program the synthesis of a subsequent library generation. After generations 2–4, the gene population was sequenced. Peptides encoded by enriched genes were synthesized individually without DNA, and tested for their ability to function as protein kinase A substrates. The gene amplification, diversification and DNA-programmed library synthesis steps (dashed arrows) were required to close the evolutionary cycle. By contrast, DNA-tagged library techniques utilize a linear process (solid arrows), consisting of an unprogrammed synthesis of small molecule-DNA conjugates, a selection for function, and DNA sequencing. B. Selection for kinase substrates. The peptide-DNA conjugate library was incubated with protein kinase A and ATP-γ-S. The enzymatically treated library was then alkylated with biotin iodoacetamide, which coupled a biotin moiety to thiophosphorylated peptides. The biotinylated peptide-DNA conjugates were affinity purified on paramagnetic streptavidin beads. See S1 Fig for a quantification of phosphopeptide enrichment.
Fig 2. Gene structure.
The genes that programmed the synthesis of specific tetrapeptides were made up of four amino-acid coding regions (VA-VD, rainbow bars). 384 distinct DNA codon sequences were present at each coding region. Unlike the natural genetic code, each coding region used a set of codon sequences that were distinct from the codon sequences at the adjacent coding regions. Consequently, a total of 1536 different codon sequences were present in the library. The different codons at each coding position directed the addition of one amino acid from a set of seventeen different Fmoc-protected amino acids. An arginine dimer was included as an 18th amino acid in the fourth and final synthetic step, so some of the products were pentapeptides. An extra bar code (VE, black/white bar) specified whether the gene product would be subjected to a kinase substrate selection or to a control selection. Each peptide was coupled through a 5' polyethylene glycol linker to the gene that programmed its synthesis.
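The codon arithmetic in this caption can be checked with a few lines of Python. This is a minimal sketch rather than anything taken from the paper: the variable names are illustrative, and the gene and peptide counts assume that every codon combination and every building block is actually present in the library, which the caption does not state.

```python
# Numbers taken from the Fig 2 caption; everything else is an assumption.
CODING_REGIONS = 4        # coding regions VA through VD
CODONS_PER_REGION = 384   # distinct codon sequences at each coding region
AMINO_ACIDS = 17          # Fmoc-protected amino acids available at each step
NATURAL_CODONS = 64       # size of the natural triplet code, for comparison

# Each region uses its own codon set, so the codon counts simply add up.
total_codons = CODING_REGIONS * CODONS_PER_REGION

# Assumption: every combination of codons occurs in the gene population.
gene_space = CODONS_PER_REGION ** CODING_REGIONS

# Assumption: 17 choices in steps 1-3 and 18 in step 4 (17 + arginine dimer).
peptide_space = AMINO_ACIDS ** (CODING_REGIONS - 1) * (AMINO_ACIDS + 1)

print("codon sequences in the library:", total_codons)
print("possible genes:", gene_space)
print("possible peptide products:", peptide_space)
print("codons relative to the natural code:", total_codons / NATURAL_CODONS)
```

Under these assumptions the gene space is far larger than the peptide space, because many codons direct the addition of the same building block. That redundancy is what makes synonymous codon substitutions possible.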
Fig 3. DNA-programmed combinatorial library synthesis.
For each of four synthetic steps, the DNA genes were split into 384 sub-pools by hybridization of the codons in one of the coding regions to a spatially arrayed set of complementary oligonucleotides. The DNA genes were then transferred in a one-to-one fashion from the hybridization array into a 384-well filter plate loaded with DEAE-Sepharose resin. The DEAE resin acted as a solid support that retained the DNA genes during chemical reactions. One of seventeen different Fmoc-protected amino acids (dependent on the sub-pool position within the 384-well plate) was then coupled to the growing peptide chain linked to the DNA. After the chemical step, the genes were pooled, and the split-pool process was repeated until all of the coding regions had been chemically translated.
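The split-pool logic described above can be sketched as a toy simulation. It is a hedged illustration, not the authors' protocol: genes are modeled as tuples of four codon indices (0 to 383), the building-block names are placeholders, and the well-to-amino-acid assignment (`well_to_block`, well index modulo 17) is hypothetical because the caption does not give the real plate layout. The arginine dimer used in the fourth step is left out for simplicity.

```python
from collections import defaultdict

N_WELLS = 384   # sub-pools per synthetic step (from the caption)
N_STEPS = 4     # one synthetic step per coding region (from the caption)
BUILDING_BLOCKS = [f"aa{i:02d}" for i in range(17)]  # placeholder names


def well_to_block(well):
    """Hypothetical plate layout: the well index picks one of 17 blocks."""
    return BUILDING_BLOCKS[well % len(BUILDING_BLOCKS)]


def translate_library(genes):
    """Chemically 'translate' genes by repeated split, couple and pool.

    Each gene is a tuple of N_STEPS codon indices in range(N_WELLS).
    Returns a dict mapping each gene to the tuple of blocks coupled to it.
    """
    products = {gene: [] for gene in genes}
    for step in range(N_STEPS):
        # Split: route every gene to the well matching its codon at this region.
        wells = defaultdict(list)
        for gene in products:
            if not 0 <= gene[step] < N_WELLS:
                raise ValueError(f"codon index out of range in {gene}")
            wells[gene[step]].append(gene)
        # Couple: every gene in a well receives that well's building block.
        for well, members in wells.items():
            block = well_to_block(well)
            for gene in members:
                products[gene].append(block)
        # Pool: all genes are recombined before the next step (implicit here).
    return {gene: tuple(blocks) for gene, blocks in products.items()}


if __name__ == "__main__":
    library = translate_library([(0, 17, 34, 1), (17, 0, 0, 18)])
    for gene, peptide in library.items():
        print(gene, "->", "-".join(peptide))
```

In this toy layout the two example genes use different codons but receive the same building blocks, which illustrates how distinct genes can encode the same product.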
Fig 4. Population maturation.
A. The peptide-DNA conjugate library converged to PKA substrates over four generations. A histogram of the fold-enrichment ratios for the top 1000 genes in generations 2–4 is shown. Genes lacking a consensus motif are colored black, genes that encoded peptides with one of the two PKA consensus motifs (RR*[S/T]* or RRSF*) are colored silver, and genes that encoded the top RRSFL peptide are colored gold. See also S4 Fig. B. The DNA sequence of genes with synonymous codon substitutions influenced the enrichment of peptide-DNA conjugates. A histogram plot with blue bars shows the observed distribution of log fold-enrichment ratios for 830 different genes that encoded the same RRSFL peptide (95% are contained between 4.4 and 6). If all of the RRSFL-encoding genes had been equally enriched, the black distribution would have been expected (this computed distribution reflects Poisson noise from sparse gene sampling). The excess width of the observed distribution suggests the existence of a selection bias for or against different synonymous codons. The Poisson distribution was reduced to 0.63 of its full area for clarity of the plot.
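One simple way to ask whether a set of read counts is wider than Poisson sampling noise alone would predict is the index of dispersion (variance divided by mean), which is close to 1 for Poisson-distributed counts. The sketch below is an assumption-laden illustration and is not the analysis behind panel B, which compares histograms of log fold-enrichment ratios. The function name `dispersion_index` and the example counts are made up for demonstration.

```python
import statistics


def dispersion_index(read_counts):
    """Return sample variance / mean for a list of read counts.

    Values near 1 are consistent with pure Poisson sampling noise;
    values well above 1 indicate extra spread beyond sampling noise.
    """
    if len(read_counts) < 2:
        raise ValueError("need at least two read counts")
    mean = statistics.mean(read_counts)
    if mean == 0:
        raise ValueError("mean read count is zero")
    return statistics.variance(read_counts) / mean


if __name__ == "__main__":
    # Made-up read counts for genes encoding the same peptide.
    example_counts = [12, 30, 7, 45, 18, 3, 26]
    print(round(dispersion_index(example_counts), 2))
```

A value well above 1 for genes that encode the same peptide would be in line with the caption's suggestion that synonymous codons are not selected equally, although it cannot say which codons are favored.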
Fig 5. Accuracy of hit detection.
The plots show the number of RRSFL-encoding genes (y-axis) contained within the top N total genes (x-axis) of a list ranked by enrichment ratio. If the gene ranking had been perfect, the curves would have gone straight up the y-axis and then cut right at the top of the plot. The solid black line shows how RRSFL genes accumulate at a 90% false discovery rate (i.e. when every tenth gene is a hit). The y-value at the intersection of each curve with the solid black line corresponds to the number of RRSFL genes that would have been detected below a 90% false discovery threshold. A. Improved gene ranking over successive generations. The number of RRSFL genes at the top of ranked lists from the zeroth (yellow), second (red), third (green) and fourth (blue) generations is shown. None of the RRSFL genes could be detected below a 90% false discovery threshold in the zeroth or second generations, whereas 207 and 505 out of 1296 total could be detected in the third and fourth generations respectively. B. Dependence of gene ranking on sequencing depth. The effect of using increasingly small fractions of the total sequencing data to rank genes is shown. 505, 416, 319 and 188 of the RRSFL genes could be detected below a 90% false discovery threshold given 3 million, 1.5 million, 0.75 million and 0.3 million sequencing reads respectively. The discovered fraction of RRSFL genes grew roughly in proportion to the square root of the number of reads. C. Improved gene ranking with a redundant genetic code. The ranking of RRSFL gene sets based on 187,500 gene reads and a two-codon-per-amino-acid genetic code is shown. In one case, the reads used for the analysis were restricted to genes containing a single codon from each codon pair. In this single-codon case, 32 of the 81 RRSFL gene sets could be detected below a 90% false discovery threshold. Alternatively, an identical number of gene reads was used for the analysis, but the reads included genes containing both codons of each codon pair. In this two-codon case, 58 of the 81 RRSFL gene sets could be detected. The two-codon genetic code revealed about 70% of the RRSFL gene sets, while the one-codon code revealed only about 40%.
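The intersection described in this caption can be sketched in Python. This is a minimal reading of the caption, not the authors' code: the function name `hits_below_fdr` and the input format (a ranked list of booleans marking which genes are hits) are assumptions. The function walks down the ranked list and reports the number of hits at the deepest rank where the false discovery rate is still at or below the threshold.

```python
def hits_below_fdr(ranked_is_hit, max_fdr=0.9):
    """Count hits recoverable from a ranked gene list at a given FDR.

    ranked_is_hit: booleans ordered from most to least enriched gene,
                   True where the gene is a hit (e.g. encodes RRSFL).
    max_fdr:       highest tolerated fraction of non-hits in the top N.

    Returns the hit count at the largest N whose top-N list has a
    false discovery rate no greater than max_fdr (0 if there is none).
    """
    best = 0
    hits = 0
    for n, is_hit in enumerate(ranked_is_hit, start=1):
        hits += bool(is_hit)
        false_discoveries = n - hits
        if false_discoveries / n <= max_fdr:
            best = hits  # the curve is still on or above the FDR line
    return best


if __name__ == "__main__":
    # Toy ranking: two hits at the top, then only non-hits.
    toy_ranking = [True, True] + [False] * 30
    print(hits_below_fdr(toy_ranking))
```

With a 90% threshold, the condition is met whenever at least one in ten of the top N genes is a hit, which matches the solid black line described in the caption.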
