PLoS One. 2015 Sep 10;10(9):e0136778.
doi: 10.1371/journal.pone.0136778. eCollection 2015.

A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules


Blagovesta Popova et al. PLoS One. 2015.

Abstract

A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high-diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with an element of chance, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvents the notoriously troublesome long-term archiving and repeated propagation of high-diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance in analyzing library diversity. Finally, the practical utility of the library was demonstrated in principle by assessing the conformational stability of library members and by isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acid synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Amino acid sequence of tHisF and randomized positions.
(A) Nucleotide sequence of the synthetic thisF gene. The protein comprises 253 amino acid residues, corresponding to 759 nucleotides of its structural gene. A His6-tag was fused C-terminally to the coding sequence (depicted in blue together with the stop codon). Red font indicates randomized positions. (B) Three-dimensional location of the randomized chain positions. The residues selected for randomization are presented as surfaces and colored according to their distance from the top (C-terminal ends of the β-strands). Blue: I52, A54, S55, S144, I173, D176, G177, K179; green: D11, L50, G80, H84, N103, T104, A105, D130, T171, S201, A204, A224, S225; orange: A8, S101, A128, L222, A223.
Fig 2. Molar fractions of 19 codons in the standard mixture and their derivation from observed occurrences on protein surfaces and in enzyme catalytic sites.
The frequency of each residue in the trinucleotide mixture is calculated as its surface residue frequency (mesophiles) [38] multiplied by 0.75, plus its catalytic residue frequency [39] multiplied by 0.25. The frequency for cysteine was set to zero, and the remaining values were normalized to sum to 1.
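The weighting rule in the Fig 2 caption can be written as a short function. This is a minimal sketch, not the authors' code: the function name `blend_frequencies` is hypothetical, and the toy frequency tables in the usage section are made-up values standing in for the published surface [38] and catalytic [39] frequencies.

```python
def blend_frequencies(surface: dict[str, float],
                      catalytic: dict[str, float],
                      w_surface: float = 0.75,
                      w_catalytic: float = 0.25,
                      excluded: tuple[str, ...] = ("C",)) -> dict[str, float]:
    """Blend two amino-acid frequency tables, zero out excluded residues, renormalize.

    Keys are one-letter amino-acid codes; missing residues count as frequency 0.
    """
    residues = set(surface) | set(catalytic)
    blended = {
        aa: w_surface * surface.get(aa, 0.0) + w_catalytic * catalytic.get(aa, 0.0)
        for aa in residues
    }
    # Cysteine is removed from the trimer mix before normalization.
    for aa in excluded:
        blended[aa] = 0.0
    total = sum(blended.values())
    return {aa: f / total for aa, f in blended.items()}


if __name__ == "__main__":
    # Toy values for illustration only; not the published frequencies.
    toy_surface = {"A": 0.5, "C": 0.1, "D": 0.4}
    toy_catalytic = {"A": 0.2, "C": 0.2, "H": 0.6}
    for aa, f in sorted(blend_frequencies(toy_surface, toy_catalytic).items()):
        print(f"{aa}: {f:.3f}")
```

Setting cysteine to zero before normalizing, as the caption describes, means its weight is redistributed proportionally over the other 19 residues.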
Fig 3. Modular division of thisF gene and hierarchical assembly scheme.
(A) The gene sequence was divided into 14 modules (C1–C14). The black bars represent modules with wild-type sequence; the red bars represent randomized sequences. The modules were chemically synthesized, and randomization was achieved by incorporating trinucleotide mixtures at the corresponding codon positions, indicated by asterisks. Module C5 was synthesized both as a wild-type and as a randomized sequence. The gene library was generated by combinatorial assembly of the smaller modules (fragment libraries) in two steps, represented schematically by arrows. (B) Enzymatic steps involved in the generation of double-stranded fragments and step-wise gene assembly. All C-fragments were designed so that the recognition sequences of Type IIS restriction enzymes lie outside the coding sequence and are removed upon cleavage, whereas the cleavage points in both DNA strands lie within the coding sequence but outside the randomized DNA sites. See also Table 1.
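One design check implied by this scheme is that every ligation junction must carry a distinct overhang that cannot pair with any other junction or with itself. The sketch below assumes 4-nt overhangs, such as those left by BsaI (the enzyme named in Fig 4); the function names, junction names and sequences are hypothetical, and the check covers only exact matches and reverse complements, not mismatch-tolerant misligation.

```python
# Complement table for uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_overhang_conflicts(overhangs: dict[str, str]) -> list[str]:
    """Report overhangs that are palindromic or that could pair with another junction.

    overhangs maps a junction name to its single-stranded overhang (5'->3', top strand).
    """
    conflicts: list[str] = []
    seen: dict[str, str] = {}  # overhang on either strand -> first junction using it
    for name, oh in overhangs.items():
        oh = oh.upper()
        rc = reverse_complement(oh)
        if oh == rc:
            conflicts.append(f"{name}: {oh} is palindromic and can self-ligate")
        # A clash exists if this overhang or its complement was already used.
        for seq in (oh, rc):
            if seq in seen:
                conflicts.append(f"{name}: {oh} clashes with {seen[seq]}")
                break
        seen.setdefault(oh, name)
        seen.setdefault(rc, name)
    return conflicts


if __name__ == "__main__":
    design = {"J1": "AATG", "J2": "GCTT", "J3": "CATT"}  # hypothetical junctions
    for problem in find_overhang_conflicts(design):
        print(problem)
```

In this toy design, J3 is flagged because CATT is the reverse complement of J1's AATG, so those two ends could ligate to each other.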
Fig 4. Assembly of the gene library.
(A) Gel-eluted fragments from C-fragment libraries after restriction digestion (10% PAGE). The randomized fragments are marked in red; the brackets indicate the C-fragments participating in one ligation assembly. Marker: 50 bp ladder (Thermo Scientific). (B) Ligation assembly of the B1 fragment prior to gel elution. The full-length ligation product is indicated with an arrow. The assembled B1 fragment with flanking BsaI restriction sites is 208 bp long. Marker: 50 bp ladder (Thermo Scientific). (C) Gel-eluted fragments from B-fragment libraries. The B1 fragment liberated from the plasmid vector by restriction digestion is 182 bp long. (D) Gene library ligation assembly. The full-length ligation product is indicated with an arrow. Marker: 1 kb ladder (Thermo Scientific).
Fig 5. Overview of MUSI software.
(A) Screenshot of the MUSI software. (B) 10-step MUSI tutorial. For details, see Materials and methods and consult the manual (manual.txt) included in the download package.
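MUSI's internal algorithm is not described here, so the following is only an illustration of the kind of codon-level bookkeeping that library analysis requires (for example, the exchange counts summarized in Fig 6). It assumes in-frame, substitution-only variant sequences already aligned to the wild type, with randomized positions given as 1-based codon numbers; all function names are hypothetical and do not reflect MUSI's actual interface.

```python
def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_exchanges(wild_type: str, variant: str,
                    positions: list[int]) -> dict[int, tuple[str, str]]:
    """Map each exchanged randomized position (1-based codon) to its (wt, variant) codons."""
    wt, var = split_codons(wild_type), split_codons(variant)
    if len(wt) != len(var):
        raise ValueError("sequences differ in length; indels are not handled")
    return {p: (wt[p - 1], var[p - 1]) for p in positions if wt[p - 1] != var[p - 1]}


def exchange_summary(wild_type: str, variants: list[str],
                     positions: list[int]) -> tuple[list[int], float]:
    """Exchanges per variant and the overall exchange rate per randomized codon."""
    per_variant = [len(codon_exchanges(wild_type, v, positions)) for v in variants]
    rate = sum(per_variant) / (len(positions) * len(variants))
    return per_variant, rate


if __name__ == "__main__":
    wt = "ATGGCTGGTAAA"  # toy 4-codon sequence, randomized codons 2 and 4
    clones = ["ATGGATGGTAAA", "ATGGCTGGTAAA", "ATGTGGGGTCAT"]
    print(exchange_summary(wt, clones, [2, 4]))
```

The per-variant counts feed a mutant-class histogram, and the pooled rate corresponds to the kind of observed exchange rate compared with expectation in Fig 6A.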
Fig 6. Library analysis.
(A) Exchange rate per type of resident amino acid. The expected exchange probabilities (the fraction of foreign trimers in each trinucleotide mixture) are compared with the observed exchange rate per type of resident codon (cumulative; sample size: 2762 exchanges). The number of resident amino acids of each type among the 26 selected positions is given in parentheses, with the respective codon below. The differences in the expected exchange probabilities per type of residue are due to the biased composition of the trimer mix. The thick horizontal line indicates the average expected exchange probability (0.28); the dashed line indicates the average observed exchange rate (0.27). (B) Distribution of replacing residues. Comparison of the expected frequency distribution of the amino acid exchanges (the molar ratio of each trimer in the trinucleotide mixture) with the observed frequency distribution for a sample of 2762 trinucleotide exchanges. (C) Binomial distribution of the number of mutations in L24 and L26. The observed distribution of the number of mutations per molecule (mutant class) is compared with the expected one (the number of representatives in each mutant class for the given sample size, i.e., occurrences). The expected distribution of the mutant classes (L24: k = 1–24; L26: k = 1–26) was calculated from the binomial distribution for sample sizes of 239 sequences (L24) and 76 sequences (L26).
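The expected mutant-class distribution in panel C follows from the binomial formula E_k = N · C(n, k) · p^k · (1 − p)^(n − k), where n is the number of randomized positions, p the per-position exchange probability and N the number of sequenced clones. The sketch below assumes that p equals the average expected exchange probability of 0.28 from panel A; the caption does not state which p was used, so treat that value, and the helper names, as assumptions.

```python
from math import comb


def expected_mutant_classes(n_positions: int, p_exchange: float,
                            sample_size: int) -> list[float]:
    """Expected number of clones carrying k = 0..n exchanges under a binomial model."""
    return [
        sample_size * comb(n_positions, k)
        * p_exchange ** k * (1 - p_exchange) ** (n_positions - k)
        for k in range(n_positions + 1)
    ]


def observed_mutant_classes(mutation_counts: list[int], n_positions: int) -> list[int]:
    """Histogram of observed exchanges per clone, indexed by k = 0..n."""
    counts = [0] * (n_positions + 1)
    for m in mutation_counts:
        counts[m] += 1
    return counts


if __name__ == "__main__":
    # L24: 24 randomized positions, 239 sequenced clones, assumed p = 0.28.
    for k, e in enumerate(expected_mutant_classes(24, 0.28, 239)):
        print(f"k={k:2d}  expected={e:6.1f}")
```

The function also returns k = 0 (unmutated clones) so that the expected counts sum to N; the caption's plot range starts at k = 1.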
Fig 7. Proteins produced from gene variants chosen at random from library L24.
Soluble crude protein extracts of E. coli DH5α after 4 h of gene expression were analysed by Western blotting using an anti-His-tag antibody. C−: negative control (cells transformed with empty vector); WT: wild-type thisF (MW of tHisF-His6 = 29 kDa); M: marker. Mutations in each clone are listed in S1 Table.
Fig 8. Summary workflow diagram of the method for combinatorial gene synthesis.
Library design: the method imposes no restrictions on the number of residues selected for mutagenesis or on their location. The composition of the trinucleotide mixture and the exchange rate can be chosen freely. Contact the oligonucleotide synthesis company of your choice well in advance and coordinate the purchase of trimer phosphoramidites. Assembly design: the division of the gene sequence into modules depends on the locations of the residues to be randomized. Divide the gene sequence into modules 40–90 bp long that contain the mutagenized codons. The module borders should create unique overhangs for ligation assembly after restriction digestion (see Table 1 and main text for details). Use PCR for longer stretches of wild-type gene sequence. Chemical synthesis: invest effort and resources in high-quality oligonucleotide synthesis and in the cloning and analysis of the C-libraries. They store the chemically synthesized diversity and are the starting point for all future gene libraries. Clone a wild-type sequence corresponding to each mutagenized module. Library synthesis: generate B-fragment libraries by ligation from C-fragment libraries. Different B-fragment libraries (1, …, n) can be obtained in parallel by exchanging or mixing mutagenized modules with the corresponding wild-type modules. Archive the B-libraries. Generate gene libraries by ligation assembly of B-libraries. Use the combinatorial approach to generate multiple unique gene libraries.
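The assembly-design rules in this workflow (mutagenized modules of 40–90 bp; cleavage points, and hence overhangs, outside randomized codons, as stated for Fig 3) can be checked programmatically before oligonucleotides are ordered. This is a hypothetical helper, not part of the published method: it assumes module borders are given as 0-based nucleotide positions and that each 4-nt overhang starts at the border, which in practice depends on the enzyme and the design.

```python
def validate_modules(boundaries: list[int], gene_length: int,
                     randomized_codons: list[int], min_len: int = 40,
                     max_len: int = 90, overhang_len: int = 4) -> list[str]:
    """Return design problems for a proposed module division (empty list if none).

    boundaries: 0-based nucleotide positions where one module ends and the next starts.
    randomized_codons: 1-based codon numbers that carry trinucleotide mixtures.
    """
    problems: list[str] = []
    edges = [0, *sorted(boundaries), gene_length]
    # 0-based, half-open nucleotide spans of the randomized codons.
    rand_spans = {c: ((c - 1) * 3, c * 3) for c in randomized_codons}

    for i, (start, end) in enumerate(zip(edges, edges[1:]), start=1):
        has_random = any(start <= s and e <= end for s, e in rand_spans.values())
        length = end - start
        # The 40-90 bp rule applies to synthesized (mutagenized) modules;
        # long wild-type stretches can be produced by PCR instead.
        if has_random and not (min_len <= length <= max_len):
            problems.append(f"module {i}: {length} bp outside {min_len}-{max_len} bp")

    # Assumed: the overhang occupies [border, border + overhang_len).
    for b in sorted(boundaries):
        for c, (s, e) in rand_spans.items():
            if s < b + overhang_len and b < e:
                problems.append(f"junction at nt {b}: overhang overlaps randomized codon {c}")
    return problems


if __name__ == "__main__":
    # Toy 200 bp gene with randomized codons 5 and 30 (hypothetical design).
    print(validate_modules([60, 130], 200, [5, 30]))
    print(validate_modules([30, 88], 200, [5, 30]))
```

A codon that straddles a border is caught by the overlap test rather than the length test, because it is not wholly contained in either neighboring module.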


References

    1. Wong TS, Roccatano D, Schwaneberg U. Steering directed protein evolution: strategies to manage combinatorial complexity of mutant libraries. Environ Microbiol. 2007; 9: 2645–2659.
    2. Tee KL, Wong TS. Polishing the craft of genetic diversity creation in directed evolution. Biotechnol Adv. 2013; 31: 1707–1721. doi: 10.1016/j.biotechadv.2013.08.021
    3. He M, Taussig MJ. Rapid discovery of protein interactions by cell-free protein technologies. Biochem Soc Trans. 2007; 35: 962–965.
    4. Knappik A, Ge L, Honegger A, Pack P, Fischer M, Wellnhofer G, et al. Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J Mol Biol. 2000; 296: 57–86.
    5. Rauchenberger R, Borges E, Thomassen-Wolf E, Rom E, Adar R, Yaniv Y, et al. Human combinatorial Fab library yielding specific and functional antibodies against the human fibroblast growth factor receptor 3. J Biol Chem. 2003; 278: 38194–38205.
