Review
J R Soc Interface. 2009 Aug 6;6 Suppl 4(Suppl 4):S467-76.
doi: 10.1098/rsif.2008.0520.focus. Epub 2009 Mar 11.

You're one in a googol: optimizing genes for protein expression

Mark Welch et al. J R Soc Interface. 2009.

Abstract

A vast number of different nucleic acid sequences can all be translated by the genetic code into the same amino acid sequence. These sequences are not all equally useful, however; the exact sequence chosen can have profound effects on the expression of the encoded protein. Despite the importance of protein-coding sequences, there has been little systematic study to identify parameters that affect expression. This is probably because protein expression has largely been tackled on an ad hoc basis in many independent projects: once a sequence has been obtained that yields adequate expression for that project, there is little incentive to continue work on the problem. Synthetic biology may now provide the impetus to transform protein expression folklore into design principles, so that DNA sequences may easily be designed to express any protein in any system. In this review, we offer a brief survey of the literature, outline the major challenges in interpreting existing data and constructing robust design algorithms, and propose a way to proceed towards the goal of rational sequence engineering.
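To give a feel for the "vast number" in the first sentence, the following minimal sketch counts how many DNA coding sequences translate to a given amino acid sequence under the standard genetic code. The function name `synonymous_count` and the example peptides are illustrative assumptions and do not come from the review.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons that encode each amino acid ("*" marks a stop codon).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_count(protein: str) -> int:
    """Count the DNA sequences that encode `protein` (hypothetical helper).

    Each residue can be written with any of its synonymous codons, so the
    total is the product of the per-residue degeneracies.
    """
    return prod(DEGENERACY[aa] for aa in protein)


# Toy peptides, chosen only for illustration.
print(synonymous_count("MW"))   # Met and Trp each have a single codon
print(synonymous_count("LSR"))  # Leu, Ser and Arg each have six codons
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive testing of all synonymous sequences is not feasible for real proteins.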

Figures

Figure 1
Factors influencing protein expression. Several factors that act along the path of expression from DNA to mRNA to protein are shown, any of which could be altered by or could affect the impact of gene design. RBS, ribosome-binding site.
Figure 2
Choosing an appropriate design algorithm. A simple example shows how two different algorithms for the same optimization problem are affected by sequence constraints. The coding sequence encodes five peptide segments of a protein, which may or may not be contiguous. The initial starting sequence is one possibility, chosen to match the target codon bias of the gene. The optimization constraints for both algorithms are that (i) no EcoRI site is allowed, (ii) the codon usage ratios for E (GAG/GAA) and F (TTC/TTT) must be equal to 1, and (iii) direct sequence repeats greater than seven nucleotides should be minimized. Each iteration is a single codon replacement and the search is greedy: a replacement is allowed only if it achieves an improvement. At each step, no worsening of previously applied constraints is allowed. The algorithm in (a) begins by minimizing repeat elements and then tries to remove EcoRI sites without increasing the number of repeats. Since either possible substitution to remove the EcoRI site will add new repeats, no change is allowed and the algorithm fails to reach its goals. In (b), because the hard constraint of restriction site removal is applied first, the algorithm has two routes (red versus blue arrows) to successfully reach the goals.
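The staged greedy search described in the Figure 2 caption can be sketched as follows. This is an assumption-laden illustration and not the authors' implementation: the scoring functions (`ecori_sites`, `usage_imbalance`, `repeat_score`), the driver `greedy_optimize`, and the toy starting sequence are all hypothetical, and the sequences from the figure itself are not reproduced. Constraints are passed in the order they are applied, so listing the EcoRI constraint first corresponds to the ordering in panel (b).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def ecori_sites(seq):
    """Number of EcoRI recognition sites (GAATTC); the site cannot overlap itself."""
    return seq.count("GAATTC")


def usage_imbalance(seq):
    """Deviation of GAG/GAA and TTC/TTT usage from a 1:1 ratio (assumed metric)."""
    c = codons(seq)
    return abs(c.count("GAG") - c.count("GAA")) + abs(c.count("TTC") - c.count("TTT"))


def repeat_score(seq, k=8):
    """Extra occurrences of any k-mer, k=8 standing in for 'greater than seven'."""
    seen = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        seen[kmer] = seen.get(kmer, 0) + 1
    return sum(n - 1 for n in seen.values())


def greedy_optimize(seq, constraints):
    """Apply constraints one stage at a time using single synonymous codon swaps.

    A swap is accepted only if it strictly improves the current stage's score
    and does not worsen the score of any previously applied constraint.
    """
    for stage in range(len(constraints)):
        active = constraints[:stage + 1]
        improved = True
        while improved:
            improved = False
            current = [f(seq) for f in active]
            cs = codons(seq)
            for i, codon in enumerate(cs):
                for alt, aa in CODON_TABLE.items():
                    if alt == codon or aa != CODON_TABLE[codon]:
                        continue  # only synonymous replacements are tried
                    trial = "".join(cs[:i] + [alt] + cs[i + 1:])
                    scores = [f(trial) for f in active]
                    earlier_ok = all(s <= c for s, c in zip(scores[:-1], current[:-1]))
                    if scores[-1] < current[-1] and earlier_ok:
                        seq, improved = trial, True
                        break
                if improved:
                    break
    return seq


# Hypothetical toy sequence (M E F E F) with one in-frame EcoRI site.
start = "ATGGAATTCGAGTTT"
result = greedy_optimize(start, [ecori_sites, usage_imbalance, repeat_score])
print(result, ecori_sites(result), usage_imbalance(result), repeat_score(result))
```

Reordering the list passed to `greedy_optimize` changes which constraint is protected from worsening at later stages, which is the design choice the caption contrasts between panels (a) and (b).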
