Nucleic Acids Res. 2008 Jun;36(11):3819-27.

doi: 10.1093/nar/gkn288. Epub 2008 May 21.

SCUMBLE: a method for systematic and accurate detection of codon usage bias by maximum likelihood estimation

Morten Kloster¹, Chao Tang

PMID: 18495752
PMCID: PMC2441815
DOI: 10.1093/nar/gkn288

Morten Kloster et al. Nucleic Acids Res. 2008 Jun.

Affiliation

¹ Department of Bioengineering and Therapeutic Sciences, UCSF, San Francisco, California 94158, USA.

Abstract

The genetic code is degenerate: most amino acids can be encoded by anywhere from two to six different codons. The synonymous codons are not used with equal frequency: not only are some codons favored over others, but their usage can also vary significantly from species to species and between different genes in the same organism. Known causes of codon bias include differences in mutation rates as well as selection pressure related to the expression level of a gene, but the standard analysis methods can account for only a fraction of the observed codon usage variation. Here we introduce an explicit model of codon usage bias, inspired by statistical physics. Combining this model with a maximum likelihood approach, we are able to clearly identify different sources of bias in various genomes. We have applied the algorithm to *Saccharomyces cerevisiae* as well as 325 prokaryote genomes, and in most cases our model explains essentially all observed variance.
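
As a rough illustration of what a statistical-physics-inspired codon model can look like, the Python sketch below assumes a Boltzmann (softmax) form with a single trend: within each amino acid, codon c is used in gene g with probability proportional to exp(β_g·ε_c), where ε_c is a codon preference and β_g a per-gene offset. The offset is then estimated by maximizing the multinomial log-likelihood of the gene's codon counts. The names `PREFS`, `codon_probs`, `log_likelihood` and `fit_offset`, the preference values, and the single-trend simplification are all hypothetical; the paper's actual parameterization and fitting procedure may differ.

```python
import math

# Hypothetical single-trend codon preferences eps_c, grouped by amino acid.
# Values are illustrative only (larger = more favored when beta > 0).
PREFS = {
    "Lys": {"AAA": 0.0, "AAG": 1.0},
    "Phe": {"TTT": 0.0, "TTC": 1.0},
    "Gly": {"GGT": 0.5, "GGC": 1.0, "GGA": -0.5, "GGG": -1.0},
}


def codon_probs(beta, prefs=PREFS):
    """Softmax over synonymous codons: p(c) is proportional to exp(beta * eps_c)."""
    probs = {}
    for eps in prefs.values():
        weights = {c: math.exp(beta * e) for c, e in eps.items()}
        z = sum(weights.values())  # per-amino-acid partition function
        for c, w in weights.items():
            probs[c] = w / z
    return probs


def log_likelihood(beta, counts, prefs=PREFS):
    """Multinomial log-likelihood of a gene's codon counts (constant terms dropped)."""
    p = codon_probs(beta, prefs)
    return sum(n * math.log(p[c]) for c, n in counts.items())


def fit_offset(counts, prefs=PREFS, lo=-10.0, hi=10.0, iters=100):
    """Maximum likelihood estimate of the per-gene offset beta.

    The log-likelihood of this softmax model is concave in beta, so a
    golden-section search on [lo, hi] converges to the maximum.
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if log_likelihood(c, counts, prefs) > log_likelihood(d, counts, prefs):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2
```

Because the log-likelihood of this softmax form is concave in β, a bracketed one-dimensional search is enough here; a model with several trends per gene would instead need a multivariate optimizer.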

Figures

**Figure 1.**
(a) Cumulative histogram of the normalized variance for named genes in *S. cerevisiae* for models with various numbers of trends; actual genome (solid lines) compared to randomized genome (dotted lines). Models with 0 or 1 trend explain the data poorly, as the curve for the real genome is very different from that of a randomized genome, and there are many genes with very high normalized variance. (b) Average (black) and median (red) normalized variance for models with up to 10 trends.
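
The caption compares the real genome with a randomized one. This caption does not describe the randomization; Figure 3 mentions randomized genomes generated from the models, so the sketch below assumes that construction: keep each gene's protein sequence and redraw every codon from the model's probabilities for that amino acid. `MODEL`, its probabilities, `CODON_TO_AA` and `randomize_gene` are hypothetical.

```python
import random

# Hypothetical model probabilities for synonymous codons, grouped by
# amino acid (for example, the output of a fitted codon-usage model).
MODEL = {
    "Lys": {"AAA": 0.3, "AAG": 0.7},
    "Phe": {"TTT": 0.4, "TTC": 0.6},
}

# Reverse lookup: codon -> amino acid (only for the families above).
CODON_TO_AA = {c: aa for aa, fam in MODEL.items() for c in fam}


def randomize_gene(codons, model=MODEL, rng=random):
    """Return a synthetic gene encoding the same protein sequence.

    Each codon is replaced by a codon for the same amino acid, drawn
    according to the model's synonymous-codon probabilities.
    """
    out = []
    for codon in codons:
        fam = model[CODON_TO_AA[codon]]
        choices = list(fam)
        weights = [fam[c] for c in choices]
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the randomized genome reproducible.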

**Figure 2.**
Experimental values for cellular mRNA/protein levels plotted against the first offset/CAI value of each gene for *S. cerevisiae*. Several groups of highly expressed genes are plotted in different colors.
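
For readers less familiar with the CAI used for comparison here, below is a minimal sketch of the standard codon adaptation index of Sharp and Li: the relative adaptiveness w_c is a codon's count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The two-family codon table, the function names and the reference counts in the example are illustrative assumptions; the reference set used in the paper is not specified here.

```python
import math

# Synonymous families used here (a partial table, for illustration only).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def relative_adaptiveness(ref_counts, families=FAMILIES):
    """w_c = count(c) / max count among codons for the same amino acid.

    ref_counts holds codon counts from a reference set of highly expressed
    genes. Assumes every codon was observed at least once (a zero count
    would give w = 0 and break the logarithm in cai()).
    """
    w = {}
    for codons in families.values():
        best = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            w[c] = ref_counts.get(c, 0) / best
    return w


def cai(gene_codons, w):
    """Geometric mean of w over the gene's codons that have a w value.

    Codons outside the table (e.g. ATG for Met) are skipped.
    """
    logs = [math.log(w[c]) for c in gene_codons if c in w]
    return math.exp(sum(logs) / len(logs))
```

For example, with reference counts AAA 20, AAG 80, TTT 50 and TTC 50, the gene `["AAA", "AAG"]` gets CAI = sqrt(0.25 × 1) = 0.5.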

**Figure 3.**
Median normalized variance for 325 prokaryote genomes, using models with 0–10 trends. The different genomes are slightly offset along the abscissa, in alphabetical order. The dotted brown line shows approximate median normalized variance for randomized genomes generated from the models (Supplementary Figure S7). Results for the average normalized variance are very similar, except that in rare but not exceptional cases, individual genes dominate the average due to extremely low estimated probabilities of using a specific codon which is, in fact, used.
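
The remark that single genes can dominate the average is easy to reproduce with any score built from -log p of observed codons: one codon with a near-zero model probability contributes an enormous term. The numbers below are invented purely to show how the mean and the median react; they are not data from the paper, and these toy scores are not the paper's normalized variance.

```python
import math
import statistics

# Hypothetical per-gene scores built from -log p of an observed codon.
# Most genes use codons the model considers reasonably likely...
typical = [-math.log(p) for p in (0.40, 0.50, 0.45, 0.55, 0.35)]
# ...but one gene uses a codon the model deems almost impossible.
outlier = -math.log(1e-12)

scores = typical + [outlier]
print(f"mean   = {statistics.mean(scores):.2f}")
print(f"median = {statistics.median(scores):.2f}")
```

The single outlier pulls the mean several-fold above the median, which is why a median-based summary is the more robust choice in this setting.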

**Figure 4.**
A four-trend model of *Helicobacter pylori*. (a)–(c) GC3 or GT3 plotted against the first three offsets. Genes for ribosomal proteins are circled in red. The cumulative distributions of the offsets are shown above each graph, for all genes (black) and for ribosomal genes (red). (d) β₂ plotted against the number of the gene along the genome, with genes on different strands in different colors. The green and blue lines are 50-point running averages for strands 1 and 2, respectively.
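
GC3 is the fraction of codons with G or C at the third position; the caption does not define GT3, so the sketch below assumes it is the analogous G-or-T fraction. It also shows one simple way to compute a 50-point running average like the curves in panel (d), here taken as an unpadded sliding mean. The function names are hypothetical.

```python
def third_position_fraction(seq, bases):
    """Fraction of complete codons whose third nucleotide is in `bases`.

    GC3 uses bases="GC"; GT3 is assumed here to use bases="GT".
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    thirds = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    return sum(b in bases for b in thirds) / len(thirds)


def running_average(values, window=50):
    """Mean of each run of `window` consecutive points (no padding).

    The output has len(values) - window + 1 entries, or none if the
    input is shorter than the window.
    """
    if window > len(values):
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        out.append(total / window)
    return out
```

To obtain per-strand curves, one would apply `running_average` separately to the values for genes on each strand, ordered by position along the genome.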

**Figure 5.**
Scatter plot of the first two axes from the four-trend model found by SCUMBLE (a), WCA (b) and CA/RSCU (c) for the genes of *Anaeromyxobacter dehalogenans*. Genes for ribosomal proteins are circled in red. In (b) and (c), most genes are clustered near the origin; only a small fraction of the genes have significantly negative abscissae.
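
CA/RSCU refers to correspondence analysis on relative synonymous codon usage. The RSCU input follows a standard definition: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark over-used codons. Below is a minimal sketch, assuming a partial two-family codon table and a hypothetical function name; the correspondence analysis step itself is not shown.

```python
# Synonymous families used here (a partial table, for illustration only).
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(counts, families=FAMILIES):
    """RSCU_c = observed count of c / count expected under equal synonymous usage."""
    out = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            out[c] = counts.get(c, 0) / expected
    return out
```

Within each amino acid the RSCU values sum to the number of synonymous codons, which makes genes of different lengths and compositions directly comparable before the correspondence analysis.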

**Figure 6.**
Solid lines: number of prokaryote genomes (out of 325) for which the total fraction of the GC (a), GT (b), CT (c) or random (d) preference signal captured by the first n trends exceeds the abscissa, where n is given by the color. Total shaded area of each color is proportional to the average fraction of signal captured by the corresponding trend.

**Figure 7.**
Scatter plot of the first two offsets for the four-trend model of *B. subtilis*, with each gene colored by its cluster identity from ref. (17).

Cited by

- Kober KM, Pogson GH. Genome-wide patterns of codon bias are shaped by natural selection in the purple sea urchin, Strongylocentrotus purpuratus. G3 (Bethesda). 2013 Jul 8;3(7):1069-83. doi: 10.1534/g3.113.005769. PMID: 23637123.
- Rizzato C, Torres J, Plummer M, Muñoz N, Franceschi S, Camorlinga-Ponce M, Fuentes-Pananá EM, Canzian F, Kato I. Variations in Helicobacter pylori cytotoxin-associated genes and their influence in progression to gastric cancer: implications for prevention. PLoS One. 2012;7(1):e29605. doi: 10.1371/journal.pone.0029605. Epub 2012 Jan 3. PMID: 22235308.
- von Mandach C, Merkl R. Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions. BMC Genomics. 2010 Nov 4;11:617. doi: 10.1186/1471-2164-11-617. PMID: 21050470.
- Davis JJ, Olsen GJ. Characterizing the native codon usages of a genome: an axis projection approach. Mol Biol Evol. 2011 Jan;28(1):211-21. doi: 10.1093/molbev/msq185. Epub 2010 Aug 2. PMID: 20679093.
- Sharp PM, Emery LR, Zeng K. Forces that influence the evolution of codon bias. Philos Trans R Soc Lond B Biol Sci. 2010 Apr 27;365(1544):1203-12. doi: 10.1098/rstb.2009.0305. PMID: 20308095. Review.

References

1. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 1981;146:1–21.
1. Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 1981;151:389–409.
1. Bennetzen JL, Hall BD. Codon selection in yeast. J. Biol. Chem. 1982;257:3026–3031.
1. Ikemura T. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in its protein genes: differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 1982;158:573–597.
1. Bibb MJ, Findlay PR, Johnson MW. The relationship between base composition and codon usage in bacterial genes and its use for simple and reliable identification of protein-coding sequences. Gene. 1984;30:157–166.
