Mol Biol Evol. 2011 Jan;28(1):211-21.
doi: 10.1093/molbev/msq185. Epub 2010 Aug 2.

Characterizing the native codon usages of a genome: an axis projection approach

James J Davis et al. Mol Biol Evol. 2011 Jan.

Abstract

Codon usage can provide insights into the nature of the genes in a genome. Genes that are "native" to a genome (have not been recently acquired by horizontal transfer) range in codon usage from a low-bias "typical" usage to a more biased "high-expression" usage characteristic of genes encoding abundant proteins. Genes that differ from these native codon usages are candidates for foreign genes that have been recently acquired by horizontal gene transfer. In this study, we present a method for characterizing the codon usages of native genes, both typical and highly expressed, within a genome. Each gene is evaluated relative to a half line (or axis) in a 59D space of codon usage. The axis begins at the modal codon usage, the usage that matches the largest number of genes in the genome, and it passes through a point representing the codon usage of a set of genes with expression-related bias. A gene whose codon usage matches (does not significantly differ from) a point on this axis is a candidate native gene, and the location of its projection onto the axis provides a general estimate of its expression level. A gene that differs significantly from all points on the axis is a candidate foreign gene. This automated approach offers significant improvements over existing methods. We illustrate this by analyzing the genomes of Pseudomonas aeruginosa PAO1 and Bacillus anthracis A0248, which can be difficult to analyze with commonly used methods due to their biased base compositions. Finally, we use this approach to measure the proportion of candidate foreign genes in 923 bacterial and archaeal genomes. The organisms with the most homogeneous genomes (containing the fewest candidate foreign genes) are mostly endosymbionts and parasites, though with exceptions that include Pelagibacter ubique and Beutenbergia cavernae. The organisms with the most heterogeneous genomes (containing the most candidate foreign genes) include members of the genera Bacteroides, Corynebacterium, Desulfotalea, Neisseria, Xylella, and Thermobaculum.
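
The geometric part of the axis idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes testing whether a gene differs significantly from points on the axis, whereas this sketch substitutes plain Euclidean distance as a stand-in, and it uses toy 3D vectors rather than the 59D codon usage space. The function name `project_onto_axis` and its parameters `gene`, `modal`, and `high_expr` are hypothetical names chosen for illustration.

```python
from math import sqrt


def project_onto_axis(gene, modal, high_expr):
    """Project a gene's usage vector onto a half line.

    The half line starts at `modal` and passes through `high_expr`.
    Returns (t, distance), where t is the position along the axis
    (0 at `modal`, 1 at `high_expr`) and distance is the Euclidean
    distance from the gene to its closest point on the half line.
    Euclidean distance is an assumption of this sketch, not the
    statistical comparison described in the abstract.
    """
    direction = [h - m for h, m in zip(high_expr, modal)]
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("modal and high-expression usages must differ")

    offset = [g - m for g, m in zip(gene, modal)]
    t = sum(o * d for o, d in zip(offset, direction)) / norm_sq

    # A half line has no points behind its origin, so clamp at zero.
    t = max(t, 0.0)

    closest = [m + t * d for m, d in zip(modal, direction)]
    distance = sqrt(sum((g - c) ** 2 for g, c in zip(gene, closest)))
    return t, distance


# Toy 3D vectors (the paper's space has 59 dimensions).
modal = [0.0, 0.0, 0.0]
high_expr = [2.0, 0.0, 0.0]

t_mid, d_mid = project_onto_axis([1.0, 1.0, 0.0], modal, high_expr)
t_behind, d_behind = project_onto_axis([-1.0, 1.0, 0.0], modal, high_expr)
```

In this sketch a larger `t` loosely corresponds to stronger expression-related bias, and a large `distance` would mark a gene as a candidate foreign gene once some threshold is chosen; the threshold itself is left out because the abstract does not specify one.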

Figures

FIG. 1.
FCA plot of E. coli K-12. Each plot point shows the location of a gene in the first two axes of the analysis. Genes are colored according to their axis position (x value) based upon the colors of the visible spectrum, with red genes indicating the highest expression-related codon usage bias and violet genes indicating the least. Genes that differ significantly from all points on the native codon usage axis (likely to be foreign) are colored gray and are drawn behind the colored genes. Each gene’s position along the first axis of the plot also corresponds with its G + C content (from left to right: high G + C to low G + C) (see also Médigue et al. 1991).
FIG. 2.
FCA plot of P. aeruginosa PAO1. (A) Genes that are orthologous to those in E. coli K-12 are colored based upon E. coli axis position (x value) from figure 1. The nonorthologous genes are colored gray. (B) Genes are colored according to P. aeruginosa axis position (x value) based on the colors of the visible spectrum, with red genes having the highest expression-related codon usage bias and violet genes having the least. Genes that differ significantly from all points on the native codon usage axis (likely to be foreign) are colored gray. In both panels, gray genes are drawn behind the colored genes. Genes in the right portion of the first axis have low G + C contents (see also Grocock and Sharp 2002).
FIG. 3.
FCA plot of B. anthracis A0248. (A) Genes that are orthologous to those in E. coli K-12 are colored based upon E. coli axis position (x value) from figure 1. (B) Genes are colored according to B. anthracis axis position (x value) based on the colors of the visible spectrum, with red genes having the highest expression-related codon usage bias and violet genes having the least. Genes that differ significantly from all points on the native codon usage axis (likely to be foreign) are colored gray. In both panels, gray genes are drawn behind the colored genes. Each gene’s position along the first axis of the plot also roughly corresponds with its G + C content (from left to right: low G + C to high G + C).
