PLoS Genet. 2010 Jun 24;6(6):e1001004.

doi: 10.1371/journal.pgen.1001004.

Translational selection is ubiquitous in prokaryotes

Fran Supek¹, Nives Skunca, Jelena Repar, Kristian Vlahovicek, Tomislav Smuc

PMID: 20585573
PMCID: PMC2891978
DOI: 10.1371/journal.pgen.1001004

Fran Supek et al. PLoS Genet. 2010.

Affiliation

¹ Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia.

Abstract

Codon usage bias in prokaryotic genomes is largely a consequence of background substitution patterns in DNA, but highly expressed genes may show a preference towards codons that enable more efficient and/or accurate translation. We introduce a novel approach based on supervised machine learning that detects effects of translational selection on genes, while controlling for local variation in nucleotide substitution patterns represented as sequence composition of intergenic DNA. A cornerstone of our method is a Random Forest classifier that outperformed previous distance measure-based approaches, such as the codon adaptation index, in the task of discerning the (highly expressed) ribosomal protein genes by their codon frequencies. Unlike previous reports, we show evidence that translational selection in prokaryotes is practically universal: in 460 of 461 examined microbial genomes, we find that a subset of genes shows a higher codon usage similarity to the ribosomal proteins than would be expected from the local sequence composition. These genes constitute a substantial part of the genome (between 5% and 33%, depending on genome size), while also exhibiting higher experimentally measured mRNA abundances and tending toward codons that match tRNA anticodons by canonical base pairing. Certain gene functional categories are generally enriched with, or depleted of, codon-optimized genes, with the trends of enrichment/depletion conserved between Archaea and Bacteria. Prominent exceptions to these trends might indicate genes with alternative physiological roles; we speculate on specific examples related to detoxication of oxygen radicals and ammonia and to possible misannotations of asparaginyl-tRNA synthetases. Since the presence of codon optimizations on genes is a valid proxy for expression levels in fully sequenced genomes, we provide an example of an "adaptome" by highlighting gene functions with expression levels elevated specifically in thermophilic Bacteria and Archaea.
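The abstract describes classifying genes by their codon frequencies. As a rough illustration only, the sketch below computes relative codon frequencies for a single coding sequence. The function name `codon_frequencies`, the handling of ambiguous bases, and the toy sequence are assumptions made for this example; the abstract does not specify how the authors encoded codon frequencies for the classifier.

```python
from collections import Counter


def codon_frequencies(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in a coding sequence.

    Hypothetical helper: the exact feature encoding used in the paper
    is not given in the abstract.
    """
    seq = cds.upper().replace("U", "T")
    # Ignore a trailing partial codon, if any.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Keep only codons made of unambiguous bases.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Toy sequence with five codons: ATG GCT GCT AAA TAA.
freqs = codon_frequencies("ATGGCTGCTAAATAA")
for codon, freq in sorted(freqs.items()):
    print(codon, round(freq, 2))
```

A vector of such frequencies per gene (one entry per codon) is one plausible input for a classifier that separates ribosomal protein genes from other genes.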

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Comparison of methods for codon usage analysis.**
*Top left and right.* Performance of different classifiers that use codon frequencies to discriminate ribosomal protein genes from the rest of a representative organism's protein genes. The receiver operating characteristic (ROC) curves show the performance of the Random Forest (RF) classifier and of the nearest centroid classifiers built around three distance measures of codon usage: CB (codon bias), CAI (codon adaptation index), and MILC (measure independent of length and composition). *Bottom left.* Number of genomes (out of 461) where the column method outperforms the row method based on the area-under-ROC (AUC) statistic, and the rank correlation of the classifiers' per-gene class probabilities with experimental measurements of *E. coli* cytoplasmic protein abundances. All results were obtained in 4-fold crossvalidation. *Bottom right.* Dependence of AUC_CAI and AUC_RF on genomic G+C content; AUC_CAI is decreased in genomes with imbalanced G+C.
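The AUC statistic used to compare the methods can be read as the probability that a randomly chosen positive (here, a ribosomal protein gene) receives a higher score than a randomly chosen negative. The sketch below computes it directly from that definition. The function name `auc` and the scores are invented for illustration and are not data from the paper.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from raw classifier scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive scores higher, with ties counted as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical per-gene class probabilities, for illustration only.
ribosomal = [0.9, 0.8, 0.4]
other = [0.7, 0.3, 0.2, 0.1]
print(auc(ribosomal, other))
```

The pairwise loop is quadratic in the number of genes; it is chosen here for clarity, and a rank-based formulation would give the same value more efficiently.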

**Figure 2. Predictive performance of the Random Forest classifier between datasets with and without codon frequencies.**
Performance is measured for the task of discriminating ribosomal protein genes from the rest of the protein coding genes, where each point represents a single run of four-fold crossvalidation. Points above the diagonal line signify improvement in AUC score with addition of codon frequencies, indicating that ribosomal protein genes have a characteristic pattern of codon usage which cannot be derived from the composition of intergenic DNA, a representation of the local nucleotide substitution patterns. The eleven genomes shown were cited as exhibiting no translational selection by each of the three previous multi-genome studies; see Text S1, Appendix B. Figure S2 shows the same experiments, but with codon frequencies shuffled between genes.

**Figure 3. Extent of translational selection within genomes.**
(A) shows the correlation of the extent of translational selection in a genome (% OCU) with genome size; the regression curve, a fitted power-law relationship, is shown for illustrative purposes only. Genome size is expressed as the number of protein coding genes at least 80 codons long. (B) shows the relationship between genome size and the “protein metabolism” and “regulation of biological process” functional categories, which follows a predictable pattern; the curves represent moving averages of the real data. (C) depicts the correlation of % OCU with the proportion of genes within a genome that belong to one of the two selected Gene Ontology categories from (B). The “r_SVM” referred to in (C) is the Pearson's correlation coefficient of a non-linear Support Vector Machine (SVM) regression fit (in crossvalidation) of % OCU for different combinations of variables. Values of “r_SVM” obtained using a single variable are given alongside the corresponding axis; the values at the top right inside the plot were obtained using both variables together, and using both in combination with genome size.

**Figure 4. Expression levels of OCU versus non-OCU genes.**
Histograms comparing microarray signal intensities between genes with optimized codon usage (OCU) and the non-OCU genes. The *P. aeruginosa* and *S. coelicolor* genomes were previously considered to lack translational selection (Text S1, Appendix B). The p-values are from the Baumgartner-Weiss-Schindler permutation test. Block arrows show the mean microarray signal intensity of OCU or non-OCU genes. Numbers above the curly braces are ratios of the mean signal intensity of OCU genes to the mean signal intensity of non-OCU genes. Diamonds show the mean signal intensity for aminoacyl-tRNA synthetases (“t”) or the ribosomal protein genes (“R”). Full data for 19 organisms are given in Table S5; the average ratio of OCU expression to non-OCU expression in the 19 organisms is 2.4x. See Figure S4 for similar histograms, but with the ribosomal protein genes excluded.

**Figure 5. Preferred codons in OCU genes.**
Height of bar segments indicates the number of genomes in which the putatively translationally optimal or suboptimal codon is more frequent in the OCU genes vs. the non-OCU genes, broken down by amino acid. An optimal codon may be determined for a two-fold degenerate amino acid in cases when a genome codes only for tRNAs with one specific anticodon. The codon that directly matches this anticodon is then declared to be putatively optimal and is almost always C- or A-ending; the other codon is putatively suboptimal. Preference for a codon is determined by a Mann-Whitney U test on OCU vs. non-OCU codon frequencies at p<10⁻³. Shown p values are by sign test under the null hypothesis that OCU genes are equally likely to prefer optimal or suboptimal codons.
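The sign test mentioned in this caption asks whether genomes prefer the putatively optimal codon more often than chance would predict. A minimal sketch of an exact sign test follows, assuming a two-sided alternative (the caption does not say whether the test was one- or two-sided). The function name `sign_test_p` and the counts are hypothetical.

```python
from math import comb


def sign_test_p(n_optimal: int, n_suboptimal: int) -> float:
    """Exact two-sided sign test p-value.

    Null hypothesis: each genome is equally likely to prefer the
    optimal or the suboptimal codon (binomial with p = 0.5).
    """
    n = n_optimal + n_suboptimal
    if n == 0:
        return 1.0
    k = min(n_optimal, n_suboptimal)
    # Probability of an outcome at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical counts: 9 genomes prefer the optimal codon, 1 the suboptimal.
print(sign_test_p(9, 1))
```

Genomes with no significant preference either way are left out of the counts, which is the usual convention for a sign test.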

**Figure 6. Gene ontology categories enriched with, or depleted of, OCU genes in Bacteria.**
Disc color indicates depletion (red) or enrichment (green), while size is proportional to the log number of genes in the category. Enrichment or depletion is significant at p<10⁻¹⁵ (Fisher's exact test) in all displayed categories. Thickness of grey lines represents semantic similarity between categories; also, the spatial arrangement of discs approximately reflects a grouping of categories by semantic similarity. Displayed categories have been selected from a broader set to eliminate redundancy and prepared for visualization using the REViGO tool available at http://revigo.irb.hr/; see Dataset S2 for an exhaustive listing. The callout shows enrichment of selected orthologous groups within the “nucleosome assembly” category. A summary of results from Archaea is shown in the embedded frame.
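Enrichment of a functional category with OCU genes can be assessed with Fisher's exact test on a 2x2 table (in category or not, OCU or not). The sketch below computes the one-sided enrichment p-value from the hypergeometric distribution. The one-sided form, the function name `fisher_enrichment_p`, and the counts are assumptions for illustration; the caption does not state which variant of the test was used.

```python
from math import comb


def fisher_enrichment_p(k: int, n_category: int, n_ocu: int, n_total: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment.

    k          -- OCU genes observed in the category
    n_category -- genes in the category
    n_ocu      -- OCU genes in the genome set
    n_total    -- all genes considered
    Returns P(X >= k) for X drawn from the hypergeometric distribution.
    """
    upper = min(n_category, n_ocu)
    denom = comb(n_total, n_ocu)
    numer = sum(
        comb(n_category, i) * comb(n_total - n_category, n_ocu - i)
        for i in range(k, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 3 of the 5 genes in a category are OCU,
# with 5 OCU genes among 10 genes overall.
print(fisher_enrichment_p(3, 5, 5, 10))
```

Depletion can be tested the same way by summing the lower tail, P(X <= k), instead of the upper tail.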

Cited by

Metabolic Specialization and Codon Preference of Lignocellulolytic Genes in the White Rot Basidiomycete Ceriporiopsis subvermispora.
Gonzalez A, Corsini G, Lobos S, Seelenfreund D, Tello M. Genes (Basel). 2020 Oct 20;11(10):1227. doi: 10.3390/genes11101227. PMID: 33092062. Free PMC article.
Characterizing the mutational landscape of MM and its precursor MGUS.
Farswan A, Gupta A, Jena L, Ruhela V, Kaur G, Gupta R. Am J Cancer Res. 2022 Apr 15;12(4):1919-1933. eCollection 2022. PMID: 35530275. Free PMC article.
Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions.
von Mandach C, Merkl R. BMC Genomics. 2010 Nov 4;11:617. doi: 10.1186/1471-2164-11-617. PMID: 21050470. Free PMC article.
Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles.
Botzman M, Margalit H. Genome Biol. 2011 Oct 27;12(10):R109. doi: 10.1186/gb-2011-12-10-r109. PMID: 22032172. Free PMC article.
The evolutionary signal in metagenome phyletic profiles predicts many gene functions.
Vidulin V, Šmuc T, Džeroski S, Supek F. Microbiome. 2018 Jul 10;6(1):129. doi: 10.1186/s40168-018-0506-4. PMID: 29991352. Free PMC article.
