PLoS One. 2013 Oct 7;8(10):e76177.
doi: 10.1371/journal.pone.0076177. eCollection 2013.

scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes

Patrick K O'Neill et al. PLoS One. 2013.

Abstract

Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique for probing the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate- and fast-growing bacteria and provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and that it extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate- and fast-growing bacteria, hinting at selection for a universal and conserved pattern of gene expression that is detectable in codon usage bias.
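The abstract does not spell out the scnRCA formula, so the Python sketch below only illustrates the general idea of a "self-consistent" index: the reference set of presumably highly expressed genes is rebuilt from the top-scoring genes until it reproduces itself. The scoring rule (log ratio of reference-set to genome-wide codon frequency, averaged over a gene's codons) and every name and parameter here (`codon_weights`, `self_consistent_reference`, `top_frac`, `pseudo`, `max_iter`) are hypothetical assumptions, not the authors' definition of scnRCA.

```python
import math
from collections import Counter


def codon_counts(seqs):
    """Count in-frame codons across coding sequences (uppercase DNA, frame 0)."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    return counts


def codon_weights(ref_seqs, all_seqs, pseudo=0.5):
    """Hypothetical weight: log(reference-set codon frequency / genome-wide frequency).
    Dividing by the genome background stands in for a mutational-bias correction;
    this is NOT the published scnRCA formula."""
    ref, bg = codon_counts(ref_seqs), codon_counts(all_seqs)
    codons = sorted(bg)
    ref_total = sum(ref.values()) + pseudo * len(codons)
    bg_total = sum(bg.values()) + pseudo * len(codons)
    return {
        c: math.log((ref[c] + pseudo) / ref_total) - math.log((bg[c] + pseudo) / bg_total)
        for c in codons
    }


def gene_score(seq, weights):
    """Mean log weight over a gene's codons, i.e. the log of a geometric mean."""
    vals = [weights[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3)
            if seq[i:i + 3] in weights]
    return sum(vals) / len(vals) if vals else 0.0


def self_consistent_reference(genes, seed, top_frac=0.1, max_iter=20):
    """Rebuild the reference set from the top-scoring genes until it stops changing."""
    all_seqs = list(genes.values())
    n_top = max(1, int(len(genes) * top_frac))
    ref = set(seed)
    for _ in range(max_iter):
        weights = codon_weights([genes[g] for g in ref], all_seqs)
        ranked = sorted(genes, key=lambda g: gene_score(genes[g], weights), reverse=True)
        new_ref = set(ranked[:n_top])
        if new_ref == ref:
            break  # converged: the reference set reproduces itself
        ref = new_ref
    # Recompute so the returned weights always match the returned reference set.
    return ref, codon_weights([genes[g] for g in ref], all_seqs)
```

Seeding the loop with a small annotated set (for example ribosomal protein genes) and letting it converge is one common way to reduce dependence on the initial choice; whether and how scnRCA does this should be checked against the full paper.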

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Benchmarking of codon usage bias indices with expression data.
Spearman correlation of scCAI, scRCA, MILC, δ and CDC indices with expression data for different bacterial species as a function of the global %GC content of each species' genome. Supporting data for this figure (number of replicates for expression values, number of annotated ribosomal proteins, etc.) are provided in Table S2.
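Figure 1 benchmarks each index by its Spearman correlation with expression data. As a minimal, dependency-free sketch of that statistic (tied values receive average ranks), the Python below computes Spearman's ρ as the Pearson correlation of ranks; `scipy.stats.spearmanr` computes the same coefficient. The score and expression values in the usage lines are made up for illustration.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


# Illustrative values: index scores and expression levels for four genes.
index_scores = [0.9, 0.4, 0.7, 0.1]
expression = [120.0, 30.0, 80.0, 5.0]
print(spearman_rho(index_scores, expression))  # → 1.0 (identical rank order)
```

Because ρ depends only on ranks, it is insensitive to the very different scales of index values and expression measurements, which suits a cross-index benchmark like Figure 1.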
Figure 2. Codon and tRNA frequency distribution among six-box amino acids.
Average amino acid-normalized frequencies of codons in the reference set and in all protein-coding genes, and of gene copy numbers for their cognate tRNAs. For each codon, the three leftmost series correspond to average values for Pseudomonas species and the three rightmost to average values for Psychrobacter species. The amino acid is displayed in the top right. Vertical bars indicate the standard error of the mean.
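An amino acid-normalized codon frequency divides each codon's count by the total count of all codons for the same amino acid. A minimal Python sketch for the three six-box amino acids of the standard genetic code follows; the function name `aa_normalized_freqs` is hypothetical, and applying it separately to the reference set and to all protein-coding genes would give the first two kinds of series in Figure 2 (tRNA gene copy numbers are not covered here).

```python
from collections import Counter

# Six-box amino acids under the standard genetic code (DNA alphabet).
SIX_BOX = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ser": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def aa_normalized_freqs(seqs, families=SIX_BOX):
    """Frequency of each codon relative to all codons of the same amino acid."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):  # in-frame codons only
            counts[s[i:i + 3]] += 1
    out = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        # Families absent from the input get all-zero frequencies.
        out[aa] = {c: (counts[c] / total if total else 0.0) for c in codons}
    return out


freqs = aa_normalized_freqs(["CTGCTGCTTAGC"])  # CTG, CTG, CTT, AGC
print(freqs["Leu"]["CTG"], freqs["Ser"]["AGC"])  # CTG is 2/3 of Leu codons; AGC is all Ser codons
```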
Figure 3. Average codon and tRNA frequency distribution for two-box amino acids.
Average two-box codon-normalized frequencies of codons with each ending in the reference set and in all protein-coding genes, and of gene copy numbers for the cognate tRNAs of each codon ending. For each codon, the three leftmost series correspond to average values for Pseudomonas species and the three rightmost to average values for Psychrobacter species. Codon endings are denoted by their IUB nucleotide codes. The respective amino acids are displayed in the bottom right. Vertical bars indicate the standard error of the mean.
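For two-box amino acids the synonymous choice is a single third-position base: C or T (IUB code Y) for Phe, Tyr, His, Asn, Asp and Cys, and A or G (IUB code R) for Gln, Lys and Glu. The Python sketch below pools the families of one class and returns the fraction of codons ending in each base. Pooling across amino acids is an assumption made for illustration, as is the helper name `ending_fractions`; the figure may instead report each amino acid separately.

```python
from collections import Counter

# Two-box amino acids in the standard genetic code, keyed to their first two bases.
TWO_BOX_Y = {"Phe": "TT", "Tyr": "TA", "His": "CA", "Asn": "AA", "Asp": "GA", "Cys": "TG"}  # NNY
TWO_BOX_R = {"Gln": "CA", "Lys": "AA", "Glu": "GA"}  # NNR


def ending_fractions(seqs, prefixes, endings):
    """Pool two-box families sharing a third-position class and return the
    fraction of their codons that end in each base of `endings`."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    pooled = {b: sum(counts[p + b] for p in prefixes.values()) for b in endings}
    total = sum(pooled.values())
    return {b: (n / total if total else 0.0) for b, n in pooled.items()}


print(ending_fractions(["TTCTACAAT"], TWO_BOX_Y, "CT"))  # TTC, TAC end in C; AAT ends in T
print(ending_fractions(["GAAAAG"], TWO_BOX_R, "AG"))     # {'A': 0.5, 'G': 0.5}
```

Comparing the C fraction (or A fraction) between the reference set and all genes is one simple way to see the enrichment of C- and A-ending two-box codons described in the abstract.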
Figure 4. Average codon and tRNA frequency distribution for four-box amino acids.
Average four-box codon-normalized frequencies of codons with each ending in the reference set and in all protein-coding genes, and of gene copy numbers for the cognate tRNAs of each codon ending. For each codon, the three leftmost series correspond to average values for Pseudomonas species and the three rightmost to average values for Psychrobacter species. Codon endings are denoted by their IUB nucleotide codes. The respective amino acids are displayed in the bottom right. Vertical bars indicate the standard error of the mean.
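Each bar in Figures 2 to 4 is an average over the species of a genus, with the standard error of the mean as the error bar. The Python sketch below computes, per genome, the fraction of four-box codons ending in T (U in mRNA) and then the across-species mean and SEM. Restricting to the five strictly four-box amino acids (the six-box families also contain four-box subsets) and the names `t_ending_fraction` and `mean_and_sem` are assumptions for illustration.

```python
import math
from collections import Counter

# Strictly four-box amino acids (standard code): third position may be A, C, G or T.
FOUR_BOX = {"Val": "GT", "Pro": "CC", "Thr": "AC", "Ala": "GC", "Gly": "GG"}


def t_ending_fraction(seqs):
    """Fraction of four-box codons ending in T (U in mRNA), pooled over families."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    by_base = {b: sum(counts[p + b] for p in FOUR_BOX.values()) for b in "ACGT"}
    total = sum(by_base.values())
    return by_base["T"] / total if total else 0.0


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); SEM is NaN for n < 2."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Usage: one fraction per species genome, then a genus-level mean and SEM.
per_species = [t_ending_fraction(["GTTGCCGGT"]), t_ending_fraction(["GTAGCTCCT"])]
print(mean_and_sem(per_species))
```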
Figure 5. Correlation in scnRCA scores between the Pseudomonas and the Psychrobacter.
Plot of average scnRCA scores for each of the 791 identified conserved homologs between Pseudomonas and Psychrobacter species. For each axis, scnRCA values correspond to the average among all species with available complete genome sequences in the represented genus. Genes corresponding to ribosomal proteins and replication-associated proteins (e.g. DNA polymerase), as determined by annotation tags, are identified with different markers. Ribosomal proteins were defined as those having the term “ribosomal protein” in their GenBank annotation. Replication-associated proteins were defined as those having any of the following terms in their annotation: “chromosome replication”, “chromosome segregation”, “DNA gyrase”, “DNA polymerase” or “DNA topoisomerase”.
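The marker assignment in Figure 5 is a keyword rule on annotation text. A minimal Python sketch using exactly the terms listed in the caption follows; case-insensitive substring matching, and giving "ribosomal protein" precedence when both kinds of term occur, are assumptions, and `classify_annotation` is a hypothetical name.

```python
RIBOSOMAL_TERMS = ("ribosomal protein",)
REPLICATION_TERMS = (
    "chromosome replication",
    "chromosome segregation",
    "DNA gyrase",
    "DNA polymerase",
    "DNA topoisomerase",
)


def classify_annotation(annotation):
    """Return 'ribosomal', 'replication' or 'other' for a GenBank annotation string.
    Matching is case-insensitive substring search (an assumption)."""
    text = annotation.lower()
    if any(term.lower() in text for term in RIBOSOMAL_TERMS):
        return "ribosomal"
    if any(term.lower() in text for term in REPLICATION_TERMS):
        return "replication"
    return "other"


for product in ("50S ribosomal protein L2", "DNA gyrase subunit A", "hypothetical protein"):
    print(product, "->", classify_annotation(product))
```

Substring rules like this are easy to audit but can misfire on unusual annotations (e.g. a "ribosomal protein ... modification" enzyme would be labeled ribosomal), which is worth keeping in mind when interpreting highlighted points.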
Figure 6. Species pair-wise correlation in scnRCA values vs. pair-wise correlation in genome codon frequency.
Plot of pair-wise Spearman correlation coefficients among all species for both scnRCA values in orthologous genes (Y-axis) and genome codon frequencies (X-axis). Correlation of scnRCA values for each pair of species was computed on all available orthologs (i.e. as in Figure 5). For each codon and genome, the genome codon frequency is defined as the codon frequency over all protein-coding genes in that genome. Correlation in genome codon frequencies was computed on the 61-component genome codon frequency vector of each species. Several group-specific pairings (e.g. Firmicutes vs. Pseudomonas) are highlighted using different markers.
Figure 7. Sliding-window analysis of inter-species pair-wise correlation.
Plot of pair-wise Spearman correlation coefficients (blue) among all species for scnRCA values (Y-axis) in orthologous genes as a function of the position of the center of a sliding window (X-axis) spanning half of the pair-wise conserved homologs, sorted by scnRCA value. The leftmost point on the X-axis corresponds to the window encompassing the lowest half of the scnRCA values among the orthologs between any two given species. The rightmost point on the X-axis corresponds to the window encompassing the highest half of the scnRCA values among the orthologs between any two given species. Correlation of scnRCA values for each pair of species was computed on all available orthologs between both species. Window positions have been normalized to the total number of conserved homologs in each pair of species to allow consistent overlaying. The p-values associated with each Spearman correlation are reported in Table S5. For each set of pair-wise homologs, a randomized control of equal sample size is also shown (grey). The difference between the observed and control distributions of the Spearman ρ statistic is statistically significant across the whole range of scnRCA values. The results of Wilcoxon signed-rank tests against the paired randomized controls are reported in Table S7.
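The sliding-window procedure of Figure 7 can be sketched in Python as follows: sort the orthologs of a species pair, slide a window covering half of them, compute ρ within each window, and pair each window with a random subset of the same size as a control. Several details are assumptions not stated in the caption: orthologs are ordered by the mean of the two species' scores, the window advances one ortholog at a time, positions are normalized by dividing the window center by the number of orthologs, and the control is a uniform random subset. The function names are hypothetical.

```python
import random


def _ranks(v):
    """1-based average ranks (ties share the mean position)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # ZeroDivisionError if a window is constant


def sliding_window_rho(a, b, step=1, seed=0):
    """a[i], b[i]: scores of ortholog pair i in two species.
    Returns (normalized window center, observed rho, control rho) per window."""
    n = len(a)
    if n != len(b) or n < 4:
        raise ValueError("need two equal-length score lists with at least 4 orthologs")
    w = n // 2  # the window spans half of the orthologs
    # Assumption: orthologs are ordered by the mean of the two species' scores.
    order = sorted(range(n), key=lambda i: (a[i] + b[i]) / 2)
    rng = random.Random(seed)
    results = []
    for start in range(0, n - w + 1, step):
        idx = order[start:start + w]
        ctrl = rng.sample(range(n), w)  # random control subset of equal size
        center = (start + (w - 1) / 2) / n  # window center as a fraction of n
        rho_obs = spearman([a[i] for i in idx], [b[i] for i in idx])
        rho_ctrl = spearman([a[i] for i in ctrl], [b[i] for i in ctrl])
        results.append((center, rho_obs, rho_ctrl))
    return results
```

The observed and control ρ series returned per window are paired, so they could then be compared with a paired test such as `scipy.stats.wilcoxon`, mirroring the signed-rank tests reported in Table S7.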
