Front Microbiol. 2016 Aug 3:7:1180.
doi: 10.3389/fmicb.2016.01180. eCollection 2016.

Pangenome Evidence for Higher Codon Usage Bias and Stronger Translational Selection in Core Genes of Escherichia coli

Shixiang Sun et al. Front Microbiol.

Abstract

Codon usage bias, which results from the interplay of mutation and selection, has been intensively studied in Escherichia coli. However, codon usage in an E. coli pangenome remains unexplored, and the relative importance of mutation and selection acting on core genes and strain-specific genes is unknown. Here we perform comprehensive codon usage analyses based on a collection of multiple complete genome sequences of E. coli. Our results show that core genes, which are present in all strains, have higher codon usage bias than strain-specific genes, which are unique to single strains. We further explore the forces influencing codon usage and investigate how the major force differs between core and strain-specific genes. Our results demonstrate that although mutation may exert a genome-wide influence on codon usage, acting similarly in different gene sets, selection becomes the dominant force shaping biased codon usage as genes are present in an increasing number of strains. Together, our results provide important insights into genome plasticity and complexity as well as the evolutionary mechanisms behind codon usage bias.

Keywords: codon usage bias; core genes; mutation; pangenome; strain-specific genes; translational selection.

Figures

Figure 1
E. coli pangenome and core genes based on 26 isolates. (A) Number of gene clusters shared by 1 to 26 isolates. According to the number of isolates in which they are present, genes were further grouped into five gene sets: strain-specific genes, lowly-shared genes, moderately-shared genes, highly-shared genes, and core genes. (B) Pangenome size and core-genome size as the number of isolates varies from 1 to 26.
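
The two panels described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes gene clustering has already been done and that each isolate is represented by a set of cluster IDs. The function names, the toy isolates, and the cluster IDs are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def sharing_histogram(gene_sets: Dict[str, Set[str]]) -> Dict[int, int]:
    """Panel A idea: how many gene clusters are present in exactly k isolates."""
    presence = Counter()
    for clusters in gene_sets.values():
        presence.update(clusters)  # each isolate counts a cluster at most once
    return dict(Counter(presence.values()))


def pan_and_core_sizes(
    gene_sets: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[int, int]]:
    """Panel B idea: (pangenome size, core-genome size) as isolates are added."""
    pan: Set[str] = set()
    core: Set[str] = set()
    sizes = []
    for i, isolate in enumerate(order):
        clusters = gene_sets[isolate]
        pan |= clusters  # pangenome: union of all clusters seen so far
        core = set(clusters) if i == 0 else core & clusters  # core: intersection
        sizes.append((len(pan), len(core)))
    return sizes


# Hypothetical toy data: three isolates, five gene clusters.
toy = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g5"},
}
histogram = sharing_histogram(toy)
sizes = pan_and_core_sizes(toy, ["A", "B", "C"])
```

In this toy example the pangenome grows while the core genome shrinks as isolates are added. The curve depends on the order in which isolates are added, so a real analysis would typically average over many orderings; that step is omitted here.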
Figure 2
Distribution of GC contents (A) and gene length (B) across five gene sets in the E. coli pangenome. GC contents at the first, second, and third codon positions are denoted GC1, GC2, and GC3, respectively.
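
As a rough illustration of what GC1, GC2, and GC3 measure, the sketch below computes the GC fraction at each codon position of a single coding sequence. It assumes an in-frame DNA sequence and counts every codon, including start and stop codons; the paper may apply different filtering. The function name and the example sequence are hypothetical.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return (GC1, GC2, GC3) for an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions[0], fractions[1], fractions[2]


# Hypothetical example: codons ATG GCC AAG TAA
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAGTAA")
```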
Figure 3
Codon usage bias in the E. coli pangenome estimated by four different measures: (A) CDC, (B) CAI, (C) Nc, and (D) Nc′. The correlation between CAI and gene expression level was examined in (E) strain-specific genes, (F) lowly-shared genes, (G) moderately-shared genes, (H) highly-shared genes, and (I) core genes.
Figure 4
Similarity between tRNA abundance and relative synonymous codon usage. The cosine similarity metric was used, indicating the degree of similarity between tRNA abundance and relative synonymous codon usage and ranging from 0 (completely different) to 1 (identical).
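
The cosine similarity named in this caption can be sketched as follows. This is a minimal example, assuming both quantities are given as equal-length vectors of non-negative numbers with one entry per codon; that non-negativity is what keeps the result between 0 and 1. The function name and the toy vectors are hypothetical, not data from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical toy vectors, one entry per codon.
trna_abundance = [4.0, 1.0, 0.0, 2.0]
rscu = [1.6, 0.4, 0.0, 0.8]  # proportional to trna_abundance
similar = cosine_similarity(trna_abundance, rscu)
different = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the two toy vectors are proportional, their similarity is 1 up to floating-point rounding, while the two orthogonal vectors give 0.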
Figure 5
ENC-plot (A–E) and neutrality-plot (F–J) across different gene sets. The expected and estimated values in ENC-plot are indicated by solid and hollow circles, respectively. For any given gene, if its deviation value, calculated by (expected–estimated)/expected, is greater than a threshold (default = 0.15), this gene is assumed to deviate from the expected ENC curve. Similar results can be found in Table S7 when considering different thresholds.
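
The deviation rule in this caption can be illustrated with a short sketch. The deviation formula and the 0.15 default come from the caption. The expected curve is an assumption: the caption does not give it, so the code uses Wright's commonly used approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC content at synonymous third codon positions. The function names and the example values are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC under mutational bias alone (assumed: Wright's approximation)."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(estimated: float, gc3s: float) -> float:
    """Deviation value from the caption: (expected - estimated) / expected."""
    expected = expected_enc(gc3s)
    return (expected - estimated) / expected


def deviates_from_curve(estimated: float, gc3s: float, threshold: float = 0.15) -> bool:
    """True if the gene's deviation value exceeds the threshold."""
    return enc_deviation(estimated, gc3s) > threshold


# Hypothetical genes, both with GC3s = 0.5 (expected ENC = 60.5).
biased_gene = deviates_from_curve(estimated=45.0, gc3s=0.5)
unbiased_gene = deviates_from_curve(estimated=58.0, gc3s=0.5)
```

A gene falling well below the expected curve, like the first toy gene, has stronger codon bias than its GC3s alone would predict, which is usually read as a sign of selection.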
