Mol Biol Evol. 2017 Dec 1;34(12):3064-3080.
doi: 10.1093/molbev/msx223.

Adenine Enrichment at the Fourth CDS Residue in Bacterial Genes Is Consistent with Error Proofing for +1 Frameshifts

Liam Abrahams et al. Mol Biol Evol.

Abstract

Beyond selection for optimal protein functioning, coding sequences (CDSs) are under selection at the RNA and DNA levels. Here, we identify a possible signature of "dual-coding," namely extensive adenine (A) enrichment at bacterial CDS fourth sites. In 99.07% of studied bacterial genomes, fourth site A use is greater than expected given genomic A-starting codon use. Arguing for nucleotide level selection, A-starting serine and arginine second codons are heavily utilized when compared with their non-A starting synonyms. Several models can explain part of this trend. In part, A-enrichment likely reduces 5' mRNA stability, promoting translation initiation. However, T/U, which may also reduce stability, is avoided. Further, +1 frameshifts on the initiating ATG encode a stop codon (TGA) provided A is the fourth residue, acting either as a frameshift "catch and destroy" or a frameshift stop and adjust mechanism and hence implicated in translation initiation. Consistent with both, genomes lacking TGA stop codons exhibit weaker fourth site A-enrichment. Sequences lacking a Shine-Dalgarno sequence and those without upstream leader genes, which may be more error prone during initiation, have greater utilization of A, again suggesting a role in initiation. The frameshift correction model is consistent with the notion that many genomic features are error-mitigation factors and provides the first evidence for site-specific out-of-frame stop codon selection. We conjecture that the NTG universal start codon may have evolved as a consequence of TGA being a stop codon and the ability of NTGA to rapidly terminate or adjust a ribosome.

Keywords: error mitigation; dual coding; fourth site; frameshift; translation initiation.
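
The frameshift argument in the abstract can be illustrated with a minimal Python sketch. It translates two toy coding sequences in frame and after a +1 frameshift, using the standard genetic code. The sequences `seq_a` and `seq_b` and the helper `translate` are hypothetical, chosen only to mirror the situation described (both encode methionine then serine with equal GC content); they are not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq starting at offset; stop after the first stop codon."""
    peptide = []
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        peptide.append(aa)
        if aa == "*":
            break
    return "".join(peptide)


# Hypothetical example sequences (assumptions for illustration only).
seq_a = "ATGTCTGCA"  # Met-Ser-Ala, fourth site T
seq_b = "ATGAGTGCA"  # Met-Ser-Ala, fourth site A

for name, seq in (("A", seq_a), ("B", seq_b)):
    print(name, "in frame:", translate(seq), "| +1 frame:", translate(seq, offset=1))
```

In this toy case both sequences read Met-Ser-Ala in frame, but after a +1 shift sequence A continues (TGT CTG, Cys-Leu) whereas sequence B hits TGA immediately, which is the out-of-frame stop the model relies on.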

Figures

Fig. 1.
Kernel density plots showing the proportion of coding sequences with each nucleotide (A, C, G, T) at coding sequence sites 4, 5, 6, and 7 (site 1 is defined as the first nucleotide of the start codon). Site 4 demonstrates a clear preference for A, which is not observed at the other sites.
Fig. 2.
The proportion of coding sequences with fourth site A is maintained above the proportion of A-starting codons as GC content increases. The regression coefficient for all A-starting codons is significantly greater than for A-starting second codons (P = 7.056 × 10⁻¹⁹, Z = 8.874, two-tailed Z-test of equivalency), suggesting enrichment of A at the fourth site becomes stronger with increasing GC content.
Fig. 3.
The proportion of Shigella flexneri orthologs with a substitution of each nucleotide at the first position of codons from Escherichia coli. The proportion of sequences with a substitution from A at site 4 is displayed with the dotted line. Position 1 of the first codon demonstrates minimal variation away from an A-genotype, confirming the preference for ATG start codons. Substitutions from an A-genotype are reduced across the sites when compared with other nucleotides. The proportion of coding sequences with a change from A in codon 2 is significantly lower than in neighboring codons (P < 0.001, one-sample T-test), suggesting fourth site A is under strong selection.
Fig. 4.
A schematic representation of the frameshift correction model. Both CDSs encode methionine followed by serine and have identical GC content. However, following a +1 frameshift, sequence A encodes a cysteine followed by a leucine, whereas translation of sequence B is immediately terminated by the presence of an out-of-frame TGA stop codon.
Fig. 5.
Average of difference (AOD) scores for each amino acid, demonstrating enrichment or avoidance of each amino acid in the second peptide position when compared with amino acid use within the transcriptome. Genomes are grouped by GC content into three equally sized groups in order to minimize GC biases on amino acid choice (lysine, for example, encoded by AAA and AAG, is expected to be used more frequently in GC-poor genomes). Amino acids encoded by two coding blocks are defined using the first nucleotide in the codon; for example, A-starting serine is denoted Sa. A preference for A-starting amino acids except methionine and isoleucine, regardless of genome GC content, is observed.
Fig. 6.
Comparisons between A enrichment ratios for synonymous and nonsynonymous sites in codons 2–4. Enrichment ratios compare the use of A at each site with that at comparable positions for all codons in the transcriptome (i.e., site 4 is compared with the first positions of all codons, site 5 is compared with the second positions of all codons, and site 6 is compared with the third positions of all codons). Unlike synonymous sites in neighboring codons that display similar A enrichment ratio distributions, we observe greater variation in A enrichment ratios for the fourth site in comparison with the more tightly controlled ratios for sites 7 and 10. Enrichment ratios at the fourth site are significantly increased when compared with sites 7 and 10.
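
One plausible reading of the fourth-site enrichment ratio described for Fig. 6 is sketched below in Python. The exact formula is not given here, so this is an assumption: the function divides the proportion of CDSs with A at site 4 by the proportion of all codons (start codon included) whose first position is A. The function name `fourth_site_a_enrichment` and the toy sequences are hypothetical.

```python
def fourth_site_a_enrichment(cds_list):
    """Observed A use at site 4 divided by A use at first positions of all codons.

    Assumed definition for illustration; the paper's exact calculation may differ.
    """
    # Observed: fraction of coding sequences with A at site 4 (0-based index 3).
    observed = sum(1 for cds in cds_list if cds[3] == "A") / len(cds_list)

    # Expected: fraction of all codons, pooled over sequences, that start with A.
    first_positions = [
        cds[i] for cds in cds_list for i in range(0, len(cds) - 2, 3)
    ]
    expected = first_positions.count("A") / len(first_positions)

    return observed / expected


# Toy data (not real genes): three of the four sequences have A at site 4.
toy_cds = ["ATGAAACCC", "ATGACCGGG", "ATGCCCAAA", "ATGAGTTTT"]
print(round(fourth_site_a_enrichment(toy_cds), 3))
```

A ratio above 1 would indicate that A is used at the fourth site more often than A-starting codons are used overall, which is the pattern the caption reports.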
