Review
Comput Struct Biotechnol J. 2021 Apr 22;19:2646-2663.
doi: 10.1016/j.csbj.2021.04.042. eCollection 2021.

Codon-based indices for modeling gene expression and transcript evolution

Shir Bahiri-Elitzur et al. Comput Struct Biotechnol J. 2021.

Abstract

Codon usage bias (CUB) refers to the phenomenon that, in most genes and organisms, synonymous codons are used at different frequencies. The general assumption is that codon biases reflect a balance between mutational biases and natural selection. Today we understand that codon content is related to, and can affect, all gene expression steps. Since the 1980s, codon-based indices have been used to answer different questions in all biomedical fields, including systems biology, agriculture, medicine, and biotechnology. In general, codon usage bias indices weigh each codon or a small set of codons to estimate how well a given coding sequence fits a certain phenomenon (e.g., bias in codons, adaptation to the tRNA pool, frequencies of certain codons, transcription elongation speed), and they are usually easy to implement. Because dozens of such indices now exist, this paper aims to review and compare the different codon usage bias indices, their applications, and their advantages. In addition, we perform an analysis demonstrating that most indices tend to correlate even though they aim to capture different aspects. Given the central role of codon usage bias in the different gene expression steps, it is important to keep developing new indices that capture aspects not modeled by the current ones.
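As an illustration of how simple a codon-weighting index can be, a simplified frequency of optimal codons (Fop, one of the indices compared in Fig. 4) is the fraction of scorable codons in a gene that are "optimal". This is a minimal sketch, assuming the set of optimal codons has already been determined separately for the organism; the function name `fop` and the example codon sets are hypothetical and do not describe any real organism.

```python
def fop(gene_codons, optimal, eligible):
    """Simplified frequency of optimal codons (Fop).

    gene_codons: codons of one gene, in reading frame.
    optimal: codons designated as optimal (assumed to be given).
    eligible: all codons of the amino acids for which an optimal codon is
        defined; codons outside this set do not enter the ratio.
    """
    considered = [c for c in gene_codons if c in eligible]
    if not considered:
        return float("nan")  # no codon in the gene can be scored
    n_optimal = sum(1 for c in considered if c in optimal)
    return n_optimal / len(considered)


# Hypothetical example: only Ala and Lys codons are scored.
eligible = {"GCT", "GCC", "GCA", "GCG", "AAA", "AAG"}
optimal = {"GCT", "AAG"}  # illustrative choice only
print(fop(["ATG", "GCT", "GCC", "GCA", "AAG"], optimal, eligible))  # 0.5
```

ATG is ignored because Met is not in the eligible set, so the ratio is 2 optimal codons out of 4 scored codons.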

Keywords: Codon usage bias; Gene expression; Transcript evolution.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Codon usage of all 64 codons in the different analyzed organisms, organelles, and viruses. The usage of each codon was calculated as the ratio between its number of appearances and the total number of appearances of all codons encoding the same amino acid (including itself). The color bar represents the frequency of each codon (more details in the case study section). There are large differences in codon usage among the analyzed genomes. Codons of amino acids encoded by a single codon show equal frequencies in all genomes, since their relative usage is necessarily 1.
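The per-codon frequency described in the Fig. 1 caption (a codon's count divided by the total count of all codons for the same amino acid) can be sketched as follows. This is a minimal sketch, assuming in-frame DNA coding sequences and the standard genetic code; organelle genomes that use alternative codes would need a different table. The function names are illustrative, and stop codons are treated as one synonymous group.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_codon_usage(sequences):
    """Count of each codon divided by the total count of codons for the same amino acid.

    Stop codons are grouped together as '*'. Codons with non-ACGT symbols are skipped.
    Amino acids absent from the input get NaN for all their codons.
    """
    counts = Counter(c for s in sequences for c in split_codons(s) if c in CODON_TO_AA)
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TO_AA[codon]] += n
    return {
        codon: counts[codon] / aa_totals[aa] if aa_totals[aa] else float("nan")
        for codon, aa in CODON_TO_AA.items()
    }


usage = relative_codon_usage(["ATGGCTGCTGCAAAATAA"])
print(usage["GCT"], usage["GCA"], usage["ATG"])  # 0.6666666666666666 0.3333333333333333 1.0
```

Pooling the counts over all genes of a genome before dividing gives one value per codon, which is the kind of genome-level profile a heat map like Fig. 1 displays.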
Fig. 2
Different types of CUB indices examined in the paper. A. Indices based on the non-uniform usage of synonymous codons. B. Indices based on codon frequencies in a reference set of genes; to deal with alternative splicing when using such indices, the longest isoform of each gene is usually considered. C. Indices based on adaptation to tRNA levels and their supply. D. Indices that consider complex codon patterns affecting translation, transcription, and mRNA stability. E. Indices based on a direct experimental procedure such as ribosome profiling.
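Reference-set indices (panel B) score a gene against codon frequencies in a set of reference genes. The codon adaptation index (CAI) is the classic example: each codon gets a relative adaptiveness w, its count in the reference set divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of w over its codons. The sketch below assumes the standard genetic code and DNA input; the 0.5 pseudocount for codons absent from the reference set and all function names are illustrative choices, not the review's implementation.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Met, Trp (single-codon amino acids in the standard code) and stop codons are not scored.
EXCLUDED = {"M", "W", "*"}
PSEUDOCOUNT = 0.5  # assumption: stand-in count for codons absent from the reference set


def split_codons(seq):
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w = count(codon) / count(most frequent synonymous codon) in the reference set."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s) if c in CODON_TO_AA)
    best = Counter()
    for codon, aa in CODON_TO_AA.items():
        best[aa] = max(best[aa], counts[codon])
    return {
        codon: max(counts[codon], PSEUDOCOUNT) / best[aa]
        for codon, aa in CODON_TO_AA.items()
        if aa not in EXCLUDED and best[aa] > 0  # amino acids unseen in the reference get no w
    }


def cai(seq, weights):
    """Geometric mean of w over the gene's codons that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["GCTGCTGCTGCCAAAAAG"])
print(round(cai("GCTGCC", w), 4))  # 0.5774, i.e. sqrt(1 * 1/3)
```

Averaging in log space is the usual way to compute a geometric mean; it avoids underflow when a long gene multiplies hundreds of weights below 1.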
Fig. 3
A. Ribosome profiling procedure. Translation of mRNAs by ribosomes is arrested, and the exposed (unprotected) mRNA is digested. The ribosome-protected mRNA footprints are then sequenced and mapped to the genome, producing a read-count profile for each gene. B. NET-seq procedure. A culture is flash frozen and cryogenically lysed. Nascent RNA is co-purified via immunoprecipitation (IP) of the RNAPII elongation complex. Converting the RNA into DNA yields a DNA library in which the RNA sequence is an insert between DNA sequencing linkers. The sequencing primer is positioned so that the 3′ end of the insert is sequenced. m7G refers to the 7-methylguanosine cap structure at the 5′ end of nascent transcripts.
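Indices built on ribosome profiling start from the per-gene read-count profiles described in panel A. A common first step is to divide each gene's profile by its mean, so that genes with different expression levels become comparable, and then to pool the normalized values by codon identity; a higher average normalized density suggests that ribosomes dwell longer on that codon. The sketch below is a simplified proxy that assumes footprints were already assigned to codon positions (for example, the A site); it is not the exact definition of nTE or MTDR, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def normalized_profile(read_counts):
    """Divide each codon's footprint count by the gene's mean count."""
    if not read_counts or sum(read_counts) == 0:
        return None  # no coverage: the gene carries no information
    avg = sum(read_counts) / len(read_counts)
    return [c / avg for c in read_counts]


def mean_codon_occupancy(genes):
    """genes: list of (codon list, per-codon read counts) pairs.

    Returns the average normalized footprint density for each codon type.
    """
    pooled = defaultdict(list)
    for codon_list, counts in genes:
        if len(codon_list) != len(counts):
            raise ValueError("each codon needs exactly one read count")
        profile = normalized_profile(counts)
        if profile is None:
            continue
        for codon, value in zip(codon_list, profile):
            pooled[codon].append(value)
    return {codon: mean(values) for codon, values in pooled.items()}


genes = [
    (["GCT", "GCC", "GCT", "AAA"], [2, 6, 2, 2]),
    (["GCC", "AAA"], [0, 0]),  # no coverage: skipped
]
occupancy = mean_codon_occupancy(genes)
print(occupancy["GCC"])  # 2.0
```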
Fig. 4
A. Spearman correlations between CUB indices and PA (protein abundance) in S. cerevisiae, with the indices grouped by type: ENC (effective number of codons), Fop (frequency of optimal codons), CAI (codon adaptation index), CBI (codon bias index), CEC (codon-enrichment correlation), tAI (tRNA adaptation index), nTE (normalized translational efficiency), Chimera ARS, CPS (codon pair score), and MTDR (mean typical decoding rate). B. Spearman correlations among the CUB indices in S. cerevisiae; indices of the same type typically correlate more strongly with each other. C. As in A, for E. coli (all indices except CEC). D. As in B, for E. coli. E. As in A, for human (all indices except CEC and MTDR). F. As in B, for human. In all three organisms, every correlation between a CUB index and PA is significant and in the expected direction.
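Fig. 4 compares indices by Spearman rank correlation, both against PA and between pairs of indices. A minimal pure-Python sketch of that statistic: rank each variable (tied values receive their average rank) and compute the Pearson correlation of the ranks. Library routines such as `scipy.stats.spearmanr` compute the same coefficient and also report a p-value; the helper names and the example values below are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


enc = [45.1, 38.2, 52.7, 30.5]       # hypothetical ENC values
pa = [120.0, 800.0, 35.0, 2400.0]    # hypothetical protein abundances
print(round(spearman(enc, pa), 6))   # -1.0
```

The sign is what "expected direction" refers to: a lower ENC means stronger codon bias, so ENC is expected to correlate negatively with protein abundance, whereas indices such as CAI and tAI are expected to correlate positively.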
Fig. 5
A. Dot plot of the lowest-correlating pair of indices in S. cerevisiae, Fop vs. CPS. B. Dot plot of the highest-correlating pair in S. cerevisiae, Fop vs. CBI. C. Dot plot of the lowest-correlating pair in E. coli, MTDR vs. CPS. D. Dot plot of the highest-correlating pair in E. coli, Fop vs. CBI. E. Dot plot of the lowest-correlating pair in human, nTE vs. CAI. F. Dot plot of the highest-correlating pair in human, Fop vs. CBI.
