[Preprint]. 2025 Feb 4:2025.02.03.636209.
doi: 10.1101/2025.02.03.636209.

Genes and Pathways Comprising the Human and Mouse ORFeomes Display Distinct Codon Bias Signatures that Can Regulate Protein Levels

Evan T Davis et al. bioRxiv.

Abstract

Arginine, glutamic acid and selenocysteine based codon bias has been shown to regulate the translation of specific mRNAs for proteins that participate in stress responses, cell cycle and transcriptional regulation. Defining codon-bias in gene networks has the potential to identify other pathways under translational control. Here we have used computational methods to analyze the ORFeome of all unique human (19,711) and mouse (22,138) open-reading frames (ORFs) to characterize codon-usage and codon-bias in genes and biological processes. We show that ORFeome-wide clustering of gene-specific codon frequency data can be used to identify ontology-enriched biological processes and gene networks, with developmental and immunological programs well represented for both humans and mice. We developed codon over-use ontology mapping and hierarchical clustering to identify multi-codon bias signatures in human and mouse genes linked to signaling, development, mitochondria and metabolism, among others. The most distinct multi-codon bias signatures were identified in human genes linked to skin development and RNA metabolism, and in mouse genes linked to olfactory transduction and ribosome, highlighting species-specific pathways potentially regulated by translation. Extreme codon bias was identified in genes that included transcription factors and histone variants. We show that re-engineering extreme usage of C- or U-ending codons for aspartic acid, asparagine, histidine and tyrosine in the transcription factors CEBPB and MIER1, respectively, significantly regulates protein levels. Our study highlights that multi-codon bias signatures can be linked to specific biological pathways and that extreme codon bias with regulatory potential exists in transcription factors for immune response and development.

Keywords: Codon bias; ORFeome; codon re-engineering; development; gene expression; queuosine; tRNA modification; transcription factors; translation.

Figures

Figure 1. ORFeome Wide Codon Metrics for Human Genes.
A. Gene-specific ICF formula, where N_i is the number of times codon i appears in the mRNA sequence and the denominator, Σ N_codon over all codons for amino acid A, is the total number of occurrences of codons that encode the same amino acid A. B. The gene-specific codon Z-score formula uses the gene's codon frequency (x), the average frequency across the genome (μ), and the standard deviation of the genome average (σ). C. Representative plots of human genes at each codon frequency (left) or Z-score (right) for AAA (top) and CTA (bottom). D-G. The number of genes over-using or under-using a codon at different Z-score thresholds in humans (D-E) and mice (F-G).
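The two metrics described in panels A and B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `gene_icf` and `codon_z_scores` are hypothetical, the standard genetic code is assumed, and the population standard deviation is used because the caption does not say which estimator was applied.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

# Standard genetic code, built in TCAG order (assumption: no alternative code tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gene_icf(orf):
    """Return {codon: N_i / sum of N over all codons for the same amino acid}.

    Hypothetical helper. Codons whose amino acid never occurs in the ORF are
    omitted, because their frequency is undefined (0 / 0).
    """
    orf = orf.upper().replace("U", "T")
    # Split into complete in-frame codons, ignoring any trailing partial codon.
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    codon_counts = Counter(c for c in codons if c in CODON_TO_AA)
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TO_AA[codon]] += n
    return {
        codon: codon_counts[codon] / aa_counts[aa]
        for codon, aa in CODON_TO_AA.items()
        if aa_counts[aa] > 0
    }


def codon_z_scores(frequencies):
    """Z = (x - mu) / sigma for one codon's frequency across many genes.

    Assumption: population standard deviation (pstdev); the source does not
    specify sample versus population.
    """
    mu = mean(frequencies)
    sigma = pstdev(frequencies)
    if sigma == 0:
        return [0.0] * len(frequencies)
    return [(x - mu) / sigma for x in frequencies]
```

For example, `gene_icf("AAAAAGAAA")` assigns AAA a frequency of 2/3 and AAG a frequency of 1/3, since both encode lysine. Excluding stop codons, ATG and TGG from the table leaves 59 codons with at least one synonymous alternative, which matches the codon count used in the later figures.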
Figure 2. Gene-Specific Codon Frequency Maps for the Human ORFeome.
ICF data for 19,711 human genes was clustered into five groups (A-E). Sub-patterns of codon usage were analyzed for enriched gene ontology.
Figure 3. Gene-Specific Codon Frequency Maps for the Mouse ORFeome.
ICF data for 22,138 unique mouse genes. Eight general clusters were identified (A-H) representing different patterns of gene-specific codon usage. Enriched gene-ontologies for groups A, B, D and H were identified.
Figure 4. Specific codon biases are linked to distinct biological functions in humans.
(A) Methodology to (1) link genes that over-use a codon with biological processes and (2) identify biological processes linked to multiple codons. (B) Gene ontologies enriched (FDR < 0.05, −log10 FDR > 1.3) in each list of codon-biased genes (Z ≥ 2) were identified for 59 codons. Ontologies not found were assigned a −log10 FDR value of 0. Data were hierarchically clustered and visualized. Summarized ontologies are listed on the Y-axis, with exact ontologies shown in supplementary figure S2. (C) Data from panel B were filtered to identify ontologies with at least 5 codon-linked −log10 FDR values > 2.
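The matrix-building and filtering steps in panels B and C can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and the input layout (`{codon: {ontology: FDR}}`) are assumptions, and "at least 5 codon-linked values > 2" is read here as "five or more codons whose −log10 FDR exceeds 2". The hierarchical clustering step is omitted.

```python
import math


def build_score_matrix(enrichment, codons, fdr_cutoff=0.05):
    """Return {ontology: [-log10(FDR) per codon]}, with 0.0 where not enriched.

    enrichment: hypothetical input of the form {codon: {ontology: FDR}}.
    Only ontologies passing the FDR cutoff for at least one codon get a row.
    """
    ontologies = sorted(
        {
            term
            for hits in enrichment.values()
            for term, fdr in hits.items()
            if fdr < fdr_cutoff
        }
    )
    matrix = {}
    for term in ontologies:
        row = []
        for codon in codons:
            fdr = enrichment.get(codon, {}).get(term)
            if fdr is not None and fdr < fdr_cutoff:
                # Floor the FDR to avoid log10(0) for values reported as zero.
                row.append(-math.log10(max(fdr, 1e-300)))
            else:
                row.append(0.0)  # ontology not found for this codon
        matrix[term] = row
    return matrix


def multi_codon_ontologies(matrix, min_codons=5, min_score=2.0):
    """Keep ontologies with at least `min_codons` codons scoring above `min_score`."""
    return {
        term: row
        for term, row in matrix.items()
        if sum(score > min_score for score in row) >= min_codons
    }
```

With the default cutoff, an FDR of 0.05 corresponds to a −log10 value of about 1.3, and the panel C threshold of 2 corresponds to an FDR of 0.01. The resulting rows could then be passed to a clustering routine of the reader's choice.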
Figure 5. Specific codon biases are linked to distinct biological functions in mice.
Gene ontology annotations enriched (FDR < 0.05, −log10 FDR > 1.3) in each list of codon-biased genes (Z ≥ 2) were identified for 59 codons. Data were compiled to identify ontologies shared across multiple codons. General ontology categories are shown on the far left in black font, with exact ontologies noted in red font.
Figure 6. There are two types of extremely biased human and mouse genes.
Heat maps detailing the codon frequencies of top 50 biased human (A) and mouse (B) genes. The black bar denotes clusters populated by genes that over-use synonymous codons mostly ending in G or C.
Figure 7. Extreme codon bias can be conserved and regulates protein levels of the CEBPB and MIER1 transcription factors.
Heat map-based comparison of gene-specific codon frequencies for human and mouse (A) CEBPB and (B) MIER1 genes. Codons marked with an asterisk are decoded by Q, with parentheses indicating the number of times that codon is found in the native gene sequence. (C) Protein-simple based analysis of WT and re-engineered human CEBPB. (D) Codon details (upper) and Protein-simple based analysis (lower) of WT and re-engineered human MIER1 versions.
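The kind of synonymous re-engineering described for aspartic acid, asparagine, histidine and tyrosine can be illustrated with a short sketch. The helper `recode_third_base` is hypothetical and is not the authors' construct design; it simply switches every C- or T-ending codon for those four amino acids to a chosen ending (T in the DNA coding sequence corresponds to U in the mRNA), leaving the encoded protein unchanged.

```python
# Two-base prefixes whose C/T-ending codons encode Asp (GA), Asn (AA),
# His (CA) and Tyr (TA) in the standard genetic code.
TARGET_PREFIXES = {"GA", "AA", "CA", "TA"}


def recode_third_base(orf, to_ending):
    """Return the ORF with all Asp/Asn/His/Tyr codons ending in `to_ending`.

    Hypothetical helper for illustration. `to_ending` must be "C" or "T".
    Codons ending in A or G (Glu, Lys, Gln, stop) are left untouched, so the
    amino acid sequence is preserved.
    """
    if to_ending not in ("C", "T"):
        raise ValueError("to_ending must be 'C' or 'T'")
    orf = orf.upper().replace("U", "T")
    recoded = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        if len(codon) == 3 and codon[:2] in TARGET_PREFIXES and codon[2] in "CT":
            codon = codon[:2] + to_ending
        recoded.append(codon)
    return "".join(recoded)
```

For instance, applying `recode_third_base("ATGGACAACCACTACTAA", "T")` to a toy sequence converts GAC, AAC, CAC and TAC to GAT, AAT, CAT and TAT while leaving the start and stop codons as they are.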

