Genome Biol. 2023 Feb 24;24(1):34.
doi: 10.1186/s13059-023-02868-2.

Using protein-per-mRNA differences among human tissues in codon optimization

Xavier Hernandez-Alias et al. Genome Biol. 2023.

Abstract

Background: Codon usage and nucleotide composition of coding sequences have profound effects on protein expression. However, while it is recognized that different tissues have distinct tRNA profiles and codon usages in their transcriptomes, the effect of tissue-specific codon optimality on protein synthesis remains elusive.

Results: We leverage existing state-of-the-art transcriptomics and proteomics datasets from the GTEx project and the Human Protein Atlas to compute the protein-to-mRNA ratios of 36 human tissues. Using this as a proxy of translational efficiency, we build a machine learning model that identifies codons enriched or depleted in specific tissues. We detect two clusters of tissues with an opposite pattern of codon preferences. We then use these identified patterns for the development of CUSTOM, a codon optimizer algorithm which suggests a synonymous codon design in order to optimize protein production in a tissue-specific manner. In human cell-line models, we provide evidence that codon optimization should take into account particularities of the translational machinery of the tissues in which the target proteins are expressed and that our approach can design genes with tissue-optimized expression profiles.
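The protein-to-mRNA ratio (PTR) step described above can be illustrated with a minimal Python sketch. The abstract does not state how tissue-enriched and tissue-depleted sets are defined, so the log2 transform, the centering on each gene's median across tissues, the threshold of 1.0, and all function names and toy values here are assumptions made for illustration only, not the authors' procedure.

```python
import math
from statistics import median


def log2_ptr(protein, mrna):
    """Return log2(protein / mRNA) per gene, skipping missing or non-positive values."""
    ratios = {}
    for gene, p in protein.items():
        m = mrna.get(gene)
        if m is not None and p > 0 and m > 0:
            ratios[gene] = math.log2(p / m)
    return ratios


def tissue_relative_ptr(ptr_by_tissue, tissue):
    """Center each gene's log2 PTR in `tissue` on its median across all tissues (assumed normalization)."""
    relative = {}
    for gene, value in ptr_by_tissue[tissue].items():
        across = [ptr[gene] for ptr in ptr_by_tissue.values() if gene in ptr]
        relative[gene] = value - median(across)
    return relative


def split_high_low(relative, threshold=1.0):
    """Split genes into high-PTR and low-PTR sets using a hypothetical symmetric threshold."""
    high = {g for g, v in relative.items() if v >= threshold}
    low = {g for g, v in relative.items() if v <= -threshold}
    return high, low


# Toy abundances (arbitrary units), invented for this example.
mrna = {"A": 100, "B": 100, "C": 100}
protein_by_tissue = {
    "lung": {"A": 800, "B": 100, "C": 50},
    "kidney": {"A": 100, "B": 100, "C": 400},
    "liver": {"A": 100, "B": 100, "C": 100},
}
ptr_by_tissue = {t: log2_ptr(p, mrna) for t, p in protein_by_tissue.items()}
high, low = split_high_low(tissue_relative_ptr(ptr_by_tissue, "lung"))
print(sorted(high), sorted(low))
```

In this toy case gene A has a lung PTR well above its cross-tissue median and gene C falls below it, so they land in the high-PTR and low-PTR sets respectively, while gene B lands in neither.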

Conclusions: We provide proof-of-concept evidence that codon preferences exist in tissue-specific protein synthesis and demonstrate their application to synthetic gene design. We show that CUSTOM can be of benefit in biological and biotechnological applications, such as in the design of tissue-targeted therapies and vaccines.

Keywords: Codon optimization; Gene design; Proteomics; Tissue; Transcriptomics; Translation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Protein-to-mRNA ratios detect differences in translational efficiency among tissues. A Proteomics and mRNA-seq data included in this study contain samples from the GTEx project [19] and Human Protein Atlas [20]. B Using these datasets, we compute the protein-to-mRNA ratios (PTR) and define tissue-enriched and tissue-depleted sets of proteins for each tissue. By comparing the codon usage of these two sets, we identify the codon optimality pattern of tissues. Using this information, we develop a gene design tool called CUSTOM and validate the method using an in vitro cellular model. C Spearman correlation between the median translational efficiency [21] (ratio between ribo-seq and mRNA-seq FPKMs) and PTR [20] across genes in the brain, liver, and testis. The color code depicts the density of points in the scatter plot
Fig. 2
Random Forest models identify two clusters of human tissues with distinct codon signatures. A Receiver operating characteristic (ROC) curves of lung and kidney random forest classifiers, in which the codon usage of genes is used to predict whether they are high-PTR or low-PTR in the respective tissue (see the “Methods” section). B Ratios of the codon usage between high-PTR and low-PTR genes in each tissue. Codons and tissues are hierarchically clustered using Euclidean distances and the complete-linkage method. The barplot on the left shows the mean AUC of the ROC curve of the RF model of each tissue
Fig. 3
CUSTOM generates fluorescent variants with desired tissue-specific expression. A Selected eGFP and mCherry sequences optimized to lung and kidney using CUSTOM. The color code corresponds to the optimality ratios of Fig. 2B. B Using these sequences, we designed four constructs by placing a mCherry and an eGFP with opposite tissue-specificity under an inducible bidirectional promoter. C Ratios of eGFP and mCherry for each of the four constructs in A549 and HEK293T cell lines, detected by flow cytometry. Three biological replicates are downsampled to 1000 cells per group and summed; see individual replicates in Additional file 1: Fig. S7E. On the right, the distribution of all four constructs together is shown. D Conceptual summary of the relationship between the gene codon usage and their expression across tissues. E Ratios of eGFP and mCherry for each of the four constructs in primary cells, detected by flow cytometry. Top and bottom panels correspond to two independent batches of primary cells (see the “Methods” section). The number of cells within each group is specified. Center values represent the median. Statistical differences were determined by two-tailed Wilcoxon rank-sum test and are denoted as follows: *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001, ****p ≤ 0.0001
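
The codon usage ratios between high-PTR and low-PTR genes (Fig. 2B) and their use in sequence design (Fig. 3A) can be sketched as follows. This is not the CUSTOM algorithm, whose details are not given in this record. It is a naive greedy stand-in that, for each amino acid, picks the synonymous codon with the highest (or lowest) ratio. The use of overall codon frequencies, the pseudocount, the three-amino-acid `SYNONYMS` table, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

# Partial synonymous-codon table (standard genetic code), limited to three
# amino acids to keep the example short.
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def codon_frequencies(sequences):
    """Relative frequency of each codon over in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Ignore any trailing bases that do not form a complete codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_ratios(high_seqs, low_seqs, pseudo=1e-6):
    """Ratio of codon frequency in high-PTR versus low-PTR genes (pseudocount is an assumption)."""
    high = codon_frequencies(high_seqs)
    low = codon_frequencies(low_seqs)
    return {
        codon: (high.get(codon, 0.0) + pseudo) / (low.get(codon, 0.0) + pseudo)
        for codon in set(high) | set(low)
    }


def design(protein, ratios, synonyms=SYNONYMS, prefer_high=True):
    """Greedy per-residue codon choice; codons without a ratio are treated as neutral (1.0)."""
    pick = max if prefer_high else min
    return "".join(
        pick(synonyms[aa], key=lambda codon: ratios.get(codon, 1.0)) for aa in protein
    )


# Toy coding sequences, invented for this example.
high_ptr_seqs = ["AAGGAGTTC", "AAGGAGTTT"]
low_ptr_seqs = ["AAAGAATTT", "AAAGAGTTT"]
ratios = usage_ratios(high_ptr_seqs, low_ptr_seqs)
print(design("KEF", ratios))                     # codons favored in the high-PTR set
print(design("KEF", ratios, prefer_high=False))  # codons favored in the low-PTR set
```

A per-residue greedy choice ignores constraints that practical optimizers usually consider, such as GC content, repeats, and mRNA structure, so it should be read only as an illustration of how tissue-specific ratios could steer synonymous codon selection.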

