eLife. 2018 Mar 15;7:e32496.
doi: 10.7554/eLife.32496.

Codon usage bias controls mRNA and protein abundance in trypanosomatids

Laura Jeacock et al. eLife.

Abstract

Protein abundance differs from a few to millions of copies per cell. Trypanosoma brucei presents an excellent model for studies on codon bias and differential gene expression because transcription is broadly unregulated and uniform across the genome. T. brucei is also a major human and animal protozoal pathogen. Here, an experimental assessment, using synthetic reporter genes, revealed that GC3 codons have a major positive impact on both mRNA and protein abundance. Our estimates of relative expression, based on coding sequences alone (codon usage and sequence length), are within 2-fold of the observed values for the majority of measured cellular mRNAs (n > 7000) and proteins (n > 2000). Our estimates also correspond with expression measures from published transcriptome and proteome datasets from other trypanosomatids. We conclude that codon usage is a key factor affecting global relative mRNA and protein expression in trypanosomatids and that relative abundance can be effectively estimated using only protein coding sequences.
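
GC3 codons are those carrying G or C at the third position. As a minimal sketch of how the GC3 fraction of a coding sequence might be computed (the function name `gc3_fraction` is hypothetical and not taken from the paper, and the sketch counts every in-frame codon it is given, stop codon included):

```python
def gc3_fraction(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Illustrative helper only; not the authors' code.
    """
    seq = cds.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS must be non-empty and a multiple of 3 nt long")
    # Split the sequence into consecutive in-frame codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Count codons ending in G or C and normalise by codon count.
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


if __name__ == "__main__":
    # ATG and GCC end in G/C; AAA and TAA do not.
    print(gc3_fraction("ATGGCCAAATAA"))
```

A reporter gene recoded with more G- or C-ending synonymous codons would score higher on this measure while encoding the same protein.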

Keywords: Leishmania; Trypanosoma; brucei; evolutionary biology; genomics; infectious disease; microbiology; post-transcription; synonymous; translation.

Conflict of interest statement

LJ, JF, DH: No competing interests declared.

Figures

Figure 1. Protein expression is increased by GC3 codons in T. brucei.
(A) Schematic map of the pRPai-based, tetracycline-inducible reporter construct. Relevant restriction sites are shown. Black bars, tubulin untranslated regions; arrow, pol-I promoter; pA, polyadenylation site; SA, splice-acceptor site. The heat-maps of the wild-type and human codon optimised gLUC genes indicate level (percentage) of codon over-representation (green) and under-representation (red) in highly expressed genes. (B) Protein blot analysis of gLUC expression in T. brucei. *, cross-reactive band. The Coomassie-stained panel serves as a loading control; the strong band at approximately 55 kDa is the abundant Variant Surface Glycoprotein (VSG). The numbers indicate proportional luciferase expression, based on densitometry. Three independent clones gave similar results for each construct. (C) The heat-maps of synthetic gLUC reporter genes indicate codon usage as in A above. The plot indicates luciferase activity for each reporter in T. brucei; four readings from two independent strains. Error bars, standard deviation. *, p<0.0001; one-way ANOVA test. (D) The heat-maps of synthetic GFP reporter genes indicate codon usage as in A above. The LICOR protein blot indicates GFP expression for each reporter in T. brucei; β-tubulin serves as a loading control. Two independent clones gave similar results for each construct.
Figure 2. mRNA expression is increased by GC3 codons in T. brucei.
(A) Schematic map of the reporter cassette. The grey bar indicates the position of the tubulin untranslated region probe. The RNA blot indicates native tubulin transcripts and the gLUC transcripts. An ethidium bromide stained gel serves as an additional loading control. (B) Schematic of the reporter cassette incorporating a lambda 3'-untranslated segment. The grey bar indicates the position of the lambda untranslated region probe. The upper RNA blot shows gLUC and GFP transcripts. An ethidium bromide stained gel and a replicate blot probed for tubulin serve as loading controls. Pairs of independent strains were analysed for each reporter construct. (C) Phosphorimager-based quantification of reporter expression in B. Error bars, standard deviation from two independent strains. Values were corrected for loading (tubulin). *, p<0.02; one-way ANOVA test.
Figure 3. Genome scale analysis of codon usage bias.
(A) CAI value distribution is shown for all non-redundant T. brucei genes and the cohorts of genes indicated. See the text for more detail on each cohort. (B) CAI values are shown in heat-map format (deviation from average, Av) on physical maps of T. brucei chromosome 3 and L. major chromosome 1. Salient features are indicated. (C) Codon representation (third position difference), relative to the average usage across the genome, is shown within the cohorts of T. brucei genes indicated; protein kinase activity (GO:0004672), plasma membrane (GO:0005886), transcription (GO:0006350), translation (GO:0006412). The numbers above the heat-map indicate the number of redundant codons available in each case.
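
The CAI (Codon Adaptation Index) values referred to above can be illustrated with the commonly used Sharp and Li formulation: each codon gets a weight equal to its count divided by the count of the most frequent synonymous codon in a reference set of highly expressed genes, and a gene's CAI is the geometric mean of its codon weights. The sketch below is a hedged illustration, not the authors' pipeline. The reference set, the 0.5 pseudocount for unseen codons, the use of the standard genetic code, and all function names are assumptions.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed here for nuclear genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds: str) -> list[str]:
    """Split a CDS into in-frame codons, ignoring a trailing partial codon."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cdss: list[str]) -> dict[str, float]:
    """Weight of each codon relative to the best synonymous codon."""
    counts = Counter(
        c for cds in reference_cdss for c in split_codons(cds) if c in CODON_TABLE
    )
    weights = {}
    for aa in set(CODON_TABLE.values()):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Stop codons and single-codon amino acids (Met, Trp) carry no signal.
        if aa == "*" or len(family) == 1:
            continue
        # Assumed pseudocount of 0.5 for unseen codons, to avoid log(0).
        fam_counts = {c: counts[c] or 0.5 for c in family}
        top = max(fam_counts.values())
        for c in family:
            weights[c] = fam_counts[c] / top
    return weights


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of the codon weights across a CDS."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no informative codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    w = relative_adaptiveness(["GCCGCCGCCGCT"])  # toy reference set
    print(cai("GCCGCC", w), cai("GCCGCT", w))
```

With a real reference set, a gene built entirely from the preferred codon of each family scores 1.0, and increasing use of rare codons lowers the score.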
Figure 4. Genome scale analysis of codon pair bias in T. brucei.
(A) Codon co-occurrence by encoded amino acid. Amino acid pairs are over-represented; highlighted by white boxes. (B) Analysis of third position followed by first position pairs. Examples of over-represented pairs are shown in green and examples of under-represented pairs are shown in red. (C) Analysis of third position and third position pairs. Examples are shown as in B. Amino acids and codons on the vertical axis precede those on the horizontal axis.
Figure 5. Transcriptome and proteome data and the impact of gene length in T. brucei.
(A) Correspondence between observed mRNA and protein expression. (B) Relationship between observed mRNA expression and protein coding sequence (CDS) length. RPKM, Reads Per Kilobase of transcript per Million mapped reads. (C) Relationship between observed protein expression and protein coding sequence (CDS) length. Cohorts of particularly long (red, 13.4 ± 1 kbp, n = 11) and short (blue, 0.55 ± 0.22 kbp, n = 67) genes, encoding dynein heavy chains and ribosomal proteins, respectively, are highlighted. n = 2315 genes for panels A and C, n = 7225 genes for panel B.
Figure 5—figure supplement 1. RNA-seq data.
Replicate read counts and correspondence analysis. RPKM (Reads Per Kilobase of transcript per Million mapped reads).
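
RPKM, as defined in the Figure 5 captions, normalises a read count by transcript length in kilobases and by sequencing depth in millions of mapped reads. A minimal sketch of that arithmetic follows. The function and parameter names are illustrative and not taken from the paper.

```python
def rpkm(read_count: float, length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    rpkm = reads / (length in kb) / (mapped reads in millions)
         = reads * 1e9 / (length in bp * total mapped reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2 kb transcript in a library of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))
```

Because the length term is in the denominator, two genes with the same read count but different lengths receive different RPKM values.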
Figure 6. Codon usage is predictive of relative mRNA and protein expression in T. brucei.
(A) Correspondence between relative observed mRNA expression and CAI. (B) Correspondence between relative observed mRNA levels and predicted expression based on CAI and CDS length in kbp (L); the inset shows the impact of length-correction on the correlation coefficient. RPKM, Reads Per Kilobase of transcript per Million mapped reads. (C) As in B but showing proportions of expression measures within 2- to 5-fold of the predictions; the formula for the exponential trend-line is indicated. (D) Correspondence between relative observed protein expression and CAI. (E) Correspondence between relative observed protein levels and predicted expression based on CAI and CDS length in kbp (L); inset as in B above. (F) As in E but showing proportions of expression measures within 2- to 5-fold of the predictions; the formula for the exponential trend-line is indicated. In A-B and D-E, cohorts of particularly long (red) and short (blue) genes (see Figure 5) are highlighted. n = 7225 genes for panels A-C, n = 2315 proteins for panels D-F.
Figure 6—figure supplement 1. Length-adjusted CAI is predictive of relative mRNA expression in previously published datasets.
Data from distinct life cycle stages of T. brucei and from different research groups were analysed; the data source is indicated in each case. Correspondence is shown between relative observed mRNA expression and our predictions based on CAI and CDS length in kbp (L). (A) Bloodstream-stage cells. Cohorts of bloodstream-upregulated (red) and insect-upregulated (blue) genes are highlighted (≥3 fold change between life cycle stages); the correlation coefficients do not take account of these ‘stage-specific’ genes (n = 201). Total n = 7191 genes. (B) As in A but for insect-stage cells. (C) Insect-stage cells. n = 7307 genes. RPKM, Reads Per Kilobase of transcript per Million mapped reads.
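
The "within 2- to 5-fold" proportions described for the Figure 6 panels can be expressed as a simple calculation: an observation counts as within N-fold when the ratio of observed to predicted lies between 1/N and N. The sketch below illustrates that measure only. It does not reproduce the paper's prediction formula, and the function name and toy values are hypothetical.

```python
import math


def fraction_within_fold(observed, predicted, fold=2.0):
    """Fraction of paired values whose observed/predicted ratio is within N-fold."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if fold < 1:
        raise ValueError("fold must be at least 1")
    limit = math.log(fold)
    # Work in log space so over- and under-prediction are treated symmetrically.
    hits = sum(
        abs(math.log(o / p)) <= limit for o, p in zip(observed, predicted)
    )
    return hits / len(observed)


if __name__ == "__main__":
    obs = [10.0, 30.0, 100.0, 1.0]   # toy values, not data from the paper
    pred = [12.0, 10.0, 60.0, 4.0]
    print(fraction_within_fold(obs, pred, fold=2.0))
    print(fraction_within_fold(obs, pred, fold=5.0))
```

All values must be strictly positive for the logarithm to be defined, so zero counts would need to be filtered out or offset beforehand.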
Figure 7. Codon usage predicts the relative expression of protein complexes and cohorts of proteins with related functions in T. brucei.
Correspondence between observed peptide counts and predicted abundance based on CAI. The complexes and cohorts are listed in order of peptides/kbp and number of proteins is indicated for each; protein numbers are also reflected by the symbol sizes. The formula for the exponential trend-line is indicated. n = 23 cohorts, n = 277 proteins.
Figure 8. Length-adjusted CAI and CAI are predictive of translation efficiency and mRNA half-life, respectively, in previously published data from T. brucei; the data source is indicated in each case.
(A) Correspondence between translation efficiency (footprint levels/mRNA levels) and length-adjusted CAI. n = 4880 genes. Data from bloodstream-form cells is shown; correlation coefficient for insect-form cells was 0.36 (improved by 3.3% by the length-adjustment). (B) Correspondence between mRNA half-life and CAI. n = 6333 genes. Data from bloodstream-form cells is shown; correlation coefficient for insect-form cells was 0.42.
Figure 9. Length-adjusted CAI is predictive of relative mRNA and protein expression in previously published data from the other trypanosomatids, T. vivax and Leishmania mexicana; the data source is indicated in each case.
The plots indicate correspondence between relative observed mRNA or protein expression and our predictions based on CAI and CDS length in kbp (L). (A) T. vivax mRNA expression. n = 5170 genes. (B) T. vivax protein expression. n = 859 proteins. (C) Leishmania mexicana mRNA expression. n = 5715 genes. The insets show the impact of length-correction on the correlation coefficient. FPKM, Fragments Per Kilobase of transcript per Million mapped reads.
