Sci Rep. 2018 Jul 13;8(1):10618.
doi: 10.1038/s41598-018-29035-z.

Codon usage clusters correlation: towards protein solubility prediction in heterologous expression systems in E. coli

Leonardo Pellizza et al. Sci Rep. 2018.

Abstract

Production of soluble recombinant proteins is crucial to the development of industry and basic research. However, aggregation caused by incorrect folding of nascent polypeptides is still a major bottleneck. Understanding the factors governing protein solubility is important to grasp the underlying mechanisms and improve the design of recombinant proteins. Here we present a quantitative study of the expression and solubility of a set of proteins from Bizionia argentinensis. Through the analysis of different features known to modulate protein production, we defined two parameters based on the %MinMax algorithm to compare codon usage clusters between the host and the target genes. We demonstrate that the absolute difference between all %MinMax frequencies of the host and the target gene is significantly negatively correlated with protein expression levels. Most importantly, a strong positive correlation between solubility and the degree of conservation of codon usage clusters is observed for two independent datasets. Moreover, we show that this correlation is higher in codon usage clusters involved in less compact protein secondary structure regions. Our results provide important tools for protein design and support the notion that codon usage may dictate translation rate and modulate co-translational folding.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Quantitative analysis of the total expression and solubility of the selected targets. Total expression levels and percentages of solubility were estimated by densitometric analysis of the induced bands present in the pellet and supernatant fractions on SDS-PAGE. Bar plots of total expression and solubility of the selected targets induced at 37 °C (red bars) or 20 °C (blue bars) are shown.
Figure 2
Analysis of primary determinants of gene expression. The total CAI, GC content and mRNA folding energy are plotted as a function of the experimental total expression (blue circles) and solubility (red circles). The linear regression (dashed line), the Pearson's correlation coefficient and the two-tailed p-value are shown.
Figure 3
%MinMax profiles of protein targets. The %MinMaxBA (red circles) and the %MinMaxEC (green circles) for six representative ORFs are plotted and superimposed as a function of the codon cluster. %MinMaxBA and %MinMaxEC were calculated using B. argentinensis and E. coli codon usage frequency, respectively.
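The profile calculation behind this figure can be illustrated with a minimal sketch of %MinMax as it is commonly described: for each sliding window of codons, the observed codon frequencies are compared with the minimum, maximum and average frequencies available among synonymous codons. This is not the authors' code. The window length of 18 codons, the function name percent_minmax and the two-amino-acid usage table are all assumptions made for illustration; a real analysis would load the full codon usage table of B. argentinensis or E. coli.

```python
from statistics import mean

# Toy relative codon frequencies for two amino acids (illustrative values only,
# NOT a real B. argentinensis or E. coli codon usage table).
TOY_USAGE = {"AAA": 0.75, "AAG": 0.25, "GAA": 0.70, "GAG": 0.30}
FAMILIES = [["AAA", "AAG"], ["GAA", "GAG"]]
# Map every codon to the list of codons encoding the same amino acid.
SYNONYMS = {codon: family for family in FAMILIES for codon in family}


def percent_minmax(codons, usage, synonyms, window=18):
    """Return one %MinMax value per sliding window of `window` codons.

    Positive values (up to +100) mean the window uses codons that are more
    frequent than average; negative values (down to -100) mean rarer codons.
    """
    profile = []
    for start in range(len(codons) - window + 1):
        actual = minimum = maximum = average = 0.0
        for codon in codons[start:start + window]:
            freqs = [usage[c] for c in synonyms[codon]]
            actual += usage[codon]
            minimum += min(freqs)
            maximum += max(freqs)
            average += mean(freqs)
        # Sums are used instead of means: the window length cancels in the ratio.
        if actual >= average:
            denom = maximum - average
            value = 100.0 * (actual - average) / denom if denom else 0.0
        else:
            denom = average - minimum
            value = -100.0 * (average - actual) / denom if denom else 0.0
        profile.append(value)
    return profile


if __name__ == "__main__":
    orf = ["AAA", "GAG", "AAG", "GAA"]
    print(percent_minmax(orf, TOY_USAGE, SYNONYMS, window=2))
```

Running the same ORF against two different usage tables (one per organism) would give the two superimposed profiles shown in the figure, under the assumptions stated above.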
Figure 4
Analysis of %MinMax-derived parameters and their relationship with the solubility, total expression and predicted secondary structure elements of recombinant proteins. (a) The %MinMax Correlation and the Δ%MinMax calculated for each protein are plotted as a function of the experimental solubility (red circles) and total expression levels (blue circles). (b) The secondary structure content of all selected proteins was predicted using JPred. The %MinMax Correlation values calculated for α-helices, β-sheets and coils are plotted as a function of the experimental solubility. In (a) and (b) the linear regression (dashed line), the Pearson's correlation coefficient and the two-tailed p-value are shown.
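As a rough illustration of the two derived parameters, the sketch below takes two %MinMax profiles of equal length for the same ORF (one computed with the native organism's codon usage, one with the host's) and returns a Pearson correlation between them and an absolute difference between them. The function names are hypothetical, and averaging the absolute difference over windows is an assumption: the abstract only states that the absolute difference between all %MinMax frequencies is used, not how it is normalised.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def minmax_correlation(profile_native, profile_host):
    """Degree of conservation of codon usage clusters between two profiles."""
    return pearson(profile_native, profile_host)


def delta_minmax(profile_native, profile_host):
    """Mean absolute difference between two profiles (normalisation assumed)."""
    diffs = [abs(a - b) for a, b in zip(profile_native, profile_host)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Made-up profiles for demonstration; not data from the study.
    native = [10.0, -20.0, 30.0, -40.0]
    host = [20.0, -40.0, 60.0, -80.0]
    print(minmax_correlation(native, host), delta_minmax(native, host))
```

The same pearson helper could then be applied across proteins, for example between per-protein correlation values and measured solubility, which is the kind of relationship plotted in panels (a) and (b).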
