PLoS Comput Biol. 2017 May 22;13(5):e1005549.
doi: 10.1371/journal.pcbi.1005549. eCollection 2017 May.

Protein charge distribution in proteomes and its impact on translation

Rodrigo D Requião et al. PLoS Comput Biol. 2017.

Abstract

As proteins are synthesized, the nascent polypeptide must pass through a negatively charged exit tunnel. During this stage, positively charged stretches can interact with the ribosome walls and slow the translation. Therefore, charged polypeptides may be important factors that affect protein expression. To determine the frequency and distribution of positively and negatively charged stretches in different proteomes, the net charge was calculated for every 30 consecutive amino acid residues, which corresponds to the length of the ribosome exit tunnel. The following annotated and reviewed proteins in the UniProt database (Swiss-Prot) were analyzed: 551,705 proteins from different organisms and a total of 180 million protein segments. We observed that there were more negative than positive stretches and that super-charged positive sequences (i.e., net charges ≥ 14) were underrepresented in the proteomes. Overall, the proteins were more positively charged at their N-termini and C-termini, and this feature was present in most organisms and subcellular localizations. To investigate whether the N-terminal charges affect the elongation rates, previously published ribosomal profiling data obtained from S. cerevisiae, without translation-interfering drugs, were analyzed. We observed a nonlinear effect of the charge on the ribosome occupancy in which values ≥ +5 and ≤ -6 showed increased and reduced ribosome densities, respectively. These groups also showed different distributions across 80S monosomes and polysomes. Basic polypeptides are more common within short proteins that are translated by monosomes, whereas negative stretches are more abundant in polysome-translated proteins. These findings suggest that the nascent peptide charge impacts translation and can be one of the factors that regulate translation efficiency and protein expression.
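The windowed net-charge calculation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes lysine and arginine count as +1, aspartate and glutamate as -1, and every other residue as 0, with the histidine charge left configurable (0 or +1). The names `residue_charge` and `window_net_charges` are hypothetical.

```python
from typing import List


def residue_charge(aa: str, his_charge: int = 0) -> int:
    """Charge of one residue under the assumed model (K/R = +1, D/E = -1)."""
    if aa in ("K", "R"):
        return 1
    if aa in ("D", "E"):
        return -1
    if aa == "H":
        return his_charge  # 0 or +1, depending on the chosen convention
    return 0


def window_net_charges(seq: str, window: int = 30, his_charge: int = 0) -> List[int]:
    """Net charge of every run of `window` consecutive residues in `seq`."""
    charges = [residue_charge(aa, his_charge) for aa in seq.upper()]
    if len(charges) < window:
        return []  # protein shorter than one window: no segments
    total = sum(charges[:window])
    result = [total]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(charges)):
        total += charges[i] - charges[i - window]
        result.append(total)
    return result
```

A protein of length n yields n - 29 segments with the default window, so the segment count grows with the total length of the sequences analyzed.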

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Supercharged positive sequences are underrepresented in most proteomes.
A. Net-charge frequency histogram of the amino acid segments in all 551,705 proteins from the Swiss-Prot database, with the histidine charge taken as 0 (red line) or +1 (blue line). With histidine taken as 0, 47% of the sequences were negatively charged, 15% were neutral and 37% were positively charged. With histidine taken as +1, 37% of the sequences were negatively charged, 14% were neutral and 47% were positively charged (upper-right inset). Supercharged positive sequences (+14 or more) were less frequent than supercharged negative sequences, even with histidine taken as +1 (upper-left inset). B. The net charge ratio between the negative and positive sequences shows a steep increase around charge 14. Analysis of the S. cerevisiae proteome (6,721 proteins, green squares) showed a pattern similar to that in all 551,705 proteins (blue circles). Analyses performed with histidine taken as +1 showed similar results (empty symbols).
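The summary statistics in this caption can be sketched from a list of segment net charges. The sketch below is illustrative only: `sign_fractions` and `negative_to_positive_ratio` are hypothetical names, and the direction of the ratio in panel B (negative count divided by positive count at the same charge magnitude) is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List


def sign_fractions(net_charges: List[int]) -> Dict[str, float]:
    """Fraction of segments that are negative, neutral or positive."""
    n = len(net_charges)
    negative = sum(1 for q in net_charges if q < 0)
    positive = sum(1 for q in net_charges if q > 0)
    return {
        "negative": negative / n,
        "neutral": (n - negative - positive) / n,
        "positive": positive / n,
    }


def negative_to_positive_ratio(net_charges: Iterable[int]) -> Dict[int, float]:
    """For each magnitude q > 0, count(-q) / count(+q).

    Assumed ratio direction; magnitudes with no positive segment are skipped
    because the ratio is undefined there.
    """
    counts = Counter(net_charges)
    return {q: counts.get(-q, 0) / counts[q] for q in sorted(counts) if q > 0}
```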
Fig 2. Positively charged sequence concentrations at the N- and C-termini.
A. Heat map of the H. sapiens (20,177 proteins), D. melanogaster (3,341 proteins), C. elegans (3,743 proteins), A. thaliana (14,791 proteins) and S. cerevisiae (6,721 proteins) proteome net charge distribution. Red tiles represent the positively charged sequences, black tiles represent the neutral sequences and green tiles represent the negatively charged sequences. The number +/- 0.74 is the mean net charge of the organisms. Our analyses reveal a strong concentration of net positive charges at the N-termini (upper panel, net charges from amino acids 1 to 30 until 100 to 129), a concentration of net negative charges at their core (middle panel, net charges from amino acids 130 to 159 until 230 to 259) and a smaller, but still present, concentration of positive net charges at the C-termini (bottom panel, net charges from amino acids -129 to -100 until -30 to -1). B. Frequency of the net charges in all 551,705 proteins separated by 30 amino acid segments of the primary sequence. The N-terminal regions show 49% net positively charged segments, 15% neutral segments and 36% net negatively charged segments. The core regions display 37% net positively charged segments, 16% neutral segments and 47% net negatively charged segments. The C-terminal regions display 46% net positively charged segments, 14% neutral segments and 40% net negatively charged segments. C. The net charge of the first 30 residues (abscissa) of each of the 6,721 proteins from S. cerevisiae is plotted against the net charge of the last 30 residues of the same protein (ordinate). There is a small but positive correlation between the N-terminal and C-terminal charges (P < 0.0001, R² = 0.006479) in S. cerevisiae.
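The comparison in panel C can be sketched as below: compute the net charge of the first and last 30 residues of each protein, then the squared Pearson correlation between the two series. This is a rough illustration under an assumed charge model (K/R = +1, D/E = -1, histidine = 0); `net_charge`, `terminal_charges` and `r_squared` are hypothetical names, and the P value reported in the caption is not reproduced here.

```python
from typing import List, Tuple


def net_charge(segment: str) -> int:
    """Net charge under the assumed model: K/R = +1, D/E = -1, others 0."""
    return sum((aa in "KR") - (aa in "DE") for aa in segment.upper())


def terminal_charges(seq: str, window: int = 30) -> Tuple[int, int]:
    """Net charge of the first and of the last `window` residues."""
    return net_charge(seq[:window]), net_charge(seq[-window:])


def r_squared(xs: List[float], ys: List[float]) -> float:
    """Squared Pearson correlation; requires non-zero variance in both series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

An R² near 0.006, as in the caption, means the N-terminal charge explains well under 1% of the variance in the C-terminal charge, even though the correlation is statistically significant for a sample of several thousand proteins.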
Fig 3. Average charge of each residue in the first and last 30 amino acids of all recorded proteins from each species.
A. H. sapiens, B. D. melanogaster, C. C. elegans, D. A. thaliana, E. S. cerevisiae proteomes and F. these five proteomes combined. Blue circles represent whole proteomes (H. sapiens 20,177, D. melanogaster 3,341, C. elegans 3,743, A. thaliana 14,791 and S. cerevisiae 6,721 proteins), whereas red circles represent proteomes minus proteins that possess signal peptides (H. sapiens 16,717 proteins, D. melanogaster 2,883 proteins, C. elegans 3,232 proteins, A. thaliana 12,599 proteins, and S. cerevisiae 6,382 proteins). Our N-terminal analyses show conservation of a neutral residue at position number 1, a negatively charged residue at position number 2 and a concentration of positively charged residues from approximately position number 3 to 10. Our C-terminal analyses show conservation of positively charged residues from approximately amino acids -15 to -10 onwards.
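A per-position average of this kind can be sketched as follows. The code is a minimal illustration, assuming K/R = +1, D/E = -1 and all other residues 0; `mean_charge_by_position` is a hypothetical name. With `from_end=True`, index 0 of the result corresponds to position -1, index 1 to position -2, and so on.

```python
from typing import Iterable, List

# Assumed charge model; histidine is treated as neutral here.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mean_charge_by_position(
    seqs: Iterable[str], n_positions: int = 30, from_end: bool = False
) -> List[float]:
    """Average residue charge at each of the first (or last) `n_positions`."""
    sums = [0] * n_positions
    counts = [0] * n_positions
    for seq in seqs:
        residues = seq.upper()
        if from_end:
            residues = residues[::-1]  # count positions from the C-terminus
        for i, aa in enumerate(residues[:n_positions]):
            sums[i] += CHARGE.get(aa, 0)
            counts[i] += 1
    # Positions covered by no sequence are reported as 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```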
Fig 4. N-terminal positive charge concentration of proteins from different subcellular locations.
Heat map for H. sapiens, D. melanogaster, C. elegans, A. thaliana and S. cerevisiae proteomes showing the average net charge distribution (net charges from amino acids 1 to 30 until 100 to 129) divided into 17 different subcellular locations. Red tiles represent segments with a net positive charge, black tiles represent neutral segments and green tiles represent segments with a net negative charge. N-terminal positive charge concentration is conserved among most subcellular locations, except for endosomes, lysosomes, vacuoles and nucleoplasm proteins.
Fig 5. N-terminal net charge distribution in cytosolic and membrane proteins from different organelles.
A. Proteins from H. sapiens, D. melanogaster, C. elegans, A. thaliana and S. cerevisiae localized in different organelles (mitochondria, endoplasmic reticulum, Golgi apparatus and vacuoles) were compared with proteins from the cytoplasm. The net charge of consecutive windows of 30 amino acids was calculated up to amino acid number 100. B. Same as that shown in A, but the mitochondrial proteins were removed from the “membrane” dataset.
Fig 6. Ribosome profiling analyses of S. cerevisiae proteins grouped according to the net charge of the first 30 residues.
A. Using the net charge of the first 30 residues, the S. cerevisiae proteome was subdivided into the following eight categories: proteins with a net charge ≥+8 (144 proteins), +7 (134 proteins), +6 (208 proteins), +5 (348 proteins), 0 (898 proteins), -1 (810 proteins), -6 (90 proteins) and ≤-7 (117 proteins). Mitochondrial proteins (414 proteins) were also included in this analysis. Ribosome profiling analysis was performed using data generated by the Brown laboratory [34]. Normalization was performed by dividing the number of reads at each position by the total number of reads in each gene. B. The area under the curve from codon 0 to codon 90 (corresponding to the first 30 amino acids) from panel A was calculated for each group of proteins and divided by the area under the curve for a net charge of 0. The number of reads increases slightly with positive net charges; for net charges of +8 or greater, however, there are almost twice as many reads as for a net charge of 0. C. The relative frequency of the NLS score was calculated by NucPred [35] for the first 30 amino acids of the proteins described in panel A. A score threshold of 0.8 indicates a sensitivity of 0.30 and a specificity of 0.61. D. N-terminal net charge distribution in proteins from all subcellular localizations of S. cerevisiae compared with proteins with net charges ≥+8 (144) in the first 30 residues.
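The normalization in panel A and the area ratio in panel B can be sketched as below. This is only an illustration of the arithmetic: the function names are hypothetical, and the trapezoidal rule with unit spacing is an assumed integration method that the caption does not specify.

```python
from typing import List, Sequence


def normalize_reads(reads: Sequence[float]) -> List[float]:
    """Divide the reads at each position by the gene's total read count."""
    total = sum(reads)
    if total == 0:
        return [0.0] * len(reads)  # gene with no reads: nothing to normalize
    return [r / total for r in reads]


def area_under_curve(profile: Sequence[float], start: int = 0, stop: int = 90) -> float:
    """Trapezoidal area between positions `start` and `stop` (inclusive).

    Assumed integration rule; positions are taken to be one unit apart.
    """
    ys = list(profile[start:stop + 1])
    return sum((a + b) / 2 for a, b in zip(ys, ys[1:]))


def relative_area(group_profile: Sequence[float], reference_profile: Sequence[float],
                  start: int = 0, stop: int = 90) -> float:
    """Area for one charge group divided by the area for the reference group."""
    return (area_under_curve(group_profile, start, stop)
            / area_under_curve(reference_profile, start, stop))
```

Under this sketch, a relative area close to 2 for a group would correspond to the roughly twofold read density described for the most positively charged N-termini, with the net-charge-0 group as the reference profile.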
Fig 7. Positively charged N-termini are associated with monosomal translation.
A. Net charge frequency histogram of differentially translated proteins in S. cerevisiae. Proteins were grouped based on their Monosome:Polysome scores [7] into “ORFs < 590” (1,027 proteins, orange line), “Monosome enriched” (204 proteins, green line), “No enrichment” (1,908 proteins, blue line), “Polysome enriched” (1,006 proteins, black line) and “Top 300 polysome” (300 proteins, red line). There is an increasing concentration of positive sequences from “Top 300 polysome” to “ORFs < 590” (inset). B. Frequency of net charges in different groups. “ORFs < 590” contains 53% positively charged sequences, 12% neutral sequences and 35% negatively charged sequences; “Monosome enriched” contains 44% positively charged sequences, 15% neutral sequences and 41% negatively charged sequences; “No enrichment” contains 41% positively charged sequences, 14% neutral sequences and 45% negatively charged sequences; “Polysome enriched” contains 37% positively charged sequences, 14% neutral sequences and 49% negatively charged sequences and “Top 300 polysome” contains 36% positively charged sequences, 14% neutral sequences and 50% negatively charged sequences. C. Average N-terminal (1st), core (130th) and C-terminal (last) net charges in each of the groups shown in panel B. There is an N-terminal positive charge enrichment in the groups “ORFs < 590”, “Monosome enriched” and “No enrichment” (from higher to lower); there are neutral sequences in the group “Polysome enriched” and a negative charge enrichment in the group “Top 300 polysome”. This pattern (increasing concentration of positive sequences from “Top 300 polysome” to “ORFs < 590”) is maintained in the core with a positive charge enrichment in “ORFs < 590”, neutral sequences in “Monosome enriched” and a negative charge enrichment in “No enrichment”, “Polysome enriched” and “Top 300 polysome” (from lower to higher). C-terminal analyses do not show the same pattern, even though there is a positive charge concentration in “ORFs < 590”, a neutral sequence preference in “Monosome enriched” and “No enrichment” and a negative charge enrichment in “Polysome enriched” and “Top 300 polysome”. D. Percentage of proteins in different Monosome:Polysome groups separated by their N-termini net charges. S. cerevisiae proteins were separated by their N-termini net charges into 16 groups, ranging from -7 to +8 (the first dot, from left to right, represents proteins with N-termini net charges of -7 and below, while the last dot represents proteins with net charges of +8 and higher), and then the proteins were separated into the Monosome:Polysome groups. Their Monosome:Polysome group distribution percentage was calculated and plotted. Dotted lines represent the percentage of all S. cerevisiae proteins in the Monosome:Polysome groups.
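The grouping described for panel D can be sketched as follows. This is a hedged illustration rather than the authors' procedure: `charge_bin` and `group_percentages` are hypothetical names, the input is assumed to be pairs of (N-terminal net charge, Monosome:Polysome group label), and the percentages are assumed to be computed within each charge bin.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def charge_bin(net_charge: int, low: int = -7, high: int = 8) -> int:
    """Clamp a net charge into one of the 16 bins from -7 (and below) to +8 (and above)."""
    return max(low, min(high, net_charge))


def group_percentages(
    records: Iterable[Tuple[int, str]]
) -> Dict[int, Dict[str, float]]:
    """Percentage of proteins per group label within each charge bin."""
    per_bin: Dict[int, Counter] = defaultdict(Counter)
    for net_charge, group in records:
        per_bin[charge_bin(net_charge)][group] += 1
    result: Dict[int, Dict[str, float]] = {}
    for bin_value, counts in per_bin.items():
        total = sum(counts.values())
        result[bin_value] = {g: 100.0 * c / total for g, c in counts.items()}
    return result
```

Comparing each bin's percentages against the proteome-wide percentages (the dotted lines in the panel) would show whether a given charge bin is over- or underrepresented in a group.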

References

    1. Hershey JWB, Sonenberg N, Mathews MB. Principles of Translational Control: An Overview. Cold Spring Harb Perspect Biol. 2012;4.
    2. Komar AA. A pause for thought along the co-translational folding pathway. Trends Biochem Sci. 2009;34:16–24. doi: 10.1016/j.tibs.2008.10.002
    3. Yu C-H, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, et al. Codon Usage Influences the Local Rate of Translation Elongation to Regulate Co-translational Protein Folding. Mol Cell. 2015;59:744–54. doi: 10.1016/j.molcel.2015.07.018
    4. Lu J, Deutsch C. Electrostatics in the Ribosomal Tunnel Modulate Chain Elongation Rates. J Mol Biol. 2008;384:73–86. doi: 10.1016/j.jmb.2008.08.089
    5. Thanaraj TA, Argos P. Ribosome-mediated translational pause and protein domain organization. Protein Sci. 1996;5:1594–612. doi: 10.1002/pro.5560050814
