Cell. 2018 Apr 19;173(3):749-761.e38.
doi: 10.1016/j.cell.2018.03.007. Epub 2018 Mar 29.

Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry

Jean-Benoît Lalanne et al. Cell.

Abstract

Coexpression of proteins in response to pathway-inducing signals is the founding paradigm of gene regulation. However, it remains unexplored whether the relative abundance of co-regulated proteins requires precise tuning. Here, we present large-scale analyses of protein stoichiometry and corresponding regulatory strategies for 21 pathways and 67-224 operons in divergent bacteria separated by 0.6-2 billion years. Using end-enriched RNA-sequencing (Rend-seq) with single-nucleotide resolution, we found that many bacterial gene clusters encoding conserved pathways have undergone massive divergence in transcript abundance and architectures via remodeling of internal promoters and terminators. Remarkably, these evolutionary changes are compensated post-transcriptionally to maintain preferred stoichiometry of protein synthesis rates. Even more strikingly, in eukaryotic budding yeast, functionally analogous proteins that arose independently from bacterial counterparts also evolved to convergent in-pathway expression. The broad requirement for exact protein stoichiometries despite regulatory divergence provides an unexpected principle for building biological pathways both in nature and for synthetic activities.

Keywords: Enzyme expression stoichiometry; Rend-seq; end-enriched RNA-seq; operon evolution; ribosome profiling.

Figures

Fig. 1. Conservation of protein expression stoichiometry for ancient pathways in bacteria
(A, B) Synthesis rates for proteins involved in translation (A) and DNA maintenance (B) plotted for B. subtilis and E. coli. Each dot represents a pair of either homologous proteins (circle) or paralogous groups (square, aggregated synthesis rates). Synthesis rates are normalized to the median of ribosomal proteins. Translation factors are highlighted in (A). The black line is a linear fit with a slope of 1 through logarithmically transformed synthesis rates. Dashed lines indicate twofold deviation from the fit. Insets in (A) and (B) show cumulative distribution functions (CDF) of fold-deviation from the regression line. (C) Comparison of synthesis rates for curated metabolic pathways. Each dot is an enzyme, color-coded by the pathway schemes shown in insets. Reactions performed by non-homologous enzymes are in grey. See Table S2 for the list of enzymes and statistical testing for significance. See Fig. S1 for detailed pathway schemes. acpP (acyl carrier protein, black circle) is included in the fatty acid category though not formally an enzyme. Chorismate and purine biosynthesis pathways are each split in two due to intervening fluxes (dashed arrows). Asterisk indicates pathways with non-conserved stoichiometry. The intercept of linear fit indicates differential expression of pathways relative to ribosomal proteins. See also Figure S1 and Table S2.
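The slope-1 fit and the fold-deviation shown in the insets can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit in log2 space, for which the intercept of a fixed-slope line is simply the mean log2 ratio; the fitting procedure actually used is described in the STAR Methods and may differ. The function name `slope_one_fit` and all input values are hypothetical, not data from the paper.

```python
import math

def slope_one_fit(x_rates, y_rates):
    """Fit log2(y) = log2(x) + b with the slope fixed at 1.

    With the slope fixed, the least-squares intercept b is the mean of
    log2(y / x). Returns b and the fold-deviation of each point from the line.
    """
    log_ratios = [math.log2(y / x) for x, y in zip(x_rates, y_rates)]
    b = sum(log_ratios) / len(log_ratios)
    # Fold-deviation is symmetric: a point 2x above or 2x below the line scores 2.
    fold_dev = [2 ** abs(r - b) for r in log_ratios]
    return b, fold_dev

# Hypothetical synthesis rates in arbitrary units (illustration only).
ecoli = [1.0, 2.0, 4.0, 8.0]
bsub = [2.0, 4.0, 8.0, 64.0]

intercept, deviations = slope_one_fit(ecoli, bsub)
# Fraction of proteins lying within the twofold band (the dashed lines).
within_twofold = sum(d <= 2.0 for d in deviations) / len(deviations)
```

In this reading, the intercept captures the overall expression difference of the protein set between the two species, while the fold-deviations measure how well the stoichiometry within the set is preserved.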
Fig. 2. Differential protein expression within ancient gene clusters is quantitatively conserved
(A) Examples of conserved gene clusters between E. coli (top) and B. subtilis (bottom). Homologous pairs (gene names from E. coli) are color-coded for plots on the right. Intervening non-conserved genes are not colored. White asterisk (*) highlights dxr whose order within the cluster is shifted. Panels on the right show synthesis rates for proteins in the conserved clusters, plotted for B. subtilis and E. coli. (B) Global analysis of expression stoichiometry for conserved gene clusters between B. subtilis and E. coli. Each cluster is classified by highly conserved (magenta), partially conserved (blue), and divergent (grey) protein expression stoichiometry. Within the group of highly conserved stoichiometry, clusters are further divided by having equal or unequal synthesis rates (STAR Methods). Genes (dots) that belong to the same cluster are connected by lines. (C) Pair-wise comparison of in-cluster stoichiometry across different bacterial species. The number of genes in each category is listed. Statistical significance for the fraction of clusters with conserved stoichiometry is listed (STAR Methods). Ecol: E. coli; Bsub: B. subtilis; Vnat: V. natriegens; Ccre: C. crescentus. (D) Gene copy number variation for EF-Tu and EF-G. Paralogous copies outside the conserved S12 gene cluster are labeled as fusB, fusC (for EF-G) and tufB (for EF-Tu). (E) Total protein synthesis rates for EF-Tu and EF-G in each species. The contribution of each gene locus is indicated by arrows colored according to (D). The nucleotide sequences for tufA and tufB in C. crescentus are 100% identical, and the respective synthesis rates cannot be distinguished. For (A), (B), and (E), protein synthesis rates are normalized as in Fig. 1. See also Figure S4 and Table S2.
Fig. 3. Rend-seq defines and quantifies mRNA isoforms with single-nucleotide resolution
(A) Schematic of end-enrichment strategy. N molecules of mRNA are randomly cleaved with a small probability per base (p≪1). Fragmented RNA is selected for short sizes, converted to a cDNA library, and deep-sequenced. The 5’-mapped (orange) and 3’-mapped (blue) read counts are then plotted separately, revealing peaks at the ends of transcripts and a largely constant read density across the transcript body (simulated data). Peak shadows are shown here but computationally removed for data visualization (see STAR Methods). (B) Example of Rend-seq data showing increased end-enrichment with decreased fragmentation. The rpmE gene in B. subtilis is shown for Rend-seq libraries generated with different fragmentation times t. Quantitation in the right panel shows that the median end-enrichment across the transcriptome scales as 1/t (dashed line). (C) Example of Rend-seq data showing multiple mRNA isoforms with alternative 5’ ends (B. subtilis). Relative isoform abundance can be estimated both by read density between peaks and by peak height (in parentheses). Zoomed-in views illustrate peak width. See also Figure S2.
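The end-enrichment principle in (A) can be reproduced with a small simulation. The sketch below is an illustration under simplifying assumptions, not the authors' simulation: every bond is cleaved independently with probability `p_cut`, fragments are kept only if their length falls in a fixed window, and each kept fragment contributes one 5’-mapped and one 3’-mapped count. The function name and all parameter values are hypothetical.

```python
import random
from statistics import median

def simulate_rend_seq(n_molecules, length, p_cut, min_len, max_len, seed=0):
    """Count 5'- and 3'-mapped fragment ends after random cleavage and size selection."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    five = [0] * length   # number of kept fragments starting at each position
    three = [0] * length  # number of kept fragments ending at each position
    for _ in range(n_molecules):
        start = 0
        for pos in range(length):
            # A fragment ends at pos if the bond after pos is cleaved,
            # or if pos is the original 3' end of the transcript.
            is_last = pos == length - 1
            if is_last or rng.random() < p_cut:
                size = pos - start + 1
                if min_len <= size <= max_len:  # size selection
                    five[start] += 1
                    three[pos] += 1
                start = pos + 1
    return five, three

five, three = simulate_rend_seq(n_molecules=2000, length=300, p_cut=0.02,
                                min_len=15, max_len=45)
body_5p = median(five[50:250])    # typical count in the transcript body
enrichment_5p = five[0] / body_5p # peak height relative to the body
```

Every molecule carries the original 5’ end, whereas an interior position becomes a fragment start only when the bond before it is cleaved, so the expected enrichment in this toy model is roughly 1/`p_cut`. This is consistent with the 1/t scaling described in (B) if the cleavage probability grows in proportion to fragmentation time.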
Fig. 4. Rend-seq reveals widespread usage of tuned transcription terminators setting differential expression
(A-C) Examples of gene clusters with intervening partial terminators in B. subtilis. 5’- and 3’-mapped read counts, shown in logarithmic scale, are plotted in orange and blue, respectively. Black lines indicate average read counts between peaks. Rend-seq data have peak shadows removed for clarity (see STAR Methods). See Table S3 for a comprehensive list of intergenic tuned terminators. Terminator sequences and the corresponding leakiness (fraction of read density remaining past terminator) are shown above each internal 3’ peak. Asterisk points to a short intergenic region between a promoter and an upstream tuned terminator, whose leakiness is estimated based on the peak height (Fig. S3, STAR Methods). Arrow points to a nested promoter immediately upstream of the frr terminator. Dagger points to the regulatory region upstream of rho (Ingham et al., 1999). (D) Northern blotting against different regions of the rpsB gene cluster in the wild type (WT) or a strain with perturbed rpsB/tsf terminator (ΔsU). Arrows point to different isoforms as indicated in (A). Grey arrow points to an unknown isoform. Relative abundance predicted by Rend-seq is shown under ‘R,’ and by Northern blotting under ‘N.’ 16S rRNA is used as a loading control. See Fig. S3 for Northern blots for other gene clusters. (E) Distribution of the contribution of terminator read-through to downstream mRNA levels. All identified terminators contributing to more than 10% of the transcription of downstream genes in B. subtilis are included. 167 tuned terminators contribute to more than 50% (red shading) of downstream gene expression for growth in LB. (F) Cumulative distribution of read-through fraction for terminators with different U-tract lengths, defined as the number of consecutive U’s within 8 nt upstream of the 3’ end. Data are shown for a B. subtilis strain with pnpA deleted; pnpA encodes the major 3’-to-5’ exoribonuclease (Oussenko et al., 2005). See Fig. S3 for data for wild-type B. subtilis and other species. Significance (two-sample t-test) was computed between the log-transformed read-through fractions of consecutive U-tract lengths. See also Figures S2 and S3 and Table S3.
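The two quantities defined in this caption, leakiness and U-tract length, can be written down as a short sketch. It assumes that "consecutive U's within 8 nt" means the longest uninterrupted run of U inside the 8-nt window ending at the 3’ end, and that leakiness is the ratio of mean read density downstream to upstream of the terminator; the exact definitions are in the STAR Methods. Function names, the example sequence, and the densities are hypothetical.

```python
def u_tract_length(rna, three_prime_end, window=8):
    """Longest run of consecutive U's in the `window` nt ending at the 3' end.

    `three_prime_end` is a 0-based index of the last transcribed nucleotide.
    DNA-style input is accepted: T is treated as U.
    """
    start = max(0, three_prime_end - window + 1)
    seq = rna[start:three_prime_end + 1].upper().replace("T", "U")
    longest = run = 0
    for base in seq:
        run = run + 1 if base == "U" else 0
        longest = max(longest, run)
    return longest

def leakiness(upstream_density, downstream_density):
    """Fraction of read density remaining past the terminator."""
    return downstream_density / upstream_density

# Hypothetical terminator sequence and read densities (illustration only).
example_rna = "GCGGCAUUUUCUUU"
tract = u_tract_length(example_rna, three_prime_end=len(example_rna) - 1)
leak = leakiness(upstream_density=100.0, downstream_density=25.0)
```

Here the last 8 nt are `UUUUCUUU`, so the sketch reports a U-tract of 4, and a terminator that passes a quarter of the upstream read density has a leakiness of 0.25.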
Fig. 5. Bacterial gene clusters have divergent mRNA architecture but conserved protein stoichiometry
(A) Rend-seq data for the conserved gene cluster ffh-rpsP-rimM-trmD-rplS showing divergent transcript architecture between E. coli (top) and B. subtilis (bottom). Data are displayed as in Fig. 4(A-C). Black arrows point to the tuned terminator in B. subtilis, which is absent in E. coli. (B) Conservation of synthesis rates for the corresponding proteins, with coloring based on (A). Black and dashed lines are as described in Fig. 1. (C) Contribution of mRNA level (from Rend-seq) and translation efficiency (from ribosome profiling and Rend-seq) to conserved expression stoichiometry. Rates of protein synthesis relative to ffh are plotted for E. coli (blue) and B. subtilis (red). The contributions of differential mRNA levels and translation efficiency are shown by grey and white arrows, respectively. (D to F) Same as (A to C), but for the gene cluster containing rbfA. Black arrows indicate the positions of the tuned terminator either upstream (E. coli) or downstream (B. subtilis) of rbfA. The asterisk (*) between E. coli’s rpsO and pnp in (D) highlights a known processing site by RNase III (see Mendeley Data). (G) Genome-wide comparison of transcript architecture for the gene clusters with conserved protein expression stoichiometry. Species names abbreviated as in Fig. 2C. See STAR Methods for definition of remodeled clusters. See also Figure S4 and Data S1.
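The decomposition in (C) can be sketched as a simple log-space identity. The sketch assumes that translation efficiency is the synthesis rate divided by the mRNA level, so that the log2 synthesis-rate ratio relative to a reference gene is the sum of an mRNA term and a translation-efficiency term; this is an interpretation of the caption rather than the authors' stated formula. The function name `decompose` and all numbers are hypothetical.

```python
import math

def decompose(mrna, synthesis, reference):
    """Split log2 synthesis-rate ratios (relative to `reference`) into
    an mRNA-level term and a translation-efficiency (TE) term.

    Assumes TE = synthesis rate / mRNA level for every gene.
    Returns {gene: (log2 mRNA ratio, log2 TE ratio, log2 synthesis ratio)}.
    """
    te = {gene: synthesis[gene] / mrna[gene] for gene in mrna}
    result = {}
    for gene in mrna:
        d_mrna = math.log2(mrna[gene] / mrna[reference])
        d_te = math.log2(te[gene] / te[reference])
        result[gene] = (d_mrna, d_te, d_mrna + d_te)
    return result

# Hypothetical mRNA levels and synthesis rates in arbitrary units.
mrna = {"ffh": 1.0, "rpsP": 8.0}
synthesis = {"ffh": 1.0, "rpsP": 16.0}
result = decompose(mrna, synthesis, reference="ffh")
```

Under this assumption, two species can reach the same synthesis-rate ratio through different combinations of the two terms, which is the kind of compensation the grey and white arrows depict.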
Fig. 6. Dispersion of gene clusters is compensated to maintain conserved protein stoichiometry
Divergent operon organization for a subset of pathways shown in Fig. 1. For each pathway, gene positions are highlighted by colored circles on the circular chromosome diagram (oriC: origin of replication). Color coding is the same as Fig. 1. Genes in the same operon are represented as concentric circles. For example, folate biosynthetic genes are all clustered in B. subtilis, but are scattered around the chromosome in E. coli. Ecol: E. coli; Bsub: B. subtilis. See also Figure S5.
Fig. 7. Conservation of pathway-specific protein stoichiometry across the prokaryote/eukaryote divide
(A, B) Synthesis rates (normalized as in Fig. 1) for proteins involved in cytosolic translation (A) and glycolysis (B) are plotted for S. cerevisiae and E. coli. Synthesis rates in S. cerevisiae were estimated based on ribosome profiling data reported by Weinberg et al. (2016) (STAR Methods). Functional analogs (proteins with similar function but with pairwise BLASTP score < 45) are shown as stars. Plotting convention as in Fig. 1. Inset in (A) shows cumulative distribution function (CDF) of fold-deviation from the regression line. Inset in (B) shows the pathway diagram for core glycolysis. (C) Distribution of fold-deviation from pathway-specific regression lines for proteins with different gene dosage. Proteins involved in translation and glycolysis are grouped by the numbers of paralogous genes in E. coli and S. cerevisiae. Medians are indicated by red lines, and whiskers correspond to the 5th and 95th percentiles. The fold-deviation for singleton and duplicated genes in S. cerevisiae is tightly distributed and not statistically different (p = 0.53, two-sample t-test; p = 0.35, two-sample Kolmogorov-Smirnov test). As a comparison, the ratio of synthesis rates for all one-to-one homologs across the two genomes (excluding mitochondrial proteins) is shown with a much wider distribution. See also Figure S6.
