Nat Commun. 2024 Apr 10;15(1):3088.
doi: 10.1038/s41467-024-47410-5.

Genetically encoded transcriptional plasticity underlies stress adaptation in Mycobacterium tuberculosis

Cheng Bei et al. Nat Commun.

Abstract

Transcriptional regulation is a critical adaptive mechanism that allows bacteria to respond to changing environments, yet transcriptional plasticity (TP), the variability of gene expression in response to environmental changes, remains largely unexplored. In this study, we investigate the genome-wide TP profiles of Mycobacterium tuberculosis (Mtb) genes by analyzing 894 RNA sequencing samples derived from 73 different environmental conditions. Our data reveal that Mtb genes exhibit significant TP variation that correlates with gene function and gene essentiality. We also find that critical genetic features, such as gene length, GC content, and operon size, independently impose constraints on TP, beyond trans-regulation. By extending our analysis to two other Mycobacterium species, M. smegmatis and M. abscessus, we demonstrate a striking conservation of the TP landscape. This study provides a comprehensive understanding of the TP exhibited by mycobacterial genes, shedding light on this significant, yet understudied, genetic feature encoded in bacterial genomes.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Genome-wide estimation of Mtb transcriptional plasticity (TP).
a A schematic diagram of TP. b A diagram illustrating the composition of the 894 samples from 73 different conditions. Detailed information about the samples can be found in Supplementary Data 1. c Visualization of the 894 samples using t-distributed stochastic neighbor embedding (tSNE), grouped by experimental condition category. d Primary expression statistics of Mtb genes across the 894 samples. The X-axis represents the ranking of 3891 Mtb genes ordered by their expression ranges (MinMax). The five line plots represent the maximum (Max), 75th percentile (Q75), median, 25th percentile (Q25), and minimum (Min) expression levels, each centered by subtracting the median expression level of the gene. Expression statistics for three representative genes, hspX, rpoB, and lpqM, are highlighted. e Comparison of the adj-SD, IQR, and MinMax metrics for describing the TP of Mtb genes using a subsampling and bootstrap analysis (see the “Methods” section). Subsets of N = 10, 20, 30, 50, 100, 200, 300, 500, or 800 samples were randomly drawn from the full dataset. Statistical significance of the difference between the correlation coefficients of adj-SD and IQR was estimated by two-sided Wilcoxon tests; the corresponding p values were 0.096 (N = 10), 0.068 (N = 20), 0.001 (N = 30), 0.002 (N = 50), 0.010 (N = 100), 3.506 × 10^−5 (N = 200), 2.239 × 10^−6 (N = 300), 3.773 × 10^−6 (N = 500), and 4.109 × 10^−6 (N = 800). ns, nonsignificant; *p value 0.01–0.05; **p value 0.001–0.01; ***p value 0.0001–0.001; ****p value < 0.0001. Error bars represent the mean ± SD of TPs. f Genome-wide TP profiles (adj-SD) of the 3891 Mtb genes. The positively skewed genome-wide TP distribution is illustrated in the right panel.
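Panels d and e rest on a few per-gene summaries of a genes × samples log-expression matrix. The Python sketch below shows one way to compute the median-centered statistics of panel d, the IQR and MinMax metrics, and a subsampling check of how well TP values from N random samples agree with those from all samples. Assumptions: values are already log-transformed; a plain sample standard deviation stands in for adj-SD, whose adjustment is not described in this legend; and agreement is measured as the Spearman correlation between subsample and full-data TPs, which may differ from the procedure in the paper's Methods. All function names are hypothetical.

```python
import random
import statistics


def expression_summary(values):
    """Min/Q25/median/Q75/Max of one gene's log-expression, centered on its median."""
    q25, med, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min": min(values) - med,
        "Q25": q25 - med,
        "Median": 0.0,
        "Q75": q75 - med,
        "Max": max(values) - med,
    }


def tp_metrics(values):
    """Candidate TP metrics for one gene.

    "SD" is a plain sample standard deviation used as a stand-in for adj-SD
    (assumption: the adjustment is not specified in the figure legend).
    """
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "SD": statistics.stdev(values),
        "IQR": q75 - q25,
        "MinMax": max(values) - min(values),
    }


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def subsample_consistency(matrix, metric, n, repeats=100, seed=0):
    """Spearman rho between TPs from n random samples and TPs from all samples.

    matrix: list of per-gene lists of log-expression values (same sample order).
    """
    rng = random.Random(seed)
    n_samples = len(matrix[0])
    full = [tp_metrics(row)[metric] for row in matrix]
    rhos = []
    for _ in range(repeats):
        cols = rng.sample(range(n_samples), n)  # draw without replacement
        sub = [tp_metrics([row[c] for c in cols])[metric] for row in matrix]
        rhos.append(spearman(sub, full))
    return rhos
```

Running `subsample_consistency` for each metric ("SD", "IQR", "MinMax") and each N, then comparing the resulting distributions of rho, follows the general layout of panel e.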
Fig. 2. TP is associated with gene function and gene essentiality.
a Functional enrichment analysis of the 195 high-TP genes. Numbers in the dots represent the number of genes in each category. b Violin plots showing the TP profiles of genes in different functional categories: “Insertion seqs and phages” (73 genes), “Virulence, detoxification and adaptation” (236 genes), “PE/PPE” (160 genes), “Conserved hypotheticals” (1007 genes), “Regulatory proteins” (197 genes), “Lipid metabolism” (268 genes), “Information pathways” (238 genes), “Cell wall and cell processes” (762 genes), and “Intermediary metabolism and respiration” (918 genes). Error bars denote mean ± SD of TPs. The X-axis is presented on a log scale. c The 1049 genes of the mycobacterial core genome exhibit lower TPs than the other 2842 genes of the variable genome. Error bars represent mean ± SD of TPs. Statistical significance was assessed by the Wilcoxon test (two-sided). d TP comparison among 459 essential genes, 2874 non-essential genes, and 301 genes whose disruption confers a growth advantage under axenic culture conditions. Statistical significance was assessed by the Wilcoxon test (two-sided); error bars represent mean ± SD of TPs. e Mtb genes vulnerable to transcriptional perturbation exhibit low TPs. The horizontal black dashed line represents the maximum TP value of essential genes, and the vertical line shows the 5th percentile of the vulnerability index of non-essential genes. The counts of essential and non-essential genes in each quadrant are displayed in green and yellow, respectively. f TP positively correlates with genes’ substitution rates, as simulated by genomegaMap (Wilson, 2020). Mean values and 95% credibility intervals of substitution rates are shown as colored points. Colored lines depict the linear fit between TP and substitution rate. R and p represent Spearman’s correlation coefficient and the associated p value, respectively.
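Panels c and d compare TPs between two gene groups (for example, essential versus non-essential genes) with a two-sided Wilcoxon test. For two independent groups this is the Wilcoxon rank-sum (Mann–Whitney U) test. The sketch below implements it with the large-sample normal approximation and a tie correction; it is illustrative only, since statistical packages may instead report exact p values or apply a continuity correction. The function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for group x, two-sided p value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: every run of t tied values shrinks the variance of U.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

Calling `rank_sum_test(essential_tps, nonessential_tps)` on two lists of TP values (hypothetical variable names) returns the U statistic and p value for one such comparison.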
Fig. 3. Identification of genetic features underlying TP.
a Summary table of the 119 candidate genetic features. N denotes the number of features in each category. b Schematic diagram of the machine-learning workflow. c The top 15 genetic features ranked by their median feature importance in predicting TP. Lower ranks signify higher feature importance for TP prediction, whereas a tight rank distribution indicates higher consistency of predictions across randomized sample splits and modeling iterations. The four genetic features that consistently rank low across random repeats are highlighted in green. Boxes show the median and the 1st and 3rd quartiles of feature importance ranks (N = 100) across experiments, and the whiskers represent the median ± 1.5 × IQR (interquartile range). Vertical lines in boxes represent the medians. d A support vector machine (SVM) model constructed using only the top four features effectively predicts TP. The green line represents the linear fit between SVM-modeled and observed TPs. The black dashed line represents y = x. The error band represents the 95% confidence interval. Pearson’s correlation coefficients and the corresponding p values are shown.
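Panel c summarizes, for each feature, the distribution of its importance rank over 100 modeling repeats. Assuming each repeat yields one importance score per feature (higher means more important), the sketch below converts scores to within-repeat ranks, reports each feature's median rank and rank IQR, and keeps the top k features by median rank. The model that produces the scores (the SVM in panel d) is not reimplemented here, and all function names are hypothetical.

```python
import statistics


def importance_ranks(scores):
    """Rank features within one repeat; rank 1 = highest importance score."""
    order = sorted(scores, key=lambda f: scores[f], reverse=True)
    return {feature: rank for rank, feature in enumerate(order, start=1)}


def summarize_ranks(repeats):
    """Median and IQR of each feature's rank across repeats (needs >= 2 repeats).

    repeats: list of dicts mapping feature name -> importance score.
    """
    ranks_by_feature = {}
    for scores in repeats:
        for feature, rank in importance_ranks(scores).items():
            ranks_by_feature.setdefault(feature, []).append(rank)
    summary = {}
    for feature, ranks in ranks_by_feature.items():
        q1, med, q3 = statistics.quantiles(ranks, n=4, method="inclusive")
        summary[feature] = {"median_rank": med, "rank_iqr": q3 - q1}
    return summary


def top_features(repeats, k):
    """k features with the lowest median rank; a tighter rank IQR breaks ties."""
    summary = summarize_ranks(repeats)
    ordered = sorted(
        summary, key=lambda f: (summary[f]["median_rank"], summary[f]["rank_iqr"])
    )
    return ordered[:k]
```

Breaking ties on the rank IQR reflects the legend's point that a tight rank distribution indicates consistent importance across repeats.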
Fig. 4. Impact of key genetic features on TP.
a Gene length is negatively correlated with TP, as illustrated by the 2D density contour plot of genes by TP and gene length. The red line depicts the linear fit. The error band represents the 95% confidence interval. R and p represent Spearman’s correlation coefficient and the associated p value, respectively. b Deviation of GC% from the genome-wide GC% (65.6%, black dashed line) is positively associated with TP, as depicted by the LOESS trendline and the 2D density contours. The error band represents the 95% confidence interval. c Positive association between the average TP of each bin (genes divided into 50 TP bins) and the standard deviation (SD) of GC% within that bin. The error band represents the 95% confidence interval. R and p represent Spearman’s correlation coefficient and the associated p value, respectively. d The 1567 genes in polygenic operons exhibit significantly higher TPs than the 2235 genes in monogenic operons (two-sided Wilcoxon test). Error bars represent mean ± SD of TPs. e Pearson’s correlation coefficients of TP between genes at different operon positions, i.e., the first, second, third, and fourth genes of an operon. f, g TP increases as genes are regulated by more regulators. Boxplots demonstrate a monotonic relationship between TP and the number of activators. Genes targeted by only one repressor display the lowest TPs. The numbers of genes targeted by 0, 1, 2, 3, and ≥4 activators are 743, 791, 285, 117, and 104, and the numbers of genes targeted by 0, 1, 2, 3, and ≥4 repressors are 637, 916, 288, 116, and 83. Error bars represent the mean ± SD of TPs. Statistical significance was assessed by Wilcoxon tests (two-sided). h A schematic illustrating the relationships between the four genetic features and TP.
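Panels b and c relate TP to each gene's GC% relative to the genome-wide value (65.6% for Mtb). The sketch below computes GC% from a nucleotide sequence, its deviation from the genome-wide GC%, and the per-bin mean TP and SD of GC% plotted in panel c. It assumes that deviation means the absolute difference from the genome-wide GC% and that the 50 TP bins hold roughly equal numbers of genes; the legend specifies neither. Function names are hypothetical.

```python
import statistics

MTB_GENOME_GC = 65.6  # genome-wide GC% for Mtb, as given in panel b


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_deviation(seq, genome_gc=MTB_GENOME_GC):
    """Absolute deviation of a gene's GC% from the genome-wide GC% (assumed definition)."""
    return abs(gc_percent(seq) - genome_gc)


def tp_bins_gc_sd(tps, gcs, n_bins=50):
    """Sort genes by TP, split them into n_bins near-equal-size bins (assumption),
    and return (mean TP, SD of GC%) for every bin holding at least two genes."""
    order = sorted(range(len(tps)), key=lambda i: tps[i])
    results = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if len(idx) < 2:
            continue  # the sample SD needs at least two values
        mean_tp = statistics.fmean([tps[i] for i in idx])
        sd_gc = statistics.stdev([gcs[i] for i in idx])
        results.append((mean_tp, sd_gc))
    return results
```

A Spearman correlation between the two columns returned by `tp_bins_gc_sd` corresponds to the R value reported in panel c.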
Fig. 5. The impact of primary sequence features on TP is partially independent of transcription regulation.
a Mtb regulons display varying degrees of transcriptional plasticity. Boxes show the median and the 1st and 3rd quartiles of TP, and the whiskers represent the median ± 1.5 × IQR. Vertical lines in boxes represent the median. The red dashed line represents the median TP of all 3891 genes. The bubble plot on the right summarizes the statistical significance (adjusted p value) and normalized enrichment score (NES) of each regulon from single-sample gene set enrichment analysis (ssGSEA). A higher NES indicates that the regulon is enriched for genes with higher TPs. Bubble size corresponds to the number of genes in each regulon, and the numbers in the dots give that count. A one-sided adjusted p value was calculated for each regulon. b Expression profiles of DosR regulon genes ranked by TP. The color gradient represents Z-score-normalized log-RPKM. c Variation in TP within the DosR regulon, exemplified by comparing the expression profiles of two high-TP genes (hspX and Rv1738) with two low-TP genes (dosT and pncB2). d Deviation of GC% from the genome average partially explains TP variation among genes of the same regulon. Linear fits, Spearman’s correlation coefficients, and the corresponding p values are shown for three representative regulons: WhiB4, Rv1828/SigH, and DosR. Error bands represent the 95% confidence interval. e TPs of co-regulated genes negatively correlate with their gene lengths. Spearman’s correlation coefficients and the corresponding p values are provided. The associations between primary genetic features and TP for genes in additional regulons are illustrated in Figs. S6 and S7. Error bands represent the 95% confidence interval.
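Panel b colors genes by Z-score-normalized log-RPKM. The sketch below illustrates that normalization under stated assumptions: the standard RPKM definition (reads per kilobase of gene length per million mapped reads), a log2 transform with a pseudo-count of 1, and per-gene Z-scores across samples computed with the sample SD. The exact transform and pseudo-count used in the paper are not stated in this legend, and the function names are hypothetical.

```python
import math
import statistics


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log_rpkm(read_count, gene_length_bp, total_mapped_reads, pseudo=1.0):
    """log2(RPKM + pseudo); the pseudo-count value is an assumption."""
    return math.log2(rpkm(read_count, gene_length_bp, total_mapped_reads) + pseudo)


def zscore_row(values):
    """Center one gene's values on their mean and scale by their sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant expression: nothing to scale
    return [(v - mean) / sd for v in values]
```

Applying `zscore_row` to each gene separately puts high-TP and low-TP genes on a common color scale, so a heatmap shows the shape of each profile rather than its absolute level.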
Fig. 6. TP and its underlying genetic determinants are conserved in other Mycobacterium species.
a, b The TP profiles of M. smegmatis (Msm) and M. abscessus (Mab) genes resemble those of their Mtb homologs. The 2D density contour plots illustrate the distribution of gene orthologs according to their TPs in the corresponding Mycobacterium species. Lines denote the linear fits. Error bands represent the 95% confidence interval. Pearson’s correlation coefficients and the corresponding p values are provided. c The 5875 non-essential Msm genes have higher TPs than the 387 essential Msm genes. Error bars represent mean ± SD of TPs. Statistical significance was measured by a two-sided Wilcoxon test, and the corresponding p value is shown. d Msm genes vulnerable to transcriptional perturbation exhibit low TPs. The gray circle highlights the absence of genes with both high TP and high vulnerability. e Gene length is negatively associated with TP in Msm (orange) and Mab (blue). The 2D density contour plots illustrate the distribution of genes by TP and gene length. Error bands represent the 95% confidence interval. f A linear correlation is observed between TP and gene length for genes shorter than 600 bp. Error bands represent the 95% confidence interval. Spearman’s correlation coefficients and the corresponding p values are provided (p = 2.53 × 10^−17 for Msm and 2.85 × 10^−14 for Mab). g, h Genes with GC% close to the genome-wide GC% (67.4% in Msm and 64.1% in Mab, marked by black dashed lines) display lower TPs in both Msm (g) and Mab (h). The 2D density contour plots depict the distribution of genes by TP and GC%. Error bands represent the 95% confidence interval.
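Panels a, b, and f correlate per-gene values across matched gene sets. The sketch below pairs Mtb TPs with the TPs of their orthologs in another species through an ortholog map and computes Pearson's r (as in panels a and b), then computes Spearman's rho between TP and gene length for genes shorter than 600 bp (as in panel f). The dictionary layout, ortholog map, and function names are hypothetical, and p values are omitted.

```python
import statistics


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ortholog_tp_correlation(tp_ref, tp_other, orthologs):
    """Pearson's r between reference-gene TPs and their orthologs' TPs.

    orthologs maps a reference gene ID to an ortholog ID in the other species;
    pairs lacking a TP in either species are skipped. Needs >= 2 usable pairs.
    """
    pairs = [
        (tp_ref[g], tp_other[o])
        for g, o in orthologs.items()
        if g in tp_ref and o in tp_other
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)


def short_gene_spearman(tps, lengths, max_len=600):
    """Spearman's rho between TP and gene length for genes shorter than max_len bp."""
    genes = [g for g in tps if lengths.get(g, max_len) < max_len]
    x = [tps[g] for g in genes]
    y = [lengths[g] for g in genes]
    return pearson(average_ranks(x), average_ranks(y)), len(genes)
```

Both functions also return the number of genes used, which is worth reporting alongside the coefficient because the ortholog and length filters can drop many genes.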

