Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
[Preprint]. 2025 May 1:2025.05.01.651750.
doi: 10.1101/2025.05.01.651750.

Evolution of Mycobacterium tuberculosis transcription regulation is associated with increased transmission and drug resistance

Affiliations

Evolution of Mycobacterium tuberculosis transcription regulation is associated with increased transmission and drug resistance

Peter H Culviner et al. bioRxiv. .

Update in

Abstract

Mycobacterium tuberculosis (Mtb) has co-evolved with humans for thousands of years causing variation in virulence, transmissibility, and disease phenotypes. To identify bacterial contributors to phenotypic diversity, we developed new RNA-seq and phylogenomic tools to capture hundreds of Mtb isolate transcriptomes, link transcriptional and genetic variation, and find associations between variants and epidemiologic traits. Across 274 Mtb clinical isolates, we uncovered unexpected diversity in expression of virulence genes which we linked to known and previously unrecognized regulators. Surprisingly, we found that many isolates harbor variants associated with decreased expression of EsxA (Esat6) and EsxB (Cfp10), which are virulence effectors, dominant T cell antigens, and immunodiagnostic targets. Across >55,000 isolates, these variants associate with increased transmissibility, especially in drug resistant Mtb strains. Our data suggest expression of key Mtb virulence genes is evolving across isolates in part to optimize fitness under drug pressure, with sobering implications for immunodiagnostics and next-generation vaccines.

PubMed Disclaimer

Conflict of interest statement

Declaration of Interests Sarah Fortune receives compensation as a non-executive director of Oxford Nanopore Technologies.

Figures

Figure 1.
Figure 1.. Multiplex Transcriptomics of Mtb Clinical Isolates.
(A) Schematic workflow of (left) variant calling and ancestral reconstruction pipeline and (right) high throughput transcriptomic pipeline. Phylogenetic trees are shown with 4 major Mtb lineages colored. (B) Enrichment analysis of genes under diversifying selection and Mtb gene categories. Dots sizes reflect the −log10 p-value of 2-sided Fisher exact test with statistical significance shown by color. (C) Violin plots of percentage of total fragments falling in rRNA and coding regions from the libraries presented in this study. Mean and quartiles shown as dotted lines. (D) Principal component analysis of 27 validation libraries using different barcodes and 3 different concentrations of RNA from lineage 1, 2, and 4 samples. The log2 expression of the top 25% of well-expressed genes sorted by standard deviation were used. (E) Heatmap of Pearson correlations between mean log2 expression of all well-expressed genes for all validation libraries. See also Figure S1.
Figure 2.
Figure 2.. The Mtb transcriptome varies at ancient and modern timescales.
(A) Histogram of the ratio of the mean between isolate standard deviation to the mean between strain standard deviation across all strains. We define high diversity genes as those that meet our expression threshold, are in the top 50% in mean between isolate standard deviation, and top 5% in isolate to replicate standard deviation. (B) Phylogenetic tree of Mtb isolates with transcriptional data with a heatmap of strain PC 2 value from 2C plotted against lineage 2 strains, source cohort, and a heatmap of mean strain gene expression for high diversity strains relative to the gene’s mean across all strains. (C) Scatterplots of principal component analysis separating strain (colored by lineage) transcriptional data using 151 high diversity genes as input. Component 1 is plotted against components 2 and 3 to highlight separation of lineage 2 (PC2) and separation of the lineages (PC3 x PC1). (D) Histograms of all strain pair phylogenetic patristic distances and phenotypic patristic distances calculated from Ward clustering high diversity gene expression. Clustering was conducted on the full tree as well as separately on each of the three major lineages. The R of the underlying scatters are shown. See also Figure S2.
Figure 3.
Figure 3.. Mtb-host interactive processes have diverse expression.
(A) Enrichment analysis of genes in the high expression diversity set vs functionally and experimentally defined gene categories from prior literature. Plotted as in 1B. (B) Volcano plot showing significant associations between transcripts enriched for expression outliers and the variable loci predictive of their expression. Four variant loci are color-coded, and their associated transcripts are labelled. Datapoints were sized according to the maximum pN/pS of the variant gene across L1-4 in Table S1. See also Figure S3.
Figure 4.
Figure 4.. SlfR is a regulator of sulfolipid-1 biosynthesis gene expression.
(A) Variants in slfR (Rv0042c) are plotted against a heatmap of expression z-scores of associated genes from GWAS. Red lines on the tree denote the inheritance of variants. Predicted deleterious, frameshift, and pre-stop variants are indicated. Strain gene expression is shown by heatmap relative to the gene’s mean across all strains; blocks of genes are co-operonic. (B) Schematics summarizing purposed SlfR regulation of pks2-papA1-mmpL8 and showing role of the operon in biosynthesis of sulfolipid-1 (SL-1). (C) Bar plot showing mean enrichment from two replicates of Mtb gDNA incubated with or without purified SlfR. Genes nearby each peak are labelled. (D) Volcano plot showing z-test results for the mean of 2 genome-wide proteomics replicates for 3 pairs of closely related strains differing by slfR variants. Only genes passing an FDR of 0.05 are plotted and 4 genes with corresponding IDAP-seq peaks are labelled. (E) Bar plot summarizing change in protein and transcript abundance between three pairs of closely related isolates differing by slfR variants. Proteomics and transcriptomic datapoints shown are the average of replicates. See also Figure S4.
Figure 5.
Figure 5.. Frequent whiB6 variants associate with decreased key ESX-1 virulence effectors.
(A) whiB6 variants plotted against a heatmap of expression z-scores of the PE35-PPE68-esxBA operon. Red lines on the tree denote the inheritance of variants. Plotted as in 4A. (B) Swarm plot showing replicate mean expression of PE35-PPE68-esxBA operon relative to the mean across all strains colored by lineage. Strains are categorized by whiB6 genotype: emerging missense (predicted deleterious coding variants in lineages 2 and 4), ancient missense (1 of the 2 widely inherited predicted deleterious coding variants spanning lineage 1), emerging upstream (variants within 2 bases of the upstream peak in S5A), or ancestral (none of the prior variants). (C) Schematic summarizing purposed effect of whiB6 variants on expression of ESX-1 effectors. (D) Western blots for EsxB (secreted effector), Mpt64 (secreted control), and GroEL (cytosolic control) across 3 closely related clinical isolate pairs differing by whiB6 coding variants. H37Rv and H37Rv ΔespA are shown as controls to show loss of EsxB secretion without the EspA chaperone. Protein extracted from cell pellet and supernatant of Mtb grown in vitro are shown. (E) Quantification of EsxB and Mpt64 bands from 5D and S5B. Student’s t-test p-values=*0.016 and **<1e-4. (F) At the top, pie charts showing the strain frequency of predicted functional whiB6 variant strains across lineages 2 and 4. At the bottom, cumulative distributions showing the cumulative fraction of predicted functional whiB6 variant strains driven by each independent mutational event. (G) Pie chart and cumulative distribution of predicted functional whiB6 variant strains plotted as in 5F but for lineage 1. See also Figure S5.
Figure 6.
Figure 6.. ESX-1 regulatory variation is widespread at the espACD locus in Mtb lineage 1.
(A) Heatmap of espACD expression z-score in lineage 1 relative to the mean across all strains as in 4A. Variant row shows the presence or absence of deletions shown in 6C. (B) Schematic summarizing the purposed effect of espACD upstream variants on espACD expression. (C) Bar plot counting short (≤5 nt) mutational events across global isolates upstream of the espACD operon. Two upstream variant peaks are highlighted and as well as variation at a homopolymeric site. (D) Histogram showing the distribution of SNP events in all 11 base windows in intergenic regions with high quality variant calls. Position of espACD upstream peaks on the distribution are shown. (E) Heatmap of espACD expression for a clade of lineage 2 strains highlighting a single strain with a unique variant near Site B. (F) Pie chart and cumulative distribution showing inheritance of espACD upstream variants in lineage 1 plotted as in 5F. Slices and points are colored by variant shown in 6C. See also Figure S6.
Figure 7.
Figure 7.. Modern whiB6 variants associate with drug resistance and transmission.
(A) Statistical association of predicted deleterious whiB6 and slfR variants and espACD upstream variants with modernity, measured by distance from root and compared to a null distribution of randomly selected branches. Color denotes the directionality and significance (estimated p-value < 0.05) of associations. Size indicates significance of association. (B) Statistical association of variants from 7A with selected high level drug resistance across global isolates separated by lineage. Dot size shows estimated p-value and color denotes significance with Benjamini-Hochberg FDR α=0.05. (C) Comparison of mean terminal branch length, a metric of transmission, of clades with variants from 7A to a null distribution calculated by random permutations. Associations were run on the complete tree or with trees with strains with genotypic drug resistance or drug sensitivity. Color denotes the directionality and significance (estimated p-value < 0.05) of associations. Size indicates significance of association.

References

    1. Boritsch E.C., Khanna V., Pawlik A., Honoré N., Navas V.H., Ma L., Bouchier C., Seemann T., Supply P., Stinear T.P., et al. (2016). Key experimental evidence of chromosomal DNA transfer among selected tuberculosis-causing mycobacteria. Proc. Natl. Acad. Sci. 113, 9876–9881. 10.1073/pnas.1604921113. - DOI - PMC - PubMed
    1. Didelot X., Bowden R., Wilson D.J., Peto T.E.A., and Crook D.W. (2012). Transforming clinical microbiology with bacterial genome sequencing. Nat. Rev. Genet. 13, 601–612. 10.1038/nrg3226. - DOI - PMC - PubMed
    1. Bastos H.N., Osório N.S., Gagneux S., Comas I., and Saraiva M. (2018). The Troika Host–Pathogen–Extrinsic Factors in Tuberculosis: Modulating Inflammation and Clinical Outcomes. Front Immunol 8, 1948. 10.3389/fimmu.2017.01948. - DOI - PMC - PubMed
    1. Goig G.A., Windels E.M., Loiseau C., Stritt C., Biru L., Borrell S., Brites D., and Gagneux S. (2025). Ecology, global diversity and evolutionary mechanisms in the Mycobacterium tuberculosis complex. Nat. Rev. Microbiol., 1–13. 10.1038/s41579-025-01159-w. - DOI - PubMed
    1. Liu Q., Zhu J., Dulberger C.L., Stanley S., Wilson S., Chung E.S., Wang X., Culviner P., Liu Y.J., Hicks N.D., et al. (2022). Tuberculosis treatment failure associated with evolution of antibiotic resilience. Science 378, 1111–1118. 10.1126/science.abq2787. - DOI - PMC - PubMed

Publication types

LinkOut - more resources