Microbiol Spectr. 2024 Feb 6;12(2):e0228923.
doi: 10.1128/spectrum.02289-23. Epub 2024 Jan 17.

Diversification of gene content in the Mycobacterium tuberculosis complex is determined by phylogenetic and ecological signatures

Taiana Tainá Silva-Pereira et al. Microbiol Spectr.

Abstract

We analyzed the pan-genome and gene content modulation of the most diverse genome data set of the Mycobacterium tuberculosis complex (MTBC) gathered to date. The closed pan-genome of the MTBC was characterized by reduced accessory and strain-specific genomes, compatible with its clonal nature. However, significantly fewer gene families were shared between MTBC genomes as their phylogenetic distance increased. This effect was only observed in inter-species comparisons, not within-species, which suggests that species-specific ecological characteristics are associated with changes in gene content. Gene loss, resulting from genomic deletions and pseudogenization, was found to drive the variation in gene content. This gene erosion differed among MTBC species and lineages, even within M. tuberculosis, where L2 showed more gene loss than L4. We also show that phylogenetic proximity is not always a good proxy for gene content relatedness in the MTBC, as the gene repertoire of Mycobacterium africanum L6 deviated from its expected phylogenetic niche conservatism. Gene disruptions of virulence factors, represented by pseudogene annotations, are mostly not conserved, being poor predictors of MTBC ecotypes. Each MTBC ecotype carries its own accessory genome, likely influenced by distinct selective pressures such as host and geography. It is important to investigate how gene loss confers new adaptive traits to MTBC strains; the detected heterogeneous gene loss poses a significant challenge in elucidating genetic factors responsible for the diverse phenotypes observed in the MTBC. By detailing specific gene losses, our study serves as a resource for researchers studying MTBC phenotypes and their immune evasion strategies.

IMPORTANCE

In this study, we analyzed the gene content of different ecotypes of the Mycobacterium tuberculosis complex (MTBC), the pathogens that cause tuberculosis. We found that changes in their gene content are associated with their ecological features, such as host preference. Gene loss was identified as the primary driver of these changes, which can vary even among different strains of the same ecotype. Our study also revealed that the gene content relatedness of these bacteria does not always mirror their evolutionary relationships. In addition, some virulence genes can be variably lost among strains of the same MTBC ecotype, likely helping them to evade the immune system. Overall, our study highlights the importance of understanding how gene loss can lead to new adaptations in these bacteria and how different selective pressures may influence their genetic makeup.

Keywords: Mycobacterium africanum; Mycobacterium bovis; Mycobacterium tuberculosis; evolution; genomics; pan-genome.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Maximum likelihood phylogenetic tree based on core SNPs of the 233 genomes of the MTBC used in this study. Colored branches correspond to bacterial species and lineages. Mycobacterium canettii CIPT 140010059 was used as an outgroup. Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study. A core SNP matrix was generated using kSNP3 (60) and the phylogenetic tree was inferred using IQ-Tree with 1,000 UFBoot pseudoreplicates. Graphical editing was performed using FigTree (63) and Adobe Illustrator software. Bootstrap support values of the main nodes are all ≥90%. Bar shows substitutions per nucleotide.
Fig 2
Pan-genome distributions, SNP distance, and %GRR between genomes of 233 strains of the MTBC. (A) Percentage distribution of the groups of orthologous proteins in core, accessory, and strain-specific genomes. (B) Percentage distribution of the total number of proteins in core, accessory, and strain-specific genomes. Pseudogenes are not included in this analysis. (C) Violin plot of the SNP distance between genomes of each bacterial group. (D) Violin plot of the %GRR between genomes of each bacterial group. Groups are Mycobacterium tuberculosis (G1, n = 114), Mycobacterium africanum (G2, n = 33), Mycobacterium bovis (G3, n = 66), and animal strains (G4, n = 20). Animal strains include Mycobacterium caprae, Mycobacterium mungi, Mycobacterium pinnipedii, Mycobacterium orygis, Mycobacterium microti, and “dassie bacillus.” Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study. No statistical difference was observed among groups (P-value >0.05) using the Mann-Whitney test. Graphs were generated using ggplot2 in R software version 4.1.3 (65).
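The core, accessory, and strain-specific partition in panels A and B can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the common convention that a core orthogroup is present in every genome, a strain-specific orthogroup in exactly one, and an accessory orthogroup in any number in between; the page does not state the thresholds the study used. The function names and toy orthogroup IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def classify_orthogroups(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each orthogroup as core, accessory, or strain-specific.

    `presence` maps an orthogroup ID to the set of genomes that carry it.
    Assumed thresholds: core = all genomes, strain-specific = exactly one.
    """
    labels = {}
    for orthogroup, genomes in presence.items():
        if not genomes:
            raise ValueError(f"{orthogroup} is not present in any genome")
        if len(genomes) == n_genomes:
            labels[orthogroup] = "core"
        elif len(genomes) == 1:
            labels[orthogroup] = "strain-specific"
        else:
            labels[orthogroup] = "accessory"
    return labels


def percentage_distribution(labels: Dict[str, str]) -> Dict[str, float]:
    """Percentage of orthogroups that fall in each pan-genome compartment."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy data: three genomes (a, b, c) and four orthogroups.
toy_presence = {
    "OG1": {"a", "b", "c"},
    "OG2": {"a", "b"},
    "OG3": {"c"},
    "OG4": {"a", "b", "c"},
}
toy_labels = classify_orthogroups(toy_presence, n_genomes=3)
toy_distribution = percentage_distribution(toy_labels)
```

Panel B would apply the same percentages to protein counts per compartment rather than to orthogroup counts.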
Fig 3
%GRR as a function of the patristic distance between genomes of the MTBC. Top graph shows the correlation between %GRR and patristic distance across pairs of genomes of the same species colored according to each species. MTBC species with only one representative were excluded. Bottom graph shows the correlation between %GRR and patristic distance across pairs of genomes from different species colored according to inter-species comparisons. Graphs were generated using ggplot2 in R software version 4.1.3 (65). Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study.
Fig 4
Average %GRR between genomes of the MTBC. Triangles (intra-species) and circles (inter-species) indicate average %GRR in comparison to each species on the x-axis. Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study.
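As a rough illustration of the intra-species and inter-species averages plotted here, the sketch below computes a pairwise %GRR and averages it per species pair. The formula is an assumption: this page does not define %GRR, so the sketch uses one common definition of gene repertoire relatedness (shared gene families divided by the family count of the smaller genome, times 100). The function names, genome names, and toy family sets are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def grr_percent(families_a: Set[str], families_b: Set[str]) -> float:
    """Assumed %GRR: shared families over the smaller repertoire, times 100."""
    if not families_a or not families_b:
        raise ValueError("both genomes need at least one gene family")
    shared = len(families_a & families_b)
    return 100.0 * shared / min(len(families_a), len(families_b))


def mean_grr_by_species_pair(
    families: Dict[str, Set[str]], species: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """Average %GRR over all genome pairs, grouped by (species, species).

    A key with two identical names is an intra-species average;
    a key with two different names is an inter-species average.
    """
    values = defaultdict(list)
    for g1, g2 in combinations(sorted(families), 2):
        key = tuple(sorted((species[g1], species[g2])))
        values[key].append(grr_percent(families[g1], families[g2]))
    return {key: sum(v) / len(v) for key, v in values.items()}


# Toy data: two M. tuberculosis genomes and one M. bovis genome.
toy_families = {
    "tb1": {"f1", "f2", "f3", "f4"},
    "tb2": {"f1", "f2", "f3", "f4"},
    "bovis1": {"f1", "f2", "f5"},
}
toy_species = {"tb1": "M. tuberculosis", "tb2": "M. tuberculosis", "bovis1": "M. bovis"}
toy_means = mean_grr_by_species_pair(toy_families, toy_species)
```

Dividing by the smaller repertoire keeps the value at 100% when one genome's families are a strict subset of the other's, so under this definition a pure gene-loss event in the larger genome lowers %GRR only once it removes shared families.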
Fig 5
PCA of the groups of orthologous proteins of the MTBC. (A) Biplot of the first two principal components, which explain 74.2% of the variance. Lineages and species are colored. (B) Proportion (%) of explained variance by each principal component. (C) Variables graph of the PCA, zero-centered and scaled to unit variance. PCA was generated using prcomp in R software version 4.1.3. Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study.
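The caption's "zero-centered and scaled to unit variance" PCA can be approximated in Python with NumPy, as sketched below. This stands in for, and is not, the authors' prcomp call in R. Dropping constant columns (for example, orthogroups present in every genome) before scaling is an assumption of this sketch, since a zero-variance column cannot be scaled. The function name and toy matrix are hypothetical.

```python
import numpy as np


def pca_scaled(matrix):
    """PCA on a genomes x orthogroups matrix via SVD of the standardized data.

    Returns (scores, loadings, explained_variance_ratio).
    Columns with zero variance are dropped first (sketch assumption).
    """
    x = np.asarray(matrix, dtype=float)
    std = x.std(axis=0, ddof=1)            # sample standard deviation per column
    keep = std > 0                         # constant columns cannot be scaled
    z = (x[:, keep] - x[:, keep].mean(axis=0)) / std[keep]
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                         # genome coordinates on each component
    explained = s**2 / np.sum(s**2)        # proportion of variance per component
    return scores, vt.T, explained


# Toy presence/absence matrix: 4 genomes (rows) x 4 orthogroups (columns).
# Column 0 is constant, columns 1 and 2 mirror each other, column 3 is independent.
toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]
toy_scores, toy_loadings, toy_explained = pca_scaled(toy_matrix)
```

The first two columns of `toy_scores` correspond to the coordinates shown in a biplot such as panel A, and `toy_explained` corresponds to the per-component proportions in panel B.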
Fig 6
Heatmap of the presence and absence of groups of orthologous proteins of the MTBC. Heatmap was generated with a customized script in Python using the output of OrthoFinder (64) generated from 233 MTBC genomes. Red, protein cluster is present. Yellow, protein cluster is absent. Core genome is shown only partially to simplify the figure. Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study.
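The authors' customized script is not shown on this page. As a hedged sketch of the first step such a script might take, the code below turns a per-genome gene count table into a binary presence/absence matrix. It assumes a tab-separated layout with an `Orthogroup` column, one column per genome, and a trailing `Total` column, which is how OrthoFinder's gene count table is typically laid out; check the actual file before relying on this. The function name and toy table are hypothetical, and plotting is left out.

```python
import csv
import io
from typing import Dict, List, Tuple


def presence_absence(gene_count_tsv: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Convert per-genome gene counts into 1 (present) / 0 (absent) rows.

    Assumed layout: 'Orthogroup' column, one column per genome, 'Total' column.
    """
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    genomes = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    matrix = {}
    for row in reader:
        # Any count above zero means the orthogroup is present in that genome.
        matrix[row["Orthogroup"]] = [1 if int(row[g]) > 0 else 0 for g in genomes]
    return genomes, matrix


# Toy table with two genomes and two orthogroups.
toy_table = (
    "Orthogroup\tgenomeA\tgenomeB\tTotal\n"
    "OG0000000\t2\t1\t3\n"
    "OG0000001\t0\t1\t1\n"
)
toy_genomes, toy_pa = presence_absence(toy_table)
```

Each row of `toy_pa` maps to one heatmap row, with 1 and 0 corresponding to the caption's red (present) and yellow (absent) cells.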
Fig 7
Heatmap of presence and absence of VFs in 233 strains of the MTBC. Colored bars represent MTBC lineages and species. Dark beige indicates the VF protein is present in the genome, while light beige indicates the VF protein is absent and its gene has been predicted as pseudogene by the PGAP of NCBI. Brown indicates the protein is absent because of a deletion (previously described RDs). Genome representatives of M. tuberculosis L3, L7, and L8 and M. africanum L9 were not detected in the data set, hence not included in the study.
