Front Microbiol. 2023 Jan 4;13:1076797.
doi: 10.3389/fmicb.2022.1076797. eCollection 2022.

Pan-genome association study of Mycobacterium tuberculosis lineage-4 revealed specific genes related to the high and low prevalence of the disease in patients from the North-Eastern area of Medellín, Colombia

Uriel Hurtado-Páez et al. Front Microbiol.

Abstract

Mycobacterium tuberculosis (Mtb) lineage 4 is responsible for the highest burden of tuberculosis (TB) worldwide. It has been the most prevalent lineage in Colombia, especially in the North-Eastern (NE) area of Medellín, where the LAM9 SIT42 and Haarlem1 SIT62 sublineages are highly prevalent. There is evidence that, regardless of environmental factors and host genetics, differences among sublineages of Mtb strains play an important role in the course of infection and disease. Nevertheless, the genetic basis of the success of a sublineage in a specific geographic area remains uncertain. We used a pan-genome-wide association study (pan-GWAS) of 47 Mtb strains isolated from NE Medellín between 2005 and 2008 to identify the genes responsible for the phenotypic differences between high- and low-prevalence sublineages. We identified 12 variants in 11 genes, of which 4 genes showed the strongest association with low prevalence (mmpL12, PPE29, Rv1419, and Rv1762c). The first three have been described as necessary for invasion and intracellular survival. Polymorphisms identified in low-prevalence isolates may be related to a fitness cost of Mtb, which might reflect a decrease in their capacity to be transmitted or to cause an active infection. These results contribute to understanding the success of some sublineages of lineage-4 in a specific geographical area.

Keywords: M. tuberculosis; pan-GWAS; pan-genome; prevalence; sublineage; transmission; variant.
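
To illustrate the kind of test a pan-GWAS performs, here is a minimal sketch that scores one gene's presence/absence against the high/low prevalence label with a two-sided Fisher's exact test. The abstract does not name the statistical test or software used, so the choice of Fisher's test, the function names, and the toy isolates are all assumptions made for illustration. A real analysis would also correct for multiple testing and population structure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def gene_table(presence, phenotype):
    """Count isolates: rows = gene present/absent, columns = low/high prevalence."""
    a = b = c = d = 0
    for isolate, has_gene in presence.items():
        low = phenotype[isolate] == "low"
        if has_gene and low:
            a += 1
        elif has_gene:
            b += 1
        elif low:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical toy data (not from the study): six isolates, one gene.
phenotype = {"iso1": "low", "iso2": "low", "iso3": "low",
             "iso4": "high", "iso5": "high", "iso6": "high"}
presence = {"iso1": True, "iso2": True, "iso3": True,
            "iso4": False, "iso5": False, "iso6": False}

table = gene_table(presence, phenotype)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 3))
```

Even a perfect split across six isolates gives only a modest p-value here, which shows why small cohorts limit the power of this kind of association test.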

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Mycobacterium tuberculosis (Mtb) pan-genome area of lineage-4. Global composition of gene clusters divided into four compartments.
FIGURE 2
Occupancy of genomes in gene clusters. Every compartment or class is depicted in a different color. The X-axis shows the occupancy, that is, the number of genomes in which a given gene cluster is present.
FIGURE 3
Pan-genome flower. It shows all Mtb lineage-4 isolates that make up the pan-genome. The center shows the number of core genes, the second, lighter circle shows the accessory genes, and the petals show the number of isolate-specific genes in each of the 47 genomes. The numbers below each isolate denote the total number of related CDSs.
FIGURE 4
Composition analysis of the core and the pan-genome of Mtb L4 from 47 genomes. (A) Each point on the Y-axis indicates the number of gene clusters after adding a new genome in randomized simulations. The red line shows the exponential decay as a function of the average number of clusters each time a genome was added to the analysis. (B) Pan-genome growth simulation, counting the new genes added by the last genome sampled. Sequences matching a previously seen gene with coverage ≥20% are considered homologous and are therefore not counted as new. An open pan-genome model is observed.
FIGURE 5
General KEGG pathways. It shows the differences in the percentage of functional annotations at the highest hierarchical level among the highly conserved genes of the core genome (red), the moderately conserved genes of the dispensable genome (green), and the genes exclusive to each isolate, or unique genes (gray).
FIGURE 6
Primary KEGG pathways. It shows the differences in the percentage of functional annotations among the highly conserved genes of the core genome (red), which have the highest number of assignments in each category; the moderately conserved genes of the dispensable genome (green), which are mainly related to adaptation to the host environment; and the genes exclusive to each isolate, or unique genes (gray).
FIGURE 7
Phylogeny of the Mtb L4 pan-genome by maximum likelihood. IQ-TREE was used to estimate the phylogenetic tree from the consensus of the 4,846 gene clusters produced by both the COG and OMCL algorithms. Node support after 1,000 bootstrap replicates is shown on the branches. The outgroup is M. canettii. Three major clades were observed. Lilac: all the branches of the isolates, except UT277, coincide with the Haarlem1-SIT62 sublineage, considered to be of high prevalence. Purple: the branches correspond mainly to Haarlem1-SIT45 and Haarlem3-SIT50 isolates, and less frequently to Haarlem1 and Haarlem3 with variable SIT. Green: a mixed clade containing branches of isolates considered to be of high prevalence, mostly LAM9 SIT42, and four branches of low prevalence belonging to the LAM sublineages. The red branches correspond to isolates with high prevalence and the pink branches correspond to all isolates with low prevalence in the North-Eastern zone of Medellín.
FIGURE 8
Comparison of the topologies of Mtb L4 trees constructed by maximum likelihood using different molecular markers. IQ-TREE was used to estimate the phylogeny, with node support from 1,000 bootstrap replicates. (A) Consensus of gene clusters of the pan-genome (4,846 genes). (B) Concatenation of 34,999 SNPs of the core genome. In both trees, three main clades were observed, with very similar topologies and distribution of the isolates across the branches, despite the use of different markers.
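
The occupancy in Figure 2 and the core and pan-genome curves in Figure 4 can be illustrated with a small sketch. It assumes a hypothetical presence/absence structure mapping each gene cluster to the set of genomes that contain it. The function names, the toy clusters, and the single random ordering are illustrative assumptions, not the authors' pipeline, which would repeat the sampling many times and fit curves to the averages.

```python
import random
from collections import Counter


def occupancy_histogram(clusters):
    """Map occupancy (number of genomes containing a cluster) to cluster count."""
    return Counter(len(genomes) for genomes in clusters.values())


def core_pan_curves(clusters, genomes, seed=0):
    """Add genomes in one random order; track core and pan-genome sizes."""
    rng = random.Random(seed)
    order = list(genomes)
    rng.shuffle(order)
    # Invert the mapping: which clusters does each genome contain?
    by_genome = {g: {c for c, members in clusters.items() if g in members}
                 for g in order}
    core, pan = None, set()
    core_sizes, pan_sizes = [], []
    for g in order:
        # Core = clusters shared by every genome seen so far.
        core = set(by_genome[g]) if core is None else core & by_genome[g]
        # Pan = clusters seen in at least one genome so far.
        pan |= by_genome[g]
        core_sizes.append(len(core))
        pan_sizes.append(len(pan))
    return core_sizes, pan_sizes


# Hypothetical toy data (not from the study): four clusters, three genomes.
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
hist = occupancy_histogram(clusters)
core_sizes, pan_sizes = core_pan_curves(clusters, ["A", "B", "C"])
print(dict(hist), core_sizes, pan_sizes)
```

In this sketch the core size can only shrink or stay flat as genomes are added, while the pan-genome size can only grow. A pan-genome curve that keeps rising with each new genome is what the caption calls an open pan-genome.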
