PLoS One. 2015 Apr 8;10(4):e0122979.
doi: 10.1371/journal.pone.0122979. eCollection 2015.

Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome

Vinita Periwal et al. PLoS One.

Abstract

The tubercle complex consists of closely related Mycobacterium species that appear to be variants of a single species. Comparative genome analysis of different strains could provide insight into the genetic diversity of the species. We integrated genome assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC), including 8 Indian clinical isolates sequenced and assembled in this study, to understand its pangenome architecture. We predicted genes for all 96 strains and clustered their CDSs into homologous gene clusters (HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC. The hard-core (HGCs shared among 100% of the strains) comprised 2,066 gene clusters, whereas the soft-core (HGCs shared among at least 95% of the strains) comprised 3,374 gene clusters. The change in the core and accessory genome components, when observed as a function of their size, revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent from the reference strains H37Rv and H37Ra but present in most of the clinical isolates. We report PCR validation of 9 candidate genes: 7 genes were completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology with them, attributable to probable insertion and deletion events. The pangenome approach is a promising tool for studying strain-specific genetic differences within a species. Because selecting target genes for typing requires that the target gene be present in all isolates being typed, estimating the core component of the species is of prime importance.
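The hard-core, soft-core and accessory definitions above can be illustrated with a minimal sketch. It assumes a presence map from each HGC to the set of strains containing it; the function name `classify_clusters`, the toy cluster ids, and the choice to treat everything below the 95% cutoff as accessory are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, Set


def classify_clusters(
    presence: Dict[str, Set[str]],
    n_strains: int,
    soft_fraction: float = 0.95,
) -> Dict[str, Set[str]]:
    """Split clusters into hard-core, soft-core and accessory sets.

    presence maps a cluster id to the set of strains that contain it.
    The soft-core includes the hard-core ("at least 95%" of strains).
    """
    hard_core: Set[str] = set()
    soft_core: Set[str] = set()
    accessory: Set[str] = set()
    for cluster_id, strains in presence.items():
        count = len(strains)
        if count == n_strains:
            hard_core.add(cluster_id)      # present in 100% of strains
        if count >= soft_fraction * n_strains:
            soft_core.add(cluster_id)      # present in at least 95% of strains
        else:
            accessory.add(cluster_id)      # assumption: everything else
    return {"hard_core": hard_core, "soft_core": soft_core, "accessory": accessory}


# Toy data (hypothetical): 96 strains, four clusters with different prevalence.
strains = [f"strain_{i}" for i in range(96)]
toy_presence = {
    "hgc_all": set(strains),
    "hgc_92": set(strains[:92]),
    "hgc_91": set(strains[:91]),
    "hgc_rare": set(strains[:3]),
}
result = classify_clusters(toy_presence, n_strains=len(strains))
```

With 96 strains the 95% cutoff is 91.2, so a cluster must occur in at least 92 strains to count as soft-core in this sketch.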

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Core-gene Tree.
The unrooted MTBC tree was built from an alignment of 971 orthologous core genes from 96 strains. A tree with bootstrap values is presented in S3 Fig (details in text).
Fig 2. Core and accessory genome size evolution.
(A) Each point indicates the number of HGCs conserved in a genome. The red line indicates an exponential decay function based on the median number of core HGCs each time a new genome is added to the analysis. (B) Accessory genome of MTBC. MTBC follows an open pangenome model.
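The median core size at each genome-addition step, as plotted in panel (A), could be computed along the following lines. This is only a sketch: sampling random genome orderings, the function name `core_size_curve`, and the toy data are assumptions, and the exponential decay fit itself is omitted.

```python
import random
from statistics import median
from typing import Dict, List, Set


def core_size_curve(
    genomes: Dict[str, Set[str]],
    n_orderings: int = 100,
    seed: int = 0,
) -> List[float]:
    """Median number of shared clusters after adding 1, 2, ..., N genomes.

    genomes maps a strain name to the set of cluster ids it contains.
    Random addition orders are sampled (an assumption of this sketch).
    """
    rng = random.Random(seed)
    names = sorted(genomes)
    sizes_per_step: List[List[int]] = [[] for _ in names]
    for _ in range(n_orderings):
        order = names[:]
        rng.shuffle(order)
        core = set(genomes[order[0]])
        for step, name in enumerate(order):
            core &= genomes[name]          # clusters shared by all genomes so far
            sizes_per_step[step].append(len(core))
    return [median(sizes) for sizes in sizes_per_step]


# Toy data (hypothetical cluster ids).
toy_genomes = {
    "strain_a": {"c1", "c2", "c3"},
    "strain_b": {"c1", "c2"},
    "strain_c": {"c1", "c4"},
}
curve = core_size_curve(toy_genomes)
```

The resulting curve never increases, because adding a genome can only remove clusters from the shared set.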
Fig 3. Summary of sequence annotation statistics from BLAST2GO.
Representative sequences from all 8,099 HGCs were subjected to annotation. Of these, 47.77% (3,869) were annotated with GO slim terms, 26.63% (2,157) had no BLAST hits, 23.43% (1,898) had BLAST hits but no annotation, and 1.65% (134) retrieved mapping results but no GO slim terms. A small fraction, 0.5% (41), failed to fetch BLAST results.
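As a quick arithmetic check, the five categories in this caption should sum to the 8,099 representative sequences, and the percentages should follow from the counts. The dictionary keys below are shorthand labels chosen for this sketch, not BLAST2GO terminology.

```python
# Counts taken from the Fig 3 caption; the labels are shorthand for this sketch.
counts = {
    "go_slim_annotated": 3869,
    "no_blast_hits": 2157,
    "blast_hits_only": 1898,
    "mapped_without_go_slim": 134,
    "blast_failed": 41,
}

total = sum(counts.values())

# Percentage of all representative sequences in each category, to 2 decimals.
percentages = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} ({pct}%)")
```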
Fig 4. Molecular function GO annotations of the soft-core component.
GO annotations for Biological process and Cellular component are provided in S5 and S6 Figs.
Fig 5. The accessory genome of MTBC.
The flower plots depict the distribution of accessory genome HGCs across different species of MTBC. (A) Flower plot showing number of accessory HGCs present in Mtb (in center) and number of species-specific genes in the leaves. (B) Number of species-specific genes of M. bovis in leaves and total accessory HGCs in center. (C) M. canettii has accessory HGCs in center and species-specific genes in leaves. (D and E) The genomes of M. africanum and M. orygis have four and five species-specific genes respectively (outer circle) and total accessory HGCs in the center.
Fig 6. Overlap of accessory orthologous clusters shared within each strain pair.
The diagonal represents the number of clusters present in a given strain and divides the data into two identical halves. The color key indicates the distribution of clusters. Sharing is counted irrespective of whether the cluster is present in any other strain. The minimum number of shared clusters (109) is between Mcan CIPT 140070008 and Mbovis BCG Phipps, and the maximum (521) is between Mbovis BCG Sweden and Mbovis BCG Prague9.
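A pairwise overlap matrix of this kind can be sketched as set intersections over each strain's accessory clusters. The function name `pairwise_shared`, the strain names, and the toy clusters below are hypothetical.

```python
from typing import Dict, Set, Tuple


def pairwise_shared(accessory: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of accessory clusters shared by every pair of strains.

    accessory maps a strain name to its set of accessory cluster ids.
    The diagonal entry (s, s) is the number of clusters in strain s.
    """
    names = sorted(accessory)
    return {
        (a, b): len(accessory[a] & accessory[b])
        for a in names
        for b in names
    }


# Toy data (hypothetical strains and cluster ids).
toy_accessory = {
    "strain_x": {"c1", "c2", "c3"},
    "strain_y": {"c2", "c3", "c4"},
    "strain_z": {"c5"},
}
overlap = pairwise_shared(toy_accessory)
```

Because set intersection is symmetric, the entries for (a, b) and (b, a) are equal, which is why the diagonal splits the matrix into two identical halves.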
Fig 7. The heatmap shows 74 HGCs absent from reference strains Mtb ATCC H37Ra and Mtb ATCC H37Rv but present in most of the clinical isolates.
The predicted annotation of each cluster family is also shown. Genes validated by PCR are marked adjacent to the annotations.
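The selection behind this heatmap (clusters missing from both reference strains yet present in most clinical isolates) can be sketched as a simple filter. The source does not define "most", so the `min_fraction` cutoff of 0.5 is an assumption, as are the function name and the toy strain names.

```python
from typing import Dict, List, Set


def absent_from_references(
    presence: Dict[str, Set[str]],
    references: Set[str],
    clinical_isolates: Set[str],
    min_fraction: float = 0.5,
) -> List[str]:
    """Clusters absent from every reference strain but present in more than
    min_fraction of the clinical isolates (cutoff is an assumption)."""
    selected = []
    for cluster_id, strains in presence.items():
        if strains & references:
            continue                        # found in at least one reference
        n_present = len(strains & clinical_isolates)
        if n_present > min_fraction * len(clinical_isolates):
            selected.append(cluster_id)
    return sorted(selected)


# Toy data (hypothetical isolate names and cluster ids).
refs = {"H37Rv", "H37Ra"}
isolates = {"iso1", "iso2", "iso3", "iso4"}
toy_clusters = {
    "hgc_a": {"iso1", "iso2", "iso3"},            # absent from refs, in 3 of 4
    "hgc_b": {"H37Rv", "iso1", "iso2", "iso3"},   # present in a reference
    "hgc_c": {"iso4"},                            # absent from refs, but rare
}
candidates = absent_from_references(toy_clusters, refs, isolates)
```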
Fig 8. (A) PCR validation of the 10 candidate genes (subset of 74 clusters identified) in 8 Indian clinical isolates (OSDD strains) and reference strains ATCC H37Ra and ATCC H37Rv.
Arrowheads indicate the variably sized products in the ATCC H37Ra and ATCC H37Rv genomes. Gene 4 and Gene 8 share partial homology with the laboratory strains and showed a differently sized product in ATCC H37Rv and ATCC H37Ra. (B) The alignment of Gene 4 against the reference genomes showed an insertion of a 175 bp sequence in Gene 4 of the OSDD strains with respect to the reference strains. (C) The alignment of Gene 8 against the reference genomes showed a deletion of a 1,352 bp sequence in Gene 8 of the OSDD strains.
