Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm
- PMID: 26154264
- PMCID: PMC4496040
- DOI: 10.1371/journal.pone.0130912
Abstract
Infra-species taxonomy is a prerequisite for comparing features such as virulence across pathogen lineages. Mycobacterium tuberculosis complex taxonomy has evolved rapidly over the last 20 years through intensive clinical isolation, advances in sequencing, and the description of fast-evolving loci (CRISPR and MIRU-VNTR). Online tools for describing new isolates have been set up based on known diversity, either in CRISPRs (also known as spoligotypes) or in MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an online tool to ease the classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignments were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus comprising 22 sublineages. The original and consensus assignments of all isolates in the database were subsequently fed into an ensemble learning approach based on the Machine Learning tool Weka to derive a classification scheme. All assignments were reproduced with very good sensitivities and specificities. When applied to independent datasets, the scheme was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, which is efficient on 15-MIRU, 24-VNTR and spoligotype data, is available on the web interface "TBminer." Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance.
Altogether, we successfully used Machine Learning on a large dataset to set up and make available the first consensual taxonomy for the human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.
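The idea of reconciling several tools' assignments into one consensus label, then scoring a classifier by per-sublineage sensitivity and specificity, can be sketched as follows. This is a minimal illustration in Python, not the authors' Weka pipeline: the tool names (`tool_a`, `tool_b`, `tool_c`), the tool-specific labels, the mapping table `TO_CONSENSUS`, and the `min_votes` threshold are all hypothetical placeholders and do not reproduce the published consensus of 22 sublineages.

```python
from collections import Counter

# Hypothetical mapping from (tool, tool-specific label) to a shared consensus
# name. Placeholder entries only; this is NOT the published consensus table.
TO_CONSENSUS = {
    ("tool_a", "label_a1"): "Beijing",
    ("tool_b", "label_b1"): "Beijing",
    ("tool_c", "label_c1"): "Beijing",
    ("tool_a", "label_a2"): "other",
    ("tool_b", "label_b2"): None,  # too coarse to resolve a sublineage
    ("tool_c", "label_c2"): "other",
}


def consensus_label(assignments, min_votes=2):
    """Return the consensus name most tools agree on, or None.

    `assignments` maps a tool name to the label that tool gave one isolate.
    A label is returned only if at least `min_votes` tools map to it and
    no other consensus name ties with it.
    """
    votes = Counter()
    for tool, label in assignments.items():
        mapped = TO_CONSENSUS.get((tool, label))
        if mapped is not None:
            votes[mapped] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    best, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    return best if count >= min_votes and not tied else None


def sensitivity_specificity(true_labels, predicted_labels, target):
    """One-vs-rest sensitivity and specificity for a single sublineage."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    tn = sum(t != target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


if __name__ == "__main__":
    isolate = {"tool_a": "label_a1", "tool_b": "label_b1", "tool_c": "label_c2"}
    print(consensus_label(isolate))
```

Returning `None` when the tools disagree or are too coarse keeps unresolved isolates visible instead of forcing them into a sublineage, which is one plausible way to handle the differing depths of the source taxonomies.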