PLoS One. 2015 Jul 8;10(7):e0130912.
doi: 10.1371/journal.pone.0130912. eCollection 2015.

Genomics and Machine Learning for Taxonomy Consensus: The Mycobacterium tuberculosis Complex Paradigm

Jérôme Azé et al. PLoS One. 2015.

Abstract

Infra-species taxonomy is a prerequisite for comparing features such as virulence across pathogen lineages. Mycobacterium tuberculosis complex taxonomy has evolved rapidly over the last 20 years through intensive clinical isolation, advances in sequencing, and the description of fast-evolving loci (CRISPR and MIRU-VNTR). Online tools to describe new isolates have been set up based on known diversity, using either CRISPRs (also known as spoligotypes) or MIRU-VNTR profiles. The underlying taxonomies are largely concordant but use different names and offer different depths. The objectives of this study were 1) to make explicit the consensus that exists between the alternative taxonomies, and 2) to provide an online tool to ease classification of new isolates. Genotyping (24-VNTR, 43-spacer spoligotypes, IS6110-RFLP) was undertaken for 3,454 clinical isolates from the Netherlands (2004-2008). The resulting database was enlarged with African isolates to include most human tuberculosis diversity. Assignations were obtained using TB-Lineage, MIRU-VNTRPlus, SITVITWEB and an algorithm from Borile et al. By identifying the recurrent concordances between the alternative taxonomies, we proposed a consensus including 22 sublineages. Original and consensus assignations of all isolates from the database were subsequently fed into an ensemble learning approach based on the machine learning tool Weka to derive a classification scheme. All assignations were reproduced with very good sensitivities and specificities. When applied to independent datasets, the scheme was able to suggest new sublineages such as pseudo-Beijing. This Lineage Prediction tool, which works on 15-MIRU, 24-VNTR and spoligotype data, is available through the web interface "TBminer." Another section of this website helps summarize key molecular epidemiological data, easing tuberculosis surveillance.
Altogether, we successfully used machine learning on a large dataset to set up and make available the first consensus taxonomy for the human Mycobacterium tuberculosis complex. Additional developments using SNPs will help stabilize it.
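
The idea of reconciling several taxonomies that agree in substance but differ in naming can be illustrated with a minimal Python sketch. It is not the authors' method: the study used an ensemble learning approach in Weka, whereas this sketch uses a plain majority vote. The label strings, the LABEL_MAP table, the tool keys and the consensus_label function are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter

# Hypothetical table mapping tool-specific names onto one shared vocabulary.
# The entries are illustrative placeholders, not the paper's actual mapping.
LABEL_MAP = {
    "East-Asian": "Beijing",
    "W-Beijing": "Beijing",
}


def consensus_label(assignments, min_agreement=2):
    """Return the majority label across tools, or None if there is no clear winner.

    assignments: dict mapping a tool name to the label it assigned (or None).
    min_agreement: minimum number of tools that must agree on the label.
    """
    # Harmonize names first, so that synonyms count as the same vote.
    harmonized = [LABEL_MAP.get(label, label)
                  for label in assignments.values() if label]
    ranked = Counter(harmonized).most_common(2)

    # No votes at all, or too few tools agreeing on the top label.
    if not ranked or ranked[0][1] < min_agreement:
        return None
    # A tie between the two most frequent labels is treated as unresolved.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    isolate = {"tool_a": "East-Asian", "tool_b": "Beijing", "tool_c": "W-Beijing"}
    print(consensus_label(isolate))
```

Returning None for ties and weak agreement keeps unresolved isolates visible for manual review rather than forcing a label on them.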

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Relative prevalence of main M. tuberculosis complex lineages in the Netherlands (2005–2008).
Fig 2. Concordance of existing classifications with the consensus classification proposed in this study.
Fig 3. TBminer Lineage Prediction tool: the output file.
Fig 4. TBminer Prediction tool performance on the MIRU-VNTRPlus database.
A. Concordance between TBminer Pred2_Miru-Vntr and MIRU-VNTRPlus assignations. B. Concordance between Pred6 and manual expert assignation accounting for original labels.
Fig 5. TBminer Prediction tool performance on a Pakistani sample.
The Consensus Lineage Prediction tool of TBminer was compared to the expert assignation on an independent dataset from Pakistan.
Fig 6. Approach for consensus building between conflicting taxonomies.
