Database (Oxford). 2020 Dec 15:2020:baaa108.
doi: 10.1093/database/baaa108.

Novel methods included in SpolLineages tool for fast and precise prediction of Mycobacterium tuberculosis complex spoligotype families

David Couvin et al. Database (Oxford).

Abstract

Bioinformatic tools are currently being developed to better understand the Mycobacterium tuberculosis complex (MTBC). Several approaches already exist for identifying MTBC lineages from classical genotyping methods such as mycobacterial interspersed repetitive units-variable number of tandem DNA repeats (MIRU-VNTR) and spoligotyping-based families. In the recently released SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe, a large number of spoligotype families were assigned either by manual curation/expertise or by an in-house algorithm. In this study, we present two complementary data-driven approaches allowing fast and precise family prediction from spoligotyping patterns. The first is based on data transformation and decision tree classifiers. In contrast, the second searches for a set of simple rules using binary masks through a specifically designed evolutionary algorithm. A comparison with the three main approaches in the field highlighted the good performance of our contributions and a significant runtime gain. Finally, we propose the 'SpolLineages' software tool (https://github.com/dcouvin/SpolLineages), which implements these approaches for the identification of MTBC spoligotype families.
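
To make the idea of a rule based on binary masks more concrete, the following minimal Python sketch shows one way such a rule could be applied to a spoligotype pattern. It is an illustration under stated assumptions, not the SpolLineages implementation: the abstract does not specify the rule format, so the (mask, expected) encoding, the function names and the two example rules ("FamilyA", "FamilyB") are hypothetical. The only outside fact used is that classical spoligotyping records the presence or absence of 43 spacers.

```python
N_SPACERS = 43  # classical spoligotyping scores 43 spacers as present (1) or absent (0)


def to_bits(pattern: str) -> int:
    """Convert a 43-character string of '0'/'1' into an integer bit field."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character binary string")
    return int(pattern, 2)


def matches(pattern: str, mask: str, expected: str) -> bool:
    """Return True if the pattern agrees with `expected` on every position
    selected by `mask` (mask bit 1 = position is checked, 0 = ignored)."""
    p, m, e = to_bits(pattern), to_bits(mask), to_bits(expected)
    return (p & m) == (e & m)


# Hypothetical rules for illustration only; they are NOT taken from the paper.
# Each rule is (label, mask, expected values on the masked positions).
RULES = [
    ("FamilyA", "1" * 34 + "0" * 9, "0" * 43),            # spacers 1-34 all absent
    ("FamilyB", "0" * 32 + "1" * 4 + "0" * 7, "0" * 43),  # spacers 33-36 all absent
]


def predict(pattern: str, rules=RULES, default: str = "Unknown") -> str:
    """Return the label of the first rule that matches, else `default`."""
    for label, mask, expected in rules:
        if matches(pattern, mask, expected):
            return label
    return default


if __name__ == "__main__":
    print(predict("0" * 34 + "1" * 9))
    print(predict("1" * 32 + "0" * 4 + "1" * 7))
    print(predict("1" * 43))
```

In this sketch the rules are tried in order and the first match wins, so rule order matters. An evolutionary algorithm such as the one described above would search over the masks themselves; that search is not shown here.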

Figures

Figure 1. Spacer reduction strategy for predictive modelling with DTs.
Figure 2. Binary representation for an EA solution.
Figure 3. Methodology for comparing classification models.
Figure 4. Impact of attribute reduction step on DT performances (a) TP and FP rates and (b) precision.
Figure 5. Comparison of runtime in seconds (a) for Expert rules and our approaches and (b) focused only on our approaches.
Figure 6. Examples of extracted models for (a) DT and (b) binary masks.
