Infect Genet Evol. 2019 Aug;72:59-66.
doi: 10.1016/j.meegid.2018.06.029. Epub 2018 Jun 28.

Towards better prediction of Mycobacterium tuberculosis lineages from MIRU-VNTR data



Nithum Thain et al. Infect Genet Evol. 2019 Aug.

Abstract

Determining lineages from strain-level molecular genotyping data is an important problem in tuberculosis. Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is a commonly used molecular genotyping approach that records how many times each of a set of pre-specified loci is repeated in a strain. There are three main approaches for determining lineage from MIRU-VNTR data: one based on a direct comparison with the strains in a curated database, and two based on machine learning algorithms trained on a large collection of labeled data. All existing methods have limitations. The direct approach imposes an arbitrary threshold on how much a database strain may differ from the query strain and still be considered informative. The machine learning-based approaches, on the other hand, require a substantial amount of labeled data. Notably, all three methods exhibit suboptimal classification accuracy without additional data. We explore several computational approaches to address these limitations. First, we show that eliminating the arbitrary threshold improves the performance of the direct approach. Second, we introduce RuleTB, an alternative direct method that proposes a concise set of rules for determining lineages. Lastly, we propose StackTB, a machine learning approach that needs only a fraction of the training data to outperform both existing machine learning methods in accuracy. Our approaches outperform the existing methods on a training dataset collected in New York City over 10 years, and the improvement carries over to a held-out test set. We conclude that our methods provide opportunities for improving the determination of pathogenic lineages from MIRU-VNTR data.
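The contrast between the thresholded and threshold-free direct approaches can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that profiles are equal-length tuples of repeat counts and that dissimilarity is the number of loci whose counts differ (a simple Hamming-style count; the distance used by the curated-database approach may be different), and it breaks ties by majority vote. The names `locus_distance`, `classify_direct`, `max_distance`, and `TOY_DB` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# A MIRU-VNTR profile: repeat counts at a fixed, ordered panel of loci.
Profile = Tuple[int, ...]


def locus_distance(a: Profile, b: Profile) -> int:
    """Count the loci at which two profiles have different repeat counts."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def classify_direct(
    query: Profile,
    database: Sequence[Tuple[Profile, str]],
    max_distance: Optional[int] = None,
) -> Optional[str]:
    """Assign a lineage from the closest database strain(s).

    If max_distance is set, return None when no database strain lies within
    that many differing loci (thresholded variant). If max_distance is None,
    always use the nearest strains (threshold-free variant).
    """
    distances = [(locus_distance(query, p), lineage) for p, lineage in database]
    best = min(d for d, _ in distances)
    if max_distance is not None and best > max_distance:
        return None
    # Majority vote among all database strains tied at the minimum distance.
    votes = Counter(lineage for d, lineage in distances if d == best)
    return votes.most_common(1)[0][0]


# Made-up four-locus profiles with placeholder lineage labels (not real data).
TOY_DB = [
    ((2, 3, 4, 2), "A"),
    ((2, 3, 4, 3), "A"),
    ((5, 1, 2, 2), "B"),
]

print(classify_direct((5, 1, 2, 3), TOY_DB, max_distance=0))  # None
print(classify_direct((5, 1, 2, 3), TOY_DB))                  # B
```

The query differs from its nearest database strain at one locus, so with `max_distance=0` it is left unassigned; dropping the threshold assigns it the lineage of that nearest strain, at the cost of also labeling queries that have no close match in the database.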

Keywords: Interpretability; Lineage; MIRU-VNTR; Machine learning; Mycobacterium tuberculosis.


Figures

Figure 1: A schematic illustration of the StackTB classifier.
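The abstract does not describe StackTB's components. As a rough sketch of the general stacking idea its name suggests (an assumption), the Python below trains two base classifiers, uses their out-of-fold predictions as meta-features, and fits a meta-classifier on top. The base learners (`nearest_neighbor`, `locus_mode`), the lookup-table meta-learner (majority true label per combination of base predictions; stacking more often uses a model such as logistic regression), and the toy data are all hypothetical illustrative choices, not parts of StackTB.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]
Example = Tuple[Profile, str]
Predictor = Callable[[Profile], str]
# A learner takes labeled training data and returns a predictor.
Learner = Callable[[Sequence[Example]], Predictor]


def nearest_neighbor(train: Sequence[Example]) -> Predictor:
    """Hypothetical base learner: lineage of the strain with the fewest differing loci."""
    def predict(x: Profile) -> str:
        def dist(ex: Example) -> int:
            return sum(a != b for a, b in zip(x, ex[0]))
        return min(train, key=dist)[1]
    return predict


def locus_mode(train: Sequence[Example]) -> Predictor:
    """Hypothetical base learner: best match to each lineage's per-locus modal profile."""
    by_lineage: Dict[str, List[Profile]] = defaultdict(list)
    for profile, lineage in train:
        by_lineage[lineage].append(profile)
    consensus = {
        lineage: tuple(Counter(col).most_common(1)[0][0] for col in zip(*profiles))
        for lineage, profiles in by_lineage.items()
    }

    def predict(x: Profile) -> str:
        return max(consensus, key=lambda y: sum(a == b for a, b in zip(x, consensus[y])))
    return predict


def stack(train: Sequence[Example], learners: Sequence[Learner], n_folds: int = 3) -> Predictor:
    """Generic stacking: a meta-learner trained on out-of-fold base predictions."""
    # 1. Collect out-of-fold base predictions (meta-features) with their true labels.
    meta_counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for k in range(n_folds):
        fit = [ex for i, ex in enumerate(train) if i % n_folds != k]
        held_out = [ex for i, ex in enumerate(train) if i % n_folds == k]
        models = [learn(fit) for learn in learners]
        for x, y in held_out:
            meta_counts[tuple(m(x) for m in models)][y] += 1
    # 2. Meta-learner: most frequent true lineage for each combination of base predictions.
    meta = {key: c.most_common(1)[0][0] for key, c in meta_counts.items()}
    # 3. Refit the base learners on all the training data.
    final = [learn(train) for learn in learners]

    def predict(x: Profile) -> str:
        key = tuple(m(x) for m in final)
        # Unseen combination of base predictions: fall back to the first base learner.
        return meta.get(key, key[0])
    return predict


# Made-up four-locus profiles with placeholder lineage labels (not real data).
TRAIN = [
    ((2, 3, 4, 2), "A"), ((2, 3, 4, 3), "A"), ((2, 3, 5, 2), "A"),
    ((5, 1, 2, 2), "B"), ((5, 1, 2, 3), "B"), ((6, 1, 2, 2), "B"),
]
model = stack(TRAIN, [nearest_neighbor, locus_mode])
print(model((2, 3, 4, 4)), model((5, 1, 2, 4)))  # A B
```

The out-of-fold step matters: fitting the meta-learner on predictions the base learners made for their own training examples would reward memorization rather than generalization. The sketch assumes every fold leaves at least one training example for the base learners.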

