Nat Commun. 2018 Oct 17;9(1):4306.
doi: 10.1038/s41467-018-06634-y.

Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance

Erol S Kavvas et al. Nat Commun. 2018.

Abstract

Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Identification of key resistance-conferring genes using mutual information. The pairwise mutual information (vertical axis) between the pan-genome alleles and antibiotic resistance was calculated across all possible pairs. The listed genes correspond to the pan-genome alleles that hold the most information about the listed drug’s AMR phenotype
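The mutual-information ranking described in this caption can be illustrated with a minimal sketch. It assumes each allele is encoded as a presence/absence vector across strains and the AMR phenotype as a binary resistant/susceptible vector; the function names, allele labels, and toy data are hypothetical and are not taken from the paper, whose actual implementation may differ.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must not be empty")
    joint = Counter(zip(xs, ys))  # counts of each (allele call, phenotype) pair
    px = Counter(xs)              # marginal counts for the allele calls
    py = Counter(ys)              # marginal counts for the phenotype
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def rank_alleles(allele_calls, phenotype):
    """Return (allele, MI) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(calls, phenotype)
        for name, calls in allele_calls.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = allele present / strain resistant, 0 = otherwise.
    phenotype = [1, 1, 0, 0]
    allele_calls = {
        "allele_A": [1, 1, 0, 0],  # tracks the phenotype exactly
        "allele_B": [1, 0, 1, 0],  # independent of the phenotype
    }
    for name, score in rank_alleles(allele_calls, phenotype):
        print(name, round(score, 3))
```

In this toy case the allele that tracks the phenotype exactly carries one full bit of information, while the independent allele carries none, which is the kind of separation the figure uses to single out top-ranked genes per drug.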
Fig. 2
Allele co-occurrence tables of correlated AMR genes. Co-occurrence of epistatic genes identified in (a) ethambutol and (b) isoniazid. For the rows on the bottom and on the far right, #R refers to the total number of strains that have the allele and are resistant to the specific drug. Total refers to the total number of strains with that allele that were tested on that specific drug. Each cell is colored by the log odds ratio (LOR) with respect to the AMR phenotype. The number in the bottom right of each allele co-occurrence box gives the number of unique sublineages represented among the strains with both alleles (Methods). The alleles enclosed by a purple box are those chosen as features by the support vector machine (SVM). Note that in some cases the rows and columns do not sum to the total number of strains because, in rare cases, strains lack those alleles (Methods)
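The log odds ratio used to color each cell can be sketched from a 2x2 table of allele presence against resistance. This is a minimal illustration, not the authors' code: the function name and argument names are hypothetical, and the 0.5 pseudocount (the Haldane-Anscombe correction) is an assumption added here to keep the ratio finite when a cell is empty; the paper's Methods may handle zero counts differently.

```python
from math import log


def log_odds_ratio(present_r, present_s, absent_r, absent_s, pseudocount=0.5):
    """Natural-log odds ratio of resistance given allele presence.

    present_r / present_s: strains carrying the allele that are resistant / susceptible.
    absent_r / absent_s: strains lacking the allele that are resistant / susceptible.
    pseudocount: value added to every cell (assumed correction, not from the paper).
    """
    a = present_r + pseudocount
    b = present_s + pseudocount
    c = absent_r + pseudocount
    d = absent_s + pseudocount
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive after adding the pseudocount")
    # Odds of resistance with the allele divided by odds of resistance without it.
    return log((a * d) / (b * c))


if __name__ == "__main__":
    # Hypothetical counts: the allele is enriched among resistant strains.
    print(round(log_odds_ratio(9, 1, 1, 9, pseudocount=0), 3))
```

A positive value indicates that strains carrying the allele (or allele pair) are more often resistant than strains lacking it, a negative value indicates the opposite, and zero indicates no association.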
Fig. 3
3D and annotated protein structure mutation maps for identified AMR genes. (a) 3D protein structures with mapped mutations are shown for inhA, embR, and oxcA. The colors adjacent to and within the structural mutation table correspond to domains and mutations displayed on the protein structure, respectively. (b) Mutation tables for seven new AMR genes. The colors in the mutation table correspond to the incidence of an annotated structural feature located below the table. The two rows directly below the mutation table are colored according to the log odds ratio between the allele frequency and AMR phenotype. Two AMR classes are shown for Rv3471c and Rv3041c
