JAC Antimicrob Resist. 2024 Mar 18;6(2):dlae037.
doi: 10.1093/jacamr/dlae037. eCollection 2024 Apr.

Prediction of pyrazinamide resistance in Mycobacterium tuberculosis using structure-based machine-learning approaches

Joshua J Carter et al. JAC Antimicrob Resist. 2024.

Abstract

Background: Pyrazinamide is one of four first-line antibiotics used to treat tuberculosis; however, antibiotic susceptibility testing for pyrazinamide is challenging. Resistance to pyrazinamide is primarily driven by genetic variation in pncA, encoding an enzyme that converts pyrazinamide into its active form.

Methods: We curated a dataset of 664 non-redundant, missense amino acid mutations in PncA with associated high-confidence phenotypes from published studies and then trained three different machine-learning models to predict pyrazinamide resistance. All models had access to a range of protein structural-, chemical- and sequence-based features.

Results: The best model, a gradient-boosted decision tree, achieved a sensitivity of 80.2% and a specificity of 76.9% on the hold-out test dataset. The clinical performance of the models was then estimated by predicting the binary pyrazinamide resistance phenotype of 4027 samples harbouring 367 unique missense mutations in pncA derived from 24 231 clinical isolates.
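As a reminder of how the two headline metrics are defined, the following minimal sketch computes sensitivity and specificity from binary resistance calls. It assumes that resistant ("R") is the positive class and susceptible ("S") the negative class; the function name, labels and toy data are illustrative only and do not come from the paper or reproduce its figures.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[str], predicted: Sequence[str]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating "R" (resistant) as positive.

    Assumption: every label is either "R" or "S".
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == "R":
            if p == "R":
                tp += 1  # resistant, correctly called resistant
            else:
                fn += 1  # resistant, missed (called susceptible)
        else:
            if p == "S":
                tn += 1  # susceptible, correctly called susceptible
            else:
                fp += 1  # susceptible, wrongly called resistant

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible sample")

    return tp / (tp + fn), tn / (tn + fp)


# Made-up toy data: 4 resistant and 5 susceptible mutations.
toy_truth = ["R", "R", "R", "R", "S", "S", "S", "S", "S"]
toy_pred = ["R", "R", "R", "S", "S", "S", "S", "S", "R"]
sens, spec = sensitivity_specificity(toy_truth, toy_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this convention a missed resistant mutation lowers sensitivity, while a susceptible mutation wrongly called resistant lowers specificity.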

Conclusions: This work demonstrates how machine learning can enhance the sensitivity/specificity of pyrazinamide resistance prediction in genetics-based clinical microbiology workflows, highlights novel mutations for future biochemical investigation, and provides a proof of concept for applying this approach to other drugs.


Figures

Figure 1.
Distribution of PncA mutations from published datasets. (a) Barplot of the impact of possible missense mutations in PncA by amino acid position. High-confidence resistant (red) and susceptible (blue) mutations are overlaid on the possible missense mutations whose effect on resistance is unknown or unclear (grey). (b) Distribution of the types of mutations reported by the CRyPTIC consortium et al. (c) Missense mutations from the dataset plotted onto the PncA structure (PDB ID: 3PL1) in dark grey. A pyrazinamide molecule (orange) has been modelled into the active site.
Figure 2.
Structural and evolutionary traits correlate with mutational impact on pyrazinamide susceptibility. (a) Amino acids where >80% of mutations confer resistance are more likely to be found in the core of PncA. (b) There is only a moderate correlation between RaSP and DeepDDG, which both predict the effect of a mutation on protein stability, and likewise between MAPP and SNAP2. Resistant and susceptible mutations are plotted in red and blue, respectively. (c) The performance of individual features, as measured by the AUC of a univariable logistic regression. The dashed line denotes random guessing.
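The single-feature AUC in panel (c) can be illustrated without fitting a model. A univariable logistic regression is monotonic in its one feature, so its AUC equals the rank-based AUC of the raw feature (or one minus that value if the fitted coefficient is negative). The sketch below computes the rank-based AUC directly; the function name, the encoding of resistant as 1 and the toy scores are assumptions for illustration, not the paper's code or data.

```python
from typing import Sequence


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: the probability that a randomly chosen resistant
    mutation (label 1) has a higher feature value than a randomly chosen
    susceptible one (label 0). Tied pairs count as half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical feature values (for example, a predicted destabilisation score).
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(f"AUC = {rank_auc(toy_scores, toy_labels):.3f}")
```

A value of 0.5 corresponds to the random-guessing baseline marked by the dashed line; values below 0.5 indicate a feature that ranks mutations in the opposite direction.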
Figure 3.
Machine-learning models predict pyrazinamide resistance from structural, chemical and evolutionary features. Performance of logistic regression (LR), a simple neural network (NN) and gradient-boosted decision tree (XB) models on the (a) Training and (b) Test sets. Error bars represent 95% CIs from bootstrapping (n = 10) and brackets indicate a significant difference (z-test, P < 0.05). (c) Confusion matrices are shown for the Test set. VMEs are considered worse than MEs and hence VMEs and MEs are shaded red and pink, respectively.
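To illustrate the kind of bootstrapped error bar described in this caption, the sketch below resamples a labelled set with replacement and takes percentile bounds on the sensitivity. The caption reports n = 10 bootstraps but does not say how the interval was constructed, so the percentile method, the default of 1000 resamples, the function names and the toy data are all assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple


def sensitivity(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of truly resistant ("R") samples that were called "R"."""
    calls = [p for t, p in zip(truth, predicted) if t == "R"]
    return sum(1 for p in calls if p == "R") / len(calls)


def bootstrap_ci(
    truth: Sequence[str],
    predicted: Sequence[str],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for sensitivity."""
    if "R" not in truth:
        raise ValueError("need at least one resistant sample")

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(truth)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        t = [truth[i] for i in idx]
        if "R" not in t:
            continue  # sensitivity is undefined for this resample; draw again
        p = [predicted[i] for i in idx]
        stats.append(sensitivity(t, p))

    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Made-up toy data: 20 resistant (16 called correctly) and 20 susceptible.
toy_truth = ["R"] * 20 + ["S"] * 20
toy_pred = ["R"] * 16 + ["S"] * 4 + ["S"] * 20
low, high = bootstrap_ci(toy_truth, toy_pred)
print(f"sensitivity = {sensitivity(toy_truth, toy_pred):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With very few resamples the percentile bounds become coarse, which is why this sketch defaults to a larger number than the n = 10 quoted in the caption.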
Figure 4.
VMEs are concentrated on the surface of PncA. (a) The majority of VMEs and MEs are shared between the three models. (b) PncA with the corresponding residues highlighted where the shared VMEs (orange) and MEs (blue) are found. (c) The shared VMEs and MEs are predicted to have less and more effect, respectively, on the stability of the protein, as exemplified by DeepDDG, and on the function of the protein, according to SNAP2.
Figure 5.
Performance on a real set of clinical samples. (a) Whilst the sensitivity is high, the specificity of the gradient-boosted decision tree model on the Validation dataset is lower than observed on the Test dataset. (b) Removing samples containing a mutation that has an experimentally inconsistent phenotype increases the specificity. As expected, splitting into samples whose mutation either (c) belongs or (d) does not belong to the Train dataset further stratifies performance.

