PLoS Comput Biol. 2022 Oct 13;18(10):e1010613.
doi: 10.1371/journal.pcbi.1010613. eCollection 2022 Oct.

A machine learning model trained on a high-throughput antibacterial screen increases the hit rate of drug discovery

A S M Zisanur Rahman et al. PLoS Comput Biol. 2022.

Abstract

Screening for novel antibacterial compounds in small molecule libraries has a low success rate. We applied machine learning (ML)-based virtual screening for antibacterial activity and evaluated its predictive power by experimental validation. We first binarized 29,537 compounds according to their growth inhibitory activity (hit rate 0.87%) against the antibiotic-resistant bacterium Burkholderia cenocepacia and described their molecular features with a directed message-passing neural network (D-MPNN). Then, we used the data to train an ML model that achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.823 on the test set. Finally, we predicted antibacterial activity in virtual libraries corresponding to 1,614 compounds from the Food and Drug Administration (FDA)-approved list and 224,205 natural products. Hit rates of 26% and 12%, respectively, were obtained when we tested the top-ranked predicted compounds for growth inhibitory activity against B. cenocepacia, which represents at least a 14-fold increase from the previous hit rate. In addition, more than 51% of the predicted antibacterial natural compounds inhibited ESKAPE pathogens, showing that the predictions extend beyond the organism-specific dataset to a broad range of bacteria. Overall, the developed ML approach can be used for compound prioritization before screening, increasing the typical hit rate of drug discovery.
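
The ROC score reported above can be read as the probability that a randomly chosen active compound receives a higher predicted score than a randomly chosen inactive one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the authors' code or data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for active, 0 for inactive compounds.
    scores: predicted probability of being active, one per compound.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # active ranked above inactive
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not data from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The double loop is quadratic in the number of compounds, which is acceptable for a test split of a few thousand compounds; a rank-based formulation would be preferable for larger sets.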

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Initial training and performance evaluation of the machine learning model.
(A) High-throughput screening data generated by screening a library of 29,537 compounds against wild-type B. cenocepacia K56-2. With a B-score ≤ -17.5 as the activity threshold, the screen yielded 256 active compounds. Dark blue and red represent inactive and active compounds, respectively. (B) The machine learning model was trained using a D-MPNN approach, which extracts local features of compounds, such as atom and bond features. More than 200 additional global molecular descriptors were supplied to the model to further increase accuracy. The dataset was split in an 80:10:10 ratio to train, validate and test the model. (C) ROC-AUC plot evaluating model performance after training. The model attained a ROC-AUC of 0.823. Parts of panel B are modified from Yang et al. [11]. Fig 1B was created with https://biorender.com/.
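
The binarization and data split described in panels A and B can be sketched as follows. This is a minimal illustration that assumes a plain random shuffle for the 80:10:10 split; the caption does not say whether the split was random, stratified or scaffold-based. The function names `binarize` and `split_80_10_10` and the seed are hypothetical.

```python
import random

B_SCORE_THRESHOLD = -17.5  # activity threshold stated in the Fig 1A caption


def binarize(b_scores, threshold=B_SCORE_THRESHOLD):
    """Label a compound active (1) when its B-score is at or below the threshold."""
    return [1 if score <= threshold else 0 for score in b_scores]


def split_80_10_10(n_items, seed=0):
    """Shuffle indices and cut them into train/validation/test sets (80:10:10).

    Assumption: a simple random split; the remainder after integer rounding
    goes to the test set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(0.8 * n_items)
    n_val = int(0.1 * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Counts taken from the caption: 256 actives among 29,537 screened compounds.
hit_rate_percent = 100 * 256 / 29537   # about 0.87%
train_idx, val_idx, test_idx = split_80_10_10(29537)
```

With so few actives (under 1%), a purely random split can leave the validation or test set with very few positives, which is one reason a stratified split is often considered in this setting.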
Fig 2. In vitro testing of top-ranked predicted compounds from an FDA-approved compound library.
(A) Schematic of the screening protocol. Eighty-one commercially available compounds (from the top 100) were screened. (B) The screening identified 21 bioactive compounds with a positive predictive value (PPV) of 25.9%. Dark blue and red represent inactive and active compounds, respectively. (C) The top 100 ranked compounds selected for empirical testing belong to different drug families. Most of the compounds exhibiting bioactivity were known antibiotics or antimicrobial compounds. (D) The ratio of OD600nm and prediction scores were plotted against the predicted rank of the corresponding compounds. The results show a linear correlation (Pearson correlation, R = 0.54) between the prediction score and bioactivity. The predicted score is the probability of a compound being active as calculated by the ML model. The predicted rank is the order of the compounds based on the predicted score, where compounds with the higher predicted scores are ranked higher. The red and blue triangles show the gradient of predicted rank and growth (measured as OD600nm), respectively. Dark blue and red indicate compounds’ probability of being inactive and active, respectively. Results are the average of at least three independent biological replicates. Fig 2A was created with https://biorender.com/.
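
The quantities in panels B and D (positive predictive value, predicted rank and Pearson correlation) can be illustrated with a short sketch. The helper names below are hypothetical, and only the counts 21 and 81 come from the caption; the example scores are invented for illustration.

```python
import math


def ppv(n_active, n_tested):
    """Positive predictive value: confirmed actives divided by compounds tested."""
    return n_active / n_tested


def rank_by_score(scores):
    """Return 1-based ranks in input order; the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# 21 bioactive compounds out of 81 tested, as stated in the caption.
fda_ppv_percent = 100 * ppv(21, 81)

# Hypothetical prediction scores, only to show the ranking convention.
example_ranks = rank_by_score([0.2, 0.9, 0.5])
```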
Fig 3. In vitro testing of top-ranked predicted compounds from an unprecedented natural product library.
(A) The 43 commercially available compounds (from the 100 top-ranked unique compounds) were screened against B. cenocepacia K56-2. The screening yielded 5 bioactive compounds (hit rate = 11.63%), a hit rate 10 times higher than that of a conventional screen. Dark blue and red are non-inhibitory and inhibitory compounds, respectively, based on the residual growth (RG) threshold of 0.8. (B) Screening these 43 compounds against the ESKAPE pathogens yielded 22 bioactive compounds that displayed broad-spectrum growth inhibitory activity against diverse pathogens (positive predictive value (PPV) = 51.16%). Dark blue and red are non-inhibitory and inhibitory compounds, respectively. The structures of the compounds that exhibited growth inhibitory activity against B. cenocepacia K56-2 and the ESKAPE pathogens are shown beside the plots. Results are the average of at least three independent biological replicates. Error bars indicate mean ± SD. AB = A. baumannii 1225, BC = B. cenocepacia K56-2, EC = E. cloacae ENT001_EB001, PA = P. aeruginosa PAO1, MRSA = Methicillin-resistant S. aureus ATCC33592.
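
The residual growth classification and the two percentages in this caption can be reproduced with a few lines. The sketch assumes that residual growth is the OD600 of the treated culture divided by that of the untreated control and that values below 0.8 count as inhibitory; the caption gives the threshold but not these details. All names and the example optical densities are hypothetical.

```python
RG_THRESHOLD = 0.8  # residual growth threshold stated in the caption


def residual_growth(od_treated, od_control):
    """Assumed definition: treated OD600 relative to the untreated control."""
    return od_treated / od_control


def is_inhibitory(rg, threshold=RG_THRESHOLD):
    """Assumption: a compound is inhibitory when RG falls below the threshold."""
    return rg < threshold


def percent_hits(n_hits, n_tested):
    """Share of tested compounds that were bioactive, as a percentage."""
    return 100 * n_hits / n_tested


# Counts from the caption: 5 of 43 (B. cenocepacia) and 22 of 43 (ESKAPE panel).
bc_hit_rate = percent_hits(5, 43)
eskape_ppv = percent_hits(22, 43)

# Hypothetical optical densities, only to show the classification step.
example_rg = residual_growth(0.30, 0.60)
example_call = is_inhibitory(example_rg)
```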
Fig 4. Enhanced sensitivity of the CRISPRi knockdown mutants indicated RpoB as the in vivo target of STL558147.
(A) Chemical structures of STL558147 and rifampicin. (B-D) Comparison of hypersensitive CRISPRi knockdown mutants to novobiocin (B), rifampicin (C) and STL558147 (D). Blue indicates more growth (less inhibition), and red indicates less growth (more inhibition). Results are the average of at least three independent biological replicates.
Fig 5. Synergy maps of STL558147 and rifampicin combined with other antibiotics against B. cenocepacia K56-2.
Synergy plots of STL558147 (A) and rifampicin (B) with ceftazidime, colistin, and polymyxin B. The synergy scores were calculated based on the widely used Bliss independence [52] and Loewe additivity [53] models. The most synergistic area in each combination is highlighted with a rectangular box inside the plot. Green (negative δ-scores) indicates antagonistic interactions, and red (positive δ-scores) indicates synergistic interactions. Synergy scores >15, between -5 and 15, and < -15 were considered synergistic, additive and antagonistic, respectively. Results are the average of at least three independent biological replicates. Synergy scores are shown as mean ± SEM. Synergy scores were calculated using SynergyFinder 2.0 [31].
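
As an illustration of the Bliss independence model named in this caption, the sketch below computes the expected combined inhibition for a single dose pair and the excess of the observed inhibition over that expectation. It is not a reimplementation of SynergyFinder 2.0, which scores a full dose-response matrix; the function names and the example inhibition values are hypothetical.

```python
def bliss_expected(inhib_a, inhib_b):
    """Expected fractional inhibition if drugs A and B act independently.

    inhib_a, inhib_b: fractional inhibition (0 to 1) of each drug alone.
    """
    return inhib_a + inhib_b - inhib_a * inhib_b


def bliss_excess(observed, inhib_a, inhib_b):
    """Observed minus expected inhibition, in percentage points.

    Positive values suggest synergy, negative values suggest antagonism.
    """
    return 100 * (observed - bliss_expected(inhib_a, inhib_b))


# Hypothetical dose pair: each drug alone inhibits 50%, the combination 90%.
example_expected = bliss_expected(0.5, 0.5)
example_excess = bliss_excess(0.9, 0.5, 0.5)
```

In this toy case the expected inhibition under independence is 75%, so an observed 90% corresponds to an excess of about 15 percentage points.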
Fig 6. Screening of PHAR261659 analogs.
PHAR261659 analogs with different side chains were selected based on lower predicted logP values. STL529920, a stereoisomer of PHAR261659, exhibited growth inhibitory activity against all six pathogens tested. The activity of growth inhibitory and non-growth inhibitory compounds is shown in red and blue, respectively. Results are the average of three independent biological replicates. Error bars indicate mean ± SD.

References

    1. CDC. Antibiotic Resistance Threats in the United States. Atlanta, GA: U.S. Department of Health and Human Services, CDC; 2019.
    2. Billington JK. A New Product Development Partnership Model for Antibiotic Resistance. Am J Law Med. 2016;42: 487–523. doi: 10.1177/0098858816658277
    3. Brown ED, Wright GD. Antibacterial drug discovery in the resistance era. Nature. 2016;529: 336–343. doi: 10.1038/nature17042
    4. Payne DJ, Gwynn MN, Holmes DJ, Pompliano DL. Drugs for bad bugs: confronting the challenges of antibacterial discovery. Nat Rev Drug Discov. 2007;6: 29–40. doi: 10.1038/nrd2201
    5. Zgurskaya HI, Lopez CA, Gnanakaran S. Permeability Barrier of Gram-Negative Cell Envelopes and Approaches To Bypass It. ACS Infect Dis. 2015;1: 512–522. doi: 10.1021/acsinfecdis.5b00097
