Cell. 2020 Feb 20;180(4):688-702.e13.
doi: 10.1016/j.cell.2020.01.021.

A Deep Learning Approach to Antibiotic Discovery

Jonathan M Stokes et al. Cell. 2020.

Erratum in

  • A Deep Learning Approach to Antibiotic Discovery.
    Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, MacNair CR, French S, Carfrae LA, Bloom-Ackermann Z, Tran VM, Chiappino-Pepe A, Badran AH, Andrews IW, Chory EJ, Church GM, Brown ED, Jaakkola TS, Barzilay R, Collins JJ. Cell. 2020 Apr 16;181(2):475-483. doi: 10.1016/j.cell.2020.04.001. PMID: 32302574.

Abstract

Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub, halicin, that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens including Mycobacterium tuberculosis and carbapenem-resistant Enterobacteriaceae. Halicin also effectively treated Clostridioides difficile and pan-resistant Acinetobacter baumannii infections in murine models. Additionally, from a discrete set of 23 empirically tested predictions from >107 million molecules curated from the ZINC15 database, our model identified eight antibacterial compounds that are structurally distant from known antibiotics. This work highlights the utility of deep learning approaches to expand our antibiotic arsenal through the discovery of structurally distinct antibacterial molecules.

Keywords: antibiotic resistance; antibiotic tolerance; antibiotics; drug discovery; machine learning.

Conflict of interest statement

Declaration of Interests: J.J.C. is scientific co-founder and SAB chair of EnBiotix, an antibiotic drug discovery company.

Figures

Figure 1. Machine learning in antibiotic discovery.
Modern approaches to antibiotic discovery often include screening large chemical libraries for molecules that elicit a phenotype of interest. These screens, which are limited to hundreds of thousands to a few million molecules, are expensive and time consuming, and can fail to capture an expansive breadth of chemical space. In contrast, machine learning approaches afford the opportunity to rapidly and inexpensively explore vast chemical spaces in silico. Our deep neural network model uses a directed message passing approach over each molecule's graph to build a molecular representation tailored to a specific property, in our case the inhibition of the growth of E. coli. We first trained the model on a collection of 2,335 diverse molecules screened for growth inhibition of E. coli, augmenting it with a set of molecular features, hyperparameter optimization, and ensembling. Next, we applied the model to multiple chemical libraries, comprising >107 million molecules, to identify potential lead compounds with activity against E. coli. After ranking the candidates according to the model’s predicted score, we selected a list of promising candidates.
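The directed message passing idea can be illustrated with a toy, pure-Python sketch. It follows the general directed message passing (D-MPNN) scheme, in which hidden states live on directed bonds and the message into bond v→w excludes the reverse bond w→v. Everything else is an assumption made for brevity: scalar rather than vector features, fixed hypothetical weights (`w_in`, `w_msg`, `w_out`, `w_head`, `b_head`) in place of learned matrices, ReLU activations, a sum readout, and a logistic output head. The extra molecular features, hyperparameter optimization, and ensembling are omitted.

```python
import math

# Toy directed message passing (D-MPNN-style) on a molecular graph.
# Assumptions for illustration only: scalar atom/bond features, fixed
# hypothetical weights instead of learned matrices, ReLU, sum readout.


def relu(x: float) -> float:
    return max(0.0, x)


def molecule_embedding(atom_feats, bonds, bond_feats, depth=3,
                       w_in=0.5, w_msg=0.3, w_out=1.0):
    """Return a scalar molecule representation built by directed message passing."""
    # Each undirected bond (u, v) becomes two directed edges, u->v and v->u.
    directed = {}
    for (u, v), b in zip(bonds, bond_feats):
        directed[(u, v)] = b
        directed[(v, u)] = b

    # Initial hidden state of edge v->w depends on source atom v and the bond.
    h0 = {(v, w): relu(w_in * (atom_feats[v] + b)) for (v, w), b in directed.items()}
    h = dict(h0)

    for _ in range(depth - 1):
        new_h = {}
        for (v, w) in h:
            # Message into v->w: sum of states on edges k->v, skipping the
            # reverse edge w->v so information does not bounce straight back.
            m = sum(state for (k, tgt), state in h.items() if tgt == v and k != w)
            new_h[(v, w)] = relu(h0[(v, w)] + w_msg * m)
        h = new_h

    # Atom states: atom feature plus all incoming edge states; then sum over atoms.
    total = 0.0
    for a, x in enumerate(atom_feats):
        incoming = sum(state for (k, tgt), state in h.items() if tgt == a)
        total += relu(w_out * (x + incoming))
    return total


def predict_score(embedding, w_head=0.2, b_head=-1.0):
    """Map the embedding to a (0, 1) score with a hypothetical logistic head."""
    return 1.0 / (1.0 + math.exp(-(w_head * embedding + b_head)))


# Usage: a three-atom chain with a distinct centre atom.
emb = molecule_embedding([1.0, 2.0, 1.0], [(0, 1), (1, 2)], [1.0, 1.0])
print(emb, predict_score(emb))
```

Because the readout sums over atoms, renumbering the atoms of a molecule does not change its score; a practical model would replace the scalars with learned weight matrices acting on feature vectors.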
Figure 2. Initial model training and the identification of halicin.
(A) Primary screening data for growth inhibition of E. coli by 2,560 molecules within the FDA-approved drug library supplemented with a natural product collection. Shown is the mean of two biological replicates. Red are growth inhibitory molecules; blue are non-growth inhibitory molecules. (B) ROC-AUC plot evaluating model performance after training. Dark blue is the mean of six individual trials (cyan). (C) Rank-ordered prediction scores of Drug Repurposing Hub molecules that were not present in the training dataset. (D) The top 99 predictions from the data shown in (C) were curated for empirical testing for growth inhibition of E. coli. Fifty-one of 99 molecules were validated as true positives based on a cut-off of OD600 < 0.2. Shown is the mean of two biological replicates. Red are growth inhibitory molecules; blue are non-growth inhibitory molecules. (E) For all molecules shown in (D), ratios of OD600 to prediction score were calculated and these values were plotted based on prediction score for each corresponding molecule. These results show that a higher prediction score correlates with a greater probability of growth inhibition. (F) The bottom 63 predictions from the data shown in (C) were curated for empirical testing for growth inhibition of E. coli. Shown is the mean of two biological replicates. Red are growth inhibitory molecules; blue are non-growth inhibitory molecules. (G) t-SNE of all molecules from the training dataset (blue) and the Drug Repurposing Hub (red), revealing chemical relationships between these libraries. Halicin is shown as a black and yellow circle. (H) Tanimoto similarity between halicin (structure inset) and each molecule in the de-duplicated training dataset. The Tanimoto nearest neighbour is the antiprotozoal drug nithiamide (score ~0.37), with metronidazole being the nearest antibiotic (score ~0.21). (I) Growth inhibition of E. coli by halicin. Shown is the mean of two biological replicates. Bars denote absolute error. 
See also Figure S1 and Tables S1 and S2.
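The hit calling and ROC-AUC evaluation described in this caption can be sketched as follows. The OD600 < 0.2 cut-off is taken from panel (D); applying it to the replicate mean, the pairwise (Mann-Whitney) form of ROC-AUC, and all data values are assumptions for illustration.

```python
# Sketch: calling hits from OD600 readings and scoring a model with ROC-AUC.
# The 0.2 cut-off comes from the caption; all data below are hypothetical.


def call_hits(od600_values, cutoff=0.2):
    """Label a molecule growth inhibitory (1) if its mean OD600 is below the cut-off."""
    return [1 if od < cutoff else 0 for od in od600_values]


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(P * N), fine for small sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_of_replicates(rep1, rep2):
    """Average two biological replicates well by well."""
    return [(a + b) / 2 for a, b in zip(rep1, rep2)]


# Hypothetical example: two replicate OD600 reads and model scores for five molecules.
od = mean_of_replicates([0.05, 0.45, 0.10, 0.60, 0.30], [0.07, 0.41, 0.14, 0.58, 0.26])
labels = call_hits(od)  # molecules with mean OD600 < 0.2 are hits
scores = [0.92, 0.40, 0.75, 0.10, 0.55]
print(labels, roc_auc(scores, labels))
```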
Figure 3. Halicin is a broad-spectrum bactericidal antibiotic.
(A) Killing of E. coli in LB media in the presence of varying concentrations of halicin after 1 hr (blue), 2 hr (cyan), 3 hr (green), and 4 hr (red). The initial cell density is ~10⁶ CFU/ml. Shown is the mean of two biological replicates. Bars denote absolute error. (B) Killing of E. coli in PBS in the presence of varying concentrations of halicin after 2 hr (blue), 4 hr (cyan), 6 hr (green), and 8 hr (red). The initial cell density is ~10⁶ CFU/ml. Shown is the mean of two biological replicates. Bars denote absolute error. (C) Killing of E. coli persisters by halicin after treatment with 10 µg/ml (10x MIC) of ampicillin. Light blue is no halicin. Green is 5x MIC halicin. Blue is 10x MIC halicin. Red is 20x MIC halicin. Shown is the mean of two biological replicates. Bars denote absolute error. (D) MIC of halicin against E. coli strains harboring a range of antibiotic-resistance determinants. The mcr-1 gene was expressed in E. coli BW25113. All other resistance genes were expressed in E. coli BW25113 ∆bamB∆tolC. Experiments were conducted with two biological replicates. (E) Growth inhibition of M. tuberculosis by halicin. Shown is the mean of three biological replicates. Bars denote standard deviation. (F) Killing of M. tuberculosis by halicin in 7H9 media at 16 µg/ml (1x MIC). Shown is the mean of three biological replicates. Bars denote standard deviation. (G) MIC of halicin against 36-strain panels of CRE isolates (green), A. baumannii isolates (red), and P. aeruginosa isolates (blue). Experiments were conducted with two biological replicates. See also Figure S2, Table S3.
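Two quantities recur in these panels: the MIC, and killing measured as a change in CFU/ml. A minimal sketch of both follows. It assumes a two-fold dilution series read by OD600 with a hypothetical growth cut-off of 0.2, and defines the MIC as the lowest concentration at and above which every well is inhibited; the study's exact MIC protocol is not given in this caption.

```python
import math

# Sketch: reading an MIC off a two-fold dilution series and expressing
# killing as a log10 drop in CFU/ml. The growth cut-off and all numbers
# are hypothetical; they are not the paper's assay parameters.


def mic(concentrations, od600, cutoff=0.2):
    """Lowest concentration from which every higher concentration also inhibits growth.

    Returns None when even the highest tested concentration allows growth
    (MIC above the tested range).
    """
    result = None
    # Walk from the highest concentration downwards while growth stays inhibited.
    for conc, od in sorted(zip(concentrations, od600), reverse=True):
        if od < cutoff:
            result = conc
        else:
            break
    return result


def log10_reduction(cfu_start, cfu_end):
    """Killing expressed as log10(start / end); 3.0 means a 1,000-fold drop."""
    return math.log10(cfu_start / cfu_end)


concs = [0.5, 1, 2, 4, 8, 16, 32]  # µg/ml, two-fold series
ods = [0.90, 0.88, 0.85, 0.50, 0.12, 0.06, 0.05]
m = mic(concs, ods)
print(m, 10 * m, log10_reduction(1e6, 1e3))  # MIC, a 10x MIC dose, log10 kill
```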
Figure 4. Halicin dissipates the ∆pH component of the proton motive force.
(A) Evolution of resistance to halicin (blue) or ciprofloxacin (red) in E. coli after 30 days of passaging in liquid LB media. Cells were passaged every 24 hr. (B) Whole-transcriptome hierarchical clustering of relative gene expression of E. coli treated with halicin at 4x MIC for 1 hr, 2 hr, 3 hr, and 4 hr. Shown is the mean transcript abundance of two biological replicates of halicin-treated cells relative to untreated control cells on a log2-fold scale. Cluster b is enriched for genes involved in locomotion (p ~ 10⁻²⁰); cluster c is enriched for genes involved in ribosome structure/function (p ~ 10⁻³⁰); and cluster d is enriched for genes involved in membrane protein complexes (p ~ 10⁻¹⁵). Clusters a, e, and f are not highly enriched for specific biological functions. In the growth curve, blue represents untreated cells; red represents halicin-treated cells. (C) Growth inhibition by halicin against E. coli in pH-adjusted media. Shown is the mean of two biological replicates. Bars denote absolute error. (D) DiSC₃(5) fluorescence in E. coli upon exposure to polymyxin B (PMB), halicin, or DMSO. Growth inhibition checkerboards of halicin in combination with tetracycline (left), kanamycin (center), and FeCl₃ (right). Dark blue represents greater growth. See also Figure S3, Table S4.
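The log2-fold scale in panel (B) can be made concrete with a short sketch that averages the two biological replicates and compares treated with untreated abundance. The pseudocount and the example abundances are assumptions; the normalization used in the study is not described in this caption.

```python
import math

# Sketch: expressing treated-vs-control expression on a log2-fold scale,
# averaging two biological replicates first. The pseudocount that guards
# against zero counts is an assumption, not a detail from the caption.


def log2_fold_change(treated_reps, control_reps, pseudocount=1.0):
    treated = sum(treated_reps) / len(treated_reps)
    control = sum(control_reps) / len(control_reps)
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized abundances for one gene at 1, 2, 3 and 4 hr of treatment.
treated_by_hour = {1: [120, 100], 2: [60, 70], 3: [40, 30], 4: [20, 25]}
control = [110, 100]
profile = {hr: round(log2_fold_change(t, control), 2) for hr, t in treated_by_hour.items()}
print(profile)  # positive = higher than control, negative = lower
```

A value of 1 means twice the control abundance and -1 means half, which is why a log2 scale shows up- and down-regulation symmetrically.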
Figure 5. Halicin displays efficacy in murine models of infection.
(A) Growth inhibition of pan-resistant A. baumannii CDC 288 by halicin. Shown is the mean of two biological replicates. Bars denote absolute error. (B) Killing of A. baumannii CDC 288 in PBS in the presence of varying concentrations of halicin after 2 hr (blue), 4 hr (cyan), 6 hr (green), and 8 hr (red). The initial cell density is ~10⁸ CFU/ml. Shown is the mean of two biological replicates. Bars denote absolute error. (C) In a wound infection model, mice were infected with A. baumannii CDC 288 for 1 hr and treated with either vehicle (green; 0.5% DMSO; n=6) or halicin (blue; 0.5% w/v; n=6) over 24 hr. Bacterial load from wound tissue after treatment was determined by selective plating. Black lines represent the geometric mean of the bacterial load for each treatment group. (D) Growth inhibition of C. difficile 630 by halicin. Shown is the mean of two biological replicates. Bars denote absolute error. (E) Experimental design for C. difficile infection and treatment. (F) Bacterial load of C. difficile 630 in feces of infected mice. Metronidazole (red; 50 mg/kg; n=6) did not result in enhanced rates of clearance relative to vehicle controls (green; 10% PEG 300; n=7). Halicin-treated mice (blue; 15 mg/kg; n=4) displayed sterilization beginning at 72 hr after treatment, with 100% of mice being free of infection at 96 hr after treatment. Lines represent the geometric mean of the bacterial load for each treatment group. See also Figure S4.
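The geometric means in panels (C) and (F) can be computed on a log scale, which keeps one heavily infected animal from dominating the group summary. The sketch below assumes that animals with no detectable CFU are assigned a hypothetical limit of detection, since a zero count has no logarithm; how the study handled such values is not stated in this caption.

```python
import math

# Sketch: summarising bacterial loads per treatment group with a geometric
# mean, computed as 10 ** mean(log10(x)). Animals with no detectable CFU
# cannot enter a log directly; substituting a limit of detection is a common
# convention assumed here, and the value used is hypothetical.


def geometric_mean_cfu(loads, limit_of_detection=10.0):
    logs = [math.log10(max(x, limit_of_detection)) for x in loads]
    return 10 ** (sum(logs) / len(logs))


vehicle = [3e7, 1e8, 5e7, 8e7, 2e7, 6e7]  # hypothetical CFU counts
treated = [1e3, 0, 5e2, 0]                # two animals with no detectable CFU
print(f"{geometric_mean_cfu(vehicle):.3g} vs {geometric_mean_cfu(treated):.3g}")
```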
Figure 6. Predicting new antibiotic candidates from unprecedented chemical libraries.
(A) Tranches of the ZINC15 database colored based on the proportion of hits from the original training dataset of 2,335 molecules within each tranche. Darker blue tranches have a higher proportion of molecules that are growth inhibitory against E. coli. Yellow tranches are those selected for predictions. (B) Histogram showing the number of ZINC15 molecules from selected tranches within a corresponding prediction score range. (C) Prediction scores and Tanimoto nearest-neighbour antibiotic scores of the 23 predictions that were empirically tested for growth inhibition. Yellow circles represent those molecules that displayed detectable growth inhibition of at least one pathogen. Grey circles represent inactive molecules. ZINC numbers of active molecules are shown on the right. (D) MIC values (µg/ml) of the eight active predictions from the ZINC15 database against E. coli (EC), S. aureus (SA), K. pneumoniae (KP), A. baumannii (AB), and P. aeruginosa (PA). Blank regions represent no detectable growth inhibition at 128 µg/ml. Structures are shown in the same order (top to bottom) as their corresponding ZINC numbers in (C). (E) MIC of ZINC000100032716 against E. coli strains harboring a range of antibiotic-resistance determinants. The mcr-1 gene was expressed in E. coli BW25113. All other resistance genes were expressed in E. coli BW25113 ∆bamB∆tolC. Experiments were conducted with two biological replicates. Note the minor increase in MIC in the presence of aac(6’)-Ib-cr. (F) Same as (E) except using ZINC000225434673. (G) Killing of E. coli in LB media in the presence of varying concentrations of ZINC000100032716 after 0 hr (blue) and 4 hr (red). The initial cell density is ~10⁶ CFU/ml. Shown is the mean of two biological replicates. Bars denote absolute error. (H) Same as (G) except using ZINC000225434673. (I) t-SNE of all molecules from the primary training dataset (blue), the Drug Repurposing Hub (red), the WuXi anti-tuberculosis library (green), the ZINC15 molecules with prediction scores >0.9 (pink), false positive predictions (grey), and true positive predictions (yellow). See also Figure S5, Tables S5–S7.
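Tanimoto nearest-neighbour scores, as in panel (C), can be sketched with fingerprints represented as sets of on-bit indices. The fingerprint bits, molecule names, and the rule that keeps predictions scoring > 0.9 with a nearest-antibiotic similarity below 0.4 are hypothetical; real fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
# Sketch: Tanimoto similarity between binary fingerprints, represented as
# sets of "on" bit indices, and the nearest-neighbour score used to judge
# how structurally distant a candidate is from known antibiotics.
# Bit sets, names, and thresholds below are hypothetical.


def tanimoto(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both fingerprints are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour(query: set, reference: dict):
    """Return (name, similarity) of the most similar reference fingerprint."""
    return max(((name, tanimoto(query, fp)) for name, fp in reference.items()),
               key=lambda item: item[1])


known_antibiotics = {"antibiotic_A": {1, 2, 3, 8}, "antibiotic_B": {4, 5, 9, 12}}
candidates = {"cand_1": (0.95, {1, 2, 3, 7}),
              "cand_2": (0.93, {20, 21, 22, 5}),
              "cand_3": (0.40, {30, 31})}

# Keep high-scoring predictions (> 0.9) that are structurally distant
# (nearest-neighbour similarity below a hypothetical 0.4 cut-off).
selected = [name for name, (score, fp) in candidates.items()
            if score > 0.9 and nearest_neighbour(fp, known_antibiotics)[1] < 0.4]
print(selected)
```

Here `cand_1` scores highly but closely resembles `antibiotic_A` (similarity 0.6), so only `cand_2` passes both filters.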

Comment in

  • Parsing Molecules for Drug Discovery.
    Walker AS, Pishchany G, Clardy J. Biochemistry. 2020 May 5;59(17):1645-1646. doi: 10.1021/acs.biochem.0c00278. Epub 2020 Apr 21. PMID: 32315170.
