Sci Rep. 2019 Mar 21;9(1):5013.
doi: 10.1038/s41598-019-39387-9.

Machine learning-powered antibiotics phenotypic drug discovery

Sannah Zoffmann et al. Sci Rep. 2019.

Abstract

Identification of novel antibiotics remains a major challenge for drug discovery. The present study explores the use of phenotypic readouts beyond classical antibacterial growth inhibition, adopting a combined multiparametric high content screening and genomic approach. Deploying the semi-automated bacterial phenotypic fingerprint (BPF) profiling platform in conjunction with machine learning-powered dataset analysis allowed us to narrow down, compare and predict compound mode of action (MoA). The method identifies weak antibacterial hits, allowing full exploitation of the low-potency hits frequently discovered by routine antibacterial screening. We demonstrate that the BPF classification tool can be used successfully to guide chemical structure-activity relationship optimization, enabling antibiotic development, and that this approach can be fruitfully applied across species. The BPF classification tool could potentially be applied in primary screening, enabling identification of novel antibacterial compound hits and differentiation of their MoA, and hence widening the known antibacterial chemical space of existing pharmaceutical compound libraries. More generally, beyond the specific objective of the present work, the proposed approach could be applied to a broader range of diseases amenable to phenotypic drug discovery.

Conflict of interest statement

S.Z., M.V., F.M., A.M., L.W., R.B.M., R.S., K.B., A.I.J., C.L., M.B., K.A. and Ma.P. are full-time employees of F. Hoffmann-La Roche A.G. S.Z., M.V., L.W., Mi.P., H.D., R.S., K.B., A.I.J., C.L., M.B. and K.A. own stocks or stock options in F. Hoffmann-La Roche A.G. C.S.M. is an employee and holder of stock in Summit Therapeutics plc. T.H., H.H.T. and A.A.R. have no potential conflict of interest.

Figures

Figure 1
Antibacterial susceptibility of prioritized hits extracted from the Roche/Genentech compound collection. The histogram shows the potency distribution of 750 compounds active against one or more key Gram-negative bacterial species. AB: A. baumannii; EC: E. coli; KP: K. pneumoniae; PA: P. aeruginosa; EC50: effective compound concentration that inhibits growth by 50%.
Figure 2
Bacterial phenotypic modulation is defined by a lowest effective dose (LOED). (a) Schematic illustration of the LOED calculation method. Dose-dependent response curves are shown for a small subset of nine features, selected to represent different types of response profiles. The LOED, indicated as a solid vertical bar at the same location in all panels, is calculated by a consensus approach as the weighted mean of all detected individual LOEDs (dashed vertical lines). Features 1–3 and 7–9 reach positive and negative deflection points, respectively, so individual LOEDs can be derived, whereas features 4–6 do not contribute, either because of a missing curve fit (4 and 5) or because there is no significant deflection (6). The weight of each contributing feature is the goodness of its curve fit (variable “w”, top left in each panel). (b) Correlation between MIC and LOED for E. coli WT. Horizontal and vertical stippled lines indicate the concentration range tested; the diagonal line indicates equal LOED and MIC values. (c) Comparison of the antibiotic treatment-induced changes to eight selected features quantified with HCS and expressed as fold MAD of non-treated control samples, for two compounds with similar LOED, doxycycline (red) and globomycin (blue). For detailed feature descriptions see Supplementary Table S1. (d) E. coli treated with the indicated compound for 1 h at a concentration of 4x LOED. Images of the fluorescent stains were acquired with the Opera QEHS reader at 60x magnification. Scale bar corresponds to 10 μm.
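The consensus LOED of panel (a) and the fold-MAD scaling of panel (c) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `consensus_loed` and `fold_mad` are hypothetical, the weighted mean is taken on a linear concentration scale (the caption does not say whether a log scale was used), and the MAD is left unscaled. Features without a curve fit or without a significant deflection are passed as `None` and drop out, as features 4–6 do in the figure. The numbers in the usage lines are illustrative, not data from the study.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def consensus_loed(features: Sequence[Tuple[Optional[float], float]]) -> Optional[float]:
    """Weighted mean of individual per-feature LOEDs.

    Each entry is (individual_loed, fit_weight). A feature with no curve fit or no
    significant deflection is passed as (None, weight) and does not contribute.
    Assumption: averaging on a linear concentration scale.
    """
    contributing = [(loed, w) for loed, w in features if loed is not None]
    total_weight = sum(w for _, w in contributing)
    if total_weight == 0:
        return None  # no feature yielded a usable LOED
    return sum(loed * w for loed, w in contributing) / total_weight


def fold_mad(value: float, controls: Sequence[float]) -> float:
    """Distance of a feature value from the control median, in units of control MAD.

    Assumption: raw MAD without the 1.4826 normal-consistency factor.
    """
    center = median(controls)
    mad = median([abs(c - center) for c in controls])
    if mad == 0:
        raise ValueError("control MAD is zero; fold MAD is undefined")
    return (value - center) / mad


# Nine hypothetical features as in panel (a): features 4-6 have no usable LOED.
features = [(2.0, 0.9), (2.5, 0.8), (3.0, 0.7),
            (None, 0.0), (None, 0.0), (None, 0.6),
            (1.5, 0.95), (2.0, 0.85), (2.2, 0.6)]
print(consensus_loed(features))
print(fold_mad(10.0, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

Weighting by goodness of fit means a noisy curve can still contribute, but it pulls the consensus less than a well-fitted one.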
Figure 3
ML algorithms handle bacterial phenotypic fingerprint complexity. (a) Compiled changes to individual features, each a change in cell morphology and/or fluorescence stain intensity captured with HCS at 4x LOED, induced by a set of reference compounds in E. coli ΔTolC. Each column corresponds to the data from one well and each row to an individual feature. The color indicates the change in the feature value, expressed as fold MAD over the median calculated for all non-treated samples. For each compound, results from three experiments are shown, with n = 2–3. (b) Corresponding similarity-based projection of the random forest distance matrix into three dimensions using multidimensional scaling, showing clear separation between data points belonging to different compounds and close proximity of those belonging to the same compound, which form clusters for the combined data points at 2x and 4x LOED. The number of data points is reduced by downsampling to be equal for all reference conditions. (c) Upper panel: archetypes for the set of reference antibiotics; value calculation and heatmap scaling are the same as in panel a. Lower panel: distribution of feature importance contributing to the archetypes. (d) Out-of-bag validation of the random forest classification model for the reference compound set (upper panel) and similarity score of test compounds (lower panel). The similarity score in an individual experiment is the frequency of matching predictions expressed as a fraction of 1, where a higher value represents higher similarity.
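Panels (b) and (d) rest on a random forest distance matrix and a per-experiment similarity score. A minimal stdlib sketch, assuming the common Breiman-style proximity (the fraction of trees in which two samples land in the same leaf) converted to a distance as 1 - proximity; the caption does not state which proximity-to-distance transform was used, and `rf_distance_matrix` and `similarity_scores` are hypothetical names. Per-sample leaf indices of the kind used as input here are what scikit-learn's `RandomForestClassifier.apply` returns.

```python
from collections import Counter
from typing import Dict, List, Sequence


def rf_distance_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise distance 1 - proximity from per-tree leaf assignments.

    leaves[i][t] is the leaf index reached by sample i in tree t.
    Proximity = fraction of trees in which two samples share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            dist[i][j] = dist[j][i] = 1.0 - shared / n_trees
    return dist


def similarity_scores(predicted_labels: Sequence[str]) -> Dict[str, float]:
    """Fraction (0 to 1) of a test compound's data points predicted as each reference class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: count / total for label, count in counts.items()}


# Illustrative input: three samples scored by four trees.
leaves = [[0, 1, 2, 3], [0, 1, 5, 6], [7, 8, 9, 10]]
print(rf_distance_matrix(leaves))
print(similarity_scores(["doxycycline", "doxycycline", "doxycycline", "globomycin"]))
```

The resulting matrix can be handed to any multidimensional scaling routine that accepts precomputed dissimilarities (for example `sklearn.manifold.MDS`) to obtain a three-dimensional projection like the one in panel (b).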
Figure 4
Identification of MoA for new antibacterial compound classes. Structures of compound series 1, with the quality of the fingerprint indicated to the left of each structure. Fingerprint quality: “Clear”, at least one parameter with a >10-fold change in >85% of the individual data points at 4x LOED; “Weak”, no single parameter with the consistent change required for classification as “clear”, but visual inspection reveals a systematic weaker fingerprint involving >5 parameters.
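The “Clear” criterion is explicit enough to express as a rule; the “Weak” category relies on visual inspection and is deliberately not automated here. A minimal sketch with a hypothetical function name, assuming each value is the per-parameter change (for example the fold-MAD value shown in the heatmaps) for one data point at 4x LOED, and that the magnitude of the change counts, so strong decreases qualify as well as increases.

```python
from typing import Sequence


def is_clear_fingerprint(
    data_points: Sequence[Sequence[float]],
    fold_threshold: float = 10.0,
    fraction_threshold: float = 0.85,
) -> bool:
    """True if some parameter changes >fold_threshold in >fraction_threshold of data points.

    data_points[k][p] is the change of parameter p in data point k at 4x LOED.
    Assumption: the absolute change is compared, so decreases also count.
    """
    n_points = len(data_points)
    n_params = len(data_points[0])
    for p in range(n_params):
        hits = sum(1 for row in data_points if abs(row[p]) > fold_threshold)
        if hits / n_points > fraction_threshold:
            return True
    return False


# Parameter 0 exceeds 10-fold in all four data points.
print(is_clear_fingerprint([[12.0, 1.0], [15.0, 2.0], [11.0, -3.0], [-20.0, 0.0]]))
# Parameter 0 exceeds 10-fold in only 3 of 4 data points (75%).
print(is_clear_fingerprint([[12.0, 1.0], [15.0, 2.0], [5.0, 1.0], [11.0, 0.0]]))
```

Both thresholds are strict inequalities, matching the “>10-fold” and “>85%” wording.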
Figure 5
Bacterial phenotypic fingerprints can be powered by molecular information. (a) Heat map of differentially expressed genes across the whole transcriptome of E. coli. Columns represent genes and rows represent treatments with antibiotics at a concentration known to affect the cells. Z-score transformation was performed on mean log2 values (n = 3 replicates) for each gene, with blue denoting lower and red higher expression levels compared to the average. Hierarchical clustering of genes and samples is based on complete linkage and Pearson correlation distance. Color-coded labels indicate compounds with similar MoA. (b) Correlation of mRNA expression levels from RNAseq with GFP intensity under the same gene promoter in E. coli strains, quantified with HCS as the cell population median of the median pixel intensity per cell. (c) E. coli strains expressing recN and entC promoter-GFP reporter gene constructs. Cells were treated with compound for 1 h, and individual images for membrane stain and GFP were acquired with the Opera QEHS instrument at 60x magnification. (d) Dose-dependent intensity change in the GFP channel upon antibiotic treatment, defined as fold standard deviation of non-treated samples for the median pixel intensity per cell (cell population median; N = 2, n = 2). (e) Heat map of differentially expressed reporter genes in E. coli. Columns represent genes and rows represent treatments with antibiotics at sublethal concentrations. Z-score transformation was performed on mean log2 values (N = 2, n = 1–2 replicates) for each gene, with blue denoting lower and red higher expression levels compared to the average expression level. Hierarchical clustering of genes and samples is based on complete linkage and Pearson correlation distance. Color-coded labels indicate compounds with similar MoA.
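Panels (a) and (e) z-score each gene on mean log2 values and cluster with Pearson correlation distance. The sketch below covers only those two building blocks; it assumes the z-score uses the population standard deviation across treatments (the caption does not specify), and the helper names are hypothetical. Complete-linkage clustering on this distance is available, for example, through SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` and `metric="correlation"`.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def zscore_mean_log2(replicates: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Z-score one gene across treatments, starting from replicate expression values.

    replicates maps treatment name -> positive expression values for that gene.
    Assumption: population standard deviation across treatment means.
    """
    means = {t: mean([math.log2(v) for v in vals]) for t, vals in replicates.items()}
    values = list(means.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # Convention chosen here: a gene with no variation maps to all zeros.
        return {t: 0.0 for t in means}
    return {t: (m - mu) / sd for t, m in means.items()}


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation distance 1 - Pearson r (0 = perfectly correlated, 2 = anti-correlated)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


print(zscore_mean_log2({"untreated": [2, 2, 2], "treated": [8, 8, 8]}))
print(pearson_distance([1, 2, 3], [3, 2, 1]))
```

Correlation distance groups genes (or treatments) by the shape of their profiles rather than their absolute levels, which is why it pairs naturally with per-gene z-scoring.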
Figure 6
Adaptation to a second species and application to novel antibiotics development. (a,b) Compiled compound-induced morphological changes at 8x LOED. For each compound, results from individual dose-response curves from three experiments are shown, with LOED derived from the combined replicates within each experiment (n = 4). Each column corresponds to the data from one well and each row to an individual feature. The color indicates the change in the feature value, expressed as fold MAD over the median calculated for all non-treated samples. Data are shown for a set of reference compounds (a) and for a series of compounds with the same chemical core structure (Tanimoto coefficients from 0.51 to 0.73) from an antibiotics development project with unknown MoA (b). (c) Similarity-based projection of the corresponding random forest distance matrix into three dimensions, showing clear separation between data points belonging to different compounds and close proximity of those belonging to the same compound, which form clusters. For the random forest analysis, data from 4x and 8x LOED are pooled, and the number of data points is reduced by downsampling to be equal for all reference conditions. (d) Archetypes for the set of reference antibiotics (upper panel), with value calculation and heatmap scaling the same as in panel b, and distribution of feature importance contributing to the archetypes (lower panel). (e) Out-of-bag validation of the random forest classification model for the reference compound set (upper panel) and similarity score for different analogs from a new antibiotics series (lower panel).
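Two small operations sit behind panels (b) and (c): the Tanimoto coefficient used to characterize the analog series, and downsampling every reference condition to the same number of data points. A sketch under stated assumptions: fingerprints are represented as sets of on-bit indices (the caption does not say which fingerprint type produced the 0.51 to 0.73 range), downsampling is uniform random sampling without replacement down to the size of the smallest condition, and both function names are hypothetical.

```python
import random
from typing import AbstractSet, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / union


def downsample_equal(groups: Dict[str, Sequence[T]], seed: int = 0) -> Dict[str, List[T]]:
    """Randomly reduce every condition, without replacement, to the size of the smallest one."""
    rng = random.Random(seed)
    n_min = min(len(points) for points in groups.values())
    return {name: rng.sample(list(points), n_min) for name, points in groups.items()}


print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))
balanced = downsample_equal({"ref_A": list(range(10)), "ref_B": list(range(4))})
print({name: len(points) for name, points in balanced.items()})
```

Balancing the reference conditions keeps a compound with many replicate wells from dominating the random forest training set.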
