J Chem Inf Model. 2013 Nov 25;53(11):3054-63.

doi: 10.1021/ci400480s. Epub 2013 Oct 30.

Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation

Sean Ekins¹, Joel S Freundlich, Robert C Reynolds

PMID: 24144044
PMCID: PMC3910492
DOI: 10.1021/ci400480s

Sean Ekins et al. J Chem Inf Model. 2013.

Affiliation

¹ Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States.

Abstract

The search for new tuberculosis treatments continues, as we need to find molecules that act more quickly, can be accommodated in multidrug regimens, and overcome ever-increasing levels of drug resistance. Multiple large-scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose-response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning (RP) model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and a relative lack of cytotoxicity to Vero cells) was used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and Vero cell cytotoxicity data from our previous screens yields an external validation receiver operator characteristic (ROC) value of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test-set-dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning model construction, as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
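As a rough illustration of the two ideas the abstract relies on (fusing data sets, then scoring predictions with a ROC value), the following Python sketch may help. It is not the authors' procedure: the record layout (compound ID mapped to a 0/1 dual-event label), the rule that a compound counts as active if any source set marks it active, and the function names `fuse_datasets` and `roc_auc` are all assumptions made for this example. The ROC value is computed as the area under the ROC curve using the pairwise-comparison definition.

```python
from typing import Dict, List

# Hypothetical layout: compound ID -> 1 (active and non-cytotoxic) or 0 (otherwise).
Dataset = Dict[str, int]


def fuse_datasets(*datasets: Dataset) -> Dataset:
    """Merge data sets by compound ID.

    Assumption for this sketch: when a compound appears in several sets,
    it is kept as active if any set labels it active.
    """
    fused: Dataset = {}
    for dataset in datasets:
        for compound_id, label in dataset.items():
            fused[compound_id] = max(fused.get(compound_id, 0), label)
    return fused


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    active scores higher than a randomly chosen inactive (ties count 0.5)."""
    actives = [s for lab, s in zip(labels, scores) if lab == 1]
    inactives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Toy data, invented for illustration only.
set_a = {"cmpd1": 1, "cmpd2": 0}
set_b = {"cmpd2": 1, "cmpd3": 0}
fused = fuse_datasets(set_a, set_b)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives, which is the scale on which the reported 0.83 should be read.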

Conflict of interest statement

SE is a consultant for Collaborative Drug Discovery, Inc.

Figures

**Figure 1**
A. Principal component analysis (PCA) of all *Mtb* datasets (7728 active and inactive compounds) used in this study and their overlap with 177 published GSK leads. Three principal components explain 73% of the variance. B. Inset showing some of the GSK leads (yellow), which are widely dispersed and lie within the chemistry space of the *Mtb* datasets used for modeling.

**Figure 2**
Clustering and PCA of TB Mobile data. A. Examination of 745 TB Mobile molecules with interpretable descriptors results in a PCA with 3 PCs, which explain 88% of the variability. Outlier compounds represent macrocycles (bottom right) and long lipid-like molecules (bottom left). B. Examination of 1429 SRI hits from four datasets (active and non-toxic only, from the SRI screens, where IC₉₀ < 10 µg/ml or 10 µM and the selectivity index (SI) is greater than 10, with SI = CC₅₀/IC₉₀) together with the 745 TB Mobile compounds results in a PCA with 3 PCs, which explain 83% of the variability; SRI compounds are clustered (yellow). C. Examination of 177 GSK leads (yellow) and the TB Mobile compounds results in a PCA with 3 PCs, which explain 88% of the variance.
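The hit criteria quoted in the Figure 2 caption (IC₉₀ below 10 and SI = CC₅₀/IC₉₀ above 10) can be written as a small Python sketch. The function names and default cutoffs below are assumptions for illustration, and the sketch assumes IC₉₀ and CC₅₀ are supplied in the same units (µg/ml or µM); it is not code from the paper.

```python
def selectivity_index(cc50: float, ic90: float) -> float:
    """SI = CC50 / IC90; both values must be in the same units."""
    if ic90 <= 0:
        raise ValueError("IC90 must be positive")
    return cc50 / ic90


def is_dual_event_hit(
    ic90: float,
    cc50: float,
    ic90_cutoff: float = 10.0,  # caption: IC90 < 10 (ug/ml or uM)
    si_cutoff: float = 10.0,    # caption: SI greater than ten
) -> bool:
    """True when a compound is both active against Mtb and selective
    (relatively non-cytotoxic to Vero cells) under the caption's cutoffs."""
    return ic90 < ic90_cutoff and selectivity_index(cc50, ic90) > si_cutoff


# Invented example values, for illustration only.
potent_and_selective = is_dual_event_hit(ic90=2.0, cc50=50.0)   # SI = 25
potent_but_toxic = is_dual_event_hit(ic90=2.0, cc50=15.0)       # SI = 7.5
weak_but_selective = is_dual_event_hit(ic90=12.0, cc50=500.0)   # IC90 too high
```

Only the first compound satisfies both conditions, which is the "dual-event" sense in which activity and cytotoxicity are combined into a single label.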

Cited by

Intrabacterial Metabolism Obscures the Successful Prediction of an InhA Inhibitor of Mycobacterium tuberculosis.
Wang X, Perryman AL, Li SG, Paget SD, Stratton TP, Lemenze A, Olson AJ, Ekins S, Kumar P, Freundlich JS. ACS Infect Dis. 2019 Dec 13;5(12):2148-2163. doi: 10.1021/acsinfecdis.9b00295. Epub 2019 Nov 5. PMID: 31625383.
High-Throughput Phenotypic Screening and Machine Learning Methods Enabled the Selection of Broad-Spectrum Low-Toxicity Antitrypanosomatidic Agents.
Linciano P, Quotadamo A, Luciani R, Santucci M, Zorn KM, Foil DH, Lane TR, Cordeiro da Silva A, Santarem N, B Moraes C, Freitas-Junior L, Wittig U, Mueller W, Tonelli M, Ferrari S, Venturelli A, Gul S, Kuzikov M, Ellinger B, Reinshagen J, Ekins S, Costi MP. J Med Chem. 2023 Nov 23;66(22):15230-15255. doi: 10.1021/acs.jmedchem.3c01322. Epub 2023 Nov 3. PMID: 37921561.
Molecule Property Analyses of Active Compounds for Mycobacterium tuberculosis.
Makarov V, Salina E, Reynolds RC, Kyaw Zin PP, Ekins S. J Med Chem. 2020 Sep 10;63(17):8917-8955. doi: 10.1021/acs.jmedchem.9b02075. Epub 2020 Apr 20. PMID: 32259446. Review.
Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015).
Ekins S, Perryman AL, Clark AM, Reynolds RC, Freundlich JS. J Chem Inf Model. 2016 Jul 25;56(7):1332-43. doi: 10.1021/acs.jcim.6b00004. Epub 2016 Jul 1. PMID: 27335215.
Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization.
Lane TR, Urbina F, Rank L, Gerlach J, Riabova O, Lepioshkin A, Kazakova E, Vocat A, Tkachenko V, Cole S, Makarov V, Ekins S. Mol Pharm. 2022 Feb 7;19(2):674-689. doi: 10.1021/acs.molpharmaceut.1c00791. Epub 2021 Dec 29. PMID: 34964633.
