Machine learning in chemoinformatics and drug discovery

Yu-Chen Lo¹, Stefano E Rensi¹, Wen Torng¹, Russ B Altman²

Affiliations

¹ Department of Bioengineering, Stanford University, Stanford, CA, USA.
² Department of Bioengineering, Stanford University, Stanford, CA, USA. Electronic address: rbaltman@stanford.edu.

PMID: 29750902
PMCID: PMC6078794
DOI: 10.1016/j.drudis.2018.05.010

Review

Yu-Chen Lo et al. Drug Discov Today. 2018 Aug;23(8):1538-1546. doi: 10.1016/j.drudis.2018.05.010. Epub 2018 May 8.

Abstract

Chemoinformatics is an established discipline focused on extracting, processing and extrapolating meaningful data from chemical structures. With the rapid growth of chemical 'big' data from high-throughput screening (HTS) and combinatorial synthesis, machine learning has become an indispensable tool for drug designers, who mine chemical information from large compound databases to design drugs with important biological properties. We first review the multiple processing layers in the chemoinformatics pipeline and then introduce the machine learning models commonly used in drug discovery and quantitative structure-activity relationship (QSAR) analysis. We present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses, and we discuss limitations and future directions to guide further development in this evolving field.

Figures

**Figure 1**
Computational workflow for chemoinformatics analysis using machine learning. The first step of chemoinformatics analysis is feature extraction, in which the compound is characterized by substructure fragments or other chemical descriptors (first box). The chemical features of the compound are represented as chemical fingerprints, which are used to compare compounds by the presence and absence of shared chemical features. The chemical fingerprint can also be used to predict other chemical and physicochemical properties in QSAR/QSPR analysis with diverse machine learning models, which make inferences either by direct comparison with the training data (instance-based learning) or from a trained statistical model (model-based learning) (second box).
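
The fingerprint comparison and instance-based learning steps described in the caption can be illustrated with a minimal sketch. It assumes that a fingerprint is simply the set of substructure features present in a compound, that similarity is measured with the Tanimoto coefficient (a common choice, not one named in the caption), and that prediction averages the property values of the k most similar training compounds. The feature names, property values, and function names below are hypothetical and exist only for illustration; they are not taken from the review.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is modeled as the set of features present in a compound.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared features divided by all distinct features in either compound."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 3,
) -> float:
    """Instance-based prediction: mean property of the k nearest neighbors."""
    if not training:
        raise ValueError("training set must not be empty")
    ranked = sorted(
        training,
        key=lambda item: tanimoto(query, item[0]),
        reverse=True,
    )
    neighbors = ranked[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical toy data: feature sets and made-up property values.
TRAINING: List[Tuple[Fingerprint, float]] = [
    (frozenset({"aromatic_ring", "hydroxyl"}), 1.0),
    (frozenset({"aromatic_ring", "carboxyl"}), 2.0),
    (frozenset({"amine", "alkyl_chain"}), 5.0),
]
QUERY: Fingerprint = frozenset({"aromatic_ring", "hydroxyl", "carboxyl"})

if __name__ == "__main__":
    for fingerprint, value in TRAINING:
        print(sorted(fingerprint), round(tanimoto(QUERY, fingerprint), 3), value)
    print("k=2 prediction:", knn_predict(QUERY, TRAINING, k=2))
```

No model is fitted here: the training compounds themselves are consulted at prediction time, which is what distinguishes instance-based learning from the model-based route in the caption. In practice, fingerprints would usually come from a chemoinformatics toolkit rather than hand-written feature sets.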
