Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data

Andreas Bender¹, Isidro Cortes-Ciriano²

Affiliations

¹ Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK; Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK. Electronic address: ab454@cam.ac.uk.
² European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK. Electronic address: icortes@ebi.ac.uk.

PMID: 33508423
PMCID: PMC8132984
DOI: 10.1016/j.drudis.2020.11.037

Review

Andreas Bender et al. Drug Discov Today. 2021 Apr.

Drug Discov Today. 2021 Apr;26(4):1040-1052.

doi: 10.1016/j.drudis.2020.11.037. Epub 2021 Jan 27.

Abstract

'Artificial Intelligence' (AI) has recently had a profound impact on areas such as image and speech recognition, and this progress has already translated into practical applications. In drug discovery, however, such advances remain scarce, and one of the reasons is intrinsic to the data used. In this review, we discuss aspects of, and differences in, data from the image, speech, chemical, and biological domains, the amounts of data available, and how relevant they are to drug discovery. To truly advance the field of AI in drug discovery, improvements are needed in our understanding of biological systems and in the subsequent generation of practically relevant data in sufficient quantities. This would enable the discovery of novel chemistry, with novel modes of action, that shows desirable efficacy and safety in the clinic.

Figures

**Figure 1**
Illustration of the differences between image recognition and classification tasks in the chemical and biological drug discovery domains. When classifying images (and also speech), the model architecture and the representation of the object are more integrated than when using chemical and biological data, and labels can be assigned with less ambiguity. In the chemical domain, the best representation of an object is generally unknown (different aspects of a chemical are responsible for different types of effect; some might be related to the functional group, others to surface properties, etc.), whereas, in the biological domain, it is not clear which type of data is informative for which endpoint. Common to the chemical and biological domains is that labels depend to a large extent on the set-up of a particular experiment, even if the same thing is measured ‘in principle’.

**Figure 2**
The positive predictive values (PPV) of target–adverse event associations against the hit rate or recall (i.e., the fraction of drugs associated with the adverse event also being active at an individual protein target). Activity calls were made based on the ratio of the in vitro bioactivity and the unbound plasma concentration. Target–adverse event pairs with a high PPV tend to have a low hit rate, meaning only a small share of all drugs associated with the adverse event would be picked up by the bioactivity at the target. Alternatively, a high hit rate is associated with a low PPV, indicating a high false positive rate for that target–adverse event combination. Thus, overall, there exists no clear 1:1 relationship between on-target activity and observed adverse events after compound administration. Abbreviations: ADRA1B, α1b adrenergic receptor; ACE, angiotensin-converting enzyme; CHRM1/2/3, muscarinic acetylcholine receptor M1/2/3; PTGS1, cyclooxygenase-1; DRD2, dopamine D2 receptor; FAERS, US Food and Drug Administration Adverse Event Reporting System; HTR2A, serotonin 2a (5-HT2a) receptor; HTR2C, serotonin 2c (5-HT2c) receptor; KCNH2, hERG; SIDER, SIDe Effect Resource.
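
The two quantities plotted in Figure 2 can be illustrated with a minimal Python sketch. It makes an activity call per drug from the ratio of in vitro potency to unbound plasma concentration, then computes the PPV and the hit rate for one target–adverse event pair. The `Drug` fields, the `max_ratio` cut-off, and the toy values are assumptions made for illustration only; they are not taken from the paper or from FAERS/SIDER.

```python
from dataclasses import dataclass


@dataclass
class Drug:
    name: str                # hypothetical identifier
    potency_um: float        # in vitro bioactivity at the target (e.g. IC50), in µM
    unbound_cmax_um: float   # unbound plasma concentration, in µM
    has_adverse_event: bool  # drug is associated with the adverse event


def is_active(drug, max_ratio=1.0):
    """Activity call from the potency / unbound plasma concentration ratio.

    The cut-off max_ratio is an illustrative assumption, not a value from the paper.
    """
    return drug.potency_um / drug.unbound_cmax_um <= max_ratio


def ppv_and_hit_rate(drugs, max_ratio=1.0):
    """Return (PPV, hit rate) for one target-adverse event pair."""
    active = [d for d in drugs if is_active(d, max_ratio)]
    with_event = [d for d in drugs if d.has_adverse_event]
    true_pos = [d for d in active if d.has_adverse_event]
    # PPV: share of drugs called active that are associated with the event.
    ppv = len(true_pos) / len(active) if active else float("nan")
    # Hit rate (recall): share of drugs with the event that are called active.
    hit_rate = len(true_pos) / len(with_event) if with_event else float("nan")
    return ppv, hit_rate


# Made-up toy data, for illustration only.
toy_drugs = [
    Drug("A", 0.1, 1.0, True),
    Drug("B", 5.0, 1.0, True),
    Drug("C", 10.0, 2.0, True),
    Drug("D", 50.0, 0.5, True),
    Drug("E", 100.0, 1.0, False),
    Drug("F", 2.0, 1.0, False),
    Drug("G", 3.0, 1.0, False),
]

strict = ppv_and_hit_rate(toy_drugs, max_ratio=1.0)
lenient = ppv_and_hit_rate(toy_drugs, max_ratio=10.0)
print("strict cut-off  (PPV, hit rate):", strict)
print("lenient cut-off (PPV, hit rate):", lenient)
```

On this toy set, the strict cut-off gives a high PPV but picks up only one of the four drugs associated with the event, whereas the lenient cut-off picks up more of them at the cost of a lower PPV. This mirrors the trade-off described in the caption.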

**Figure 3**
Illustration of a scientific question, or hypothesis, at the basis of data generation. The hypothesis leads to the generation of data relevant to a given question, which are represented in a signal-preserving manner and then analyzed using a method that is able to handle the signal in the data. A method cannot save an unsuitable representation, which in turn cannot remedy irrelevant data generated for an ill-thought-through question. This principle needs to be at the basis of data generation for making true use of ‘artificial intelligence’ in drug discovery.
