Review
Drug Discov Today. 2021 Apr;26(4):1040-1052.
doi: 10.1016/j.drudis.2020.11.037. Epub 2021 Jan 27.

Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data

Andreas Bender et al. Drug Discov Today. 2021 Apr.

Abstract

'Artificial Intelligence' (AI) has recently had a profound impact on areas such as image and speech recognition, and this progress has already translated into practical applications. However, in the drug discovery field, such advances remain scarce, and one of the reasons is intrinsic to the data used. In this review, we discuss aspects of, and differences in, data from different domains, namely the image, speech, chemical, and biological domains, the amounts of data available, and how relevant they are to drug discovery. Improvements are needed in our understanding of biological systems, and in the subsequent generation of practically relevant data in sufficient quantities, to truly advance the field of AI in drug discovery and to enable the discovery of novel chemistry, with novel modes of action, that shows desirable efficacy and safety in the clinic.

Figures

Graphical abstract
Figure 1
Illustration of the differences between image recognition and classification tasks in the chemical and biological drug discovery domains. When classifying images (and also speech), the model architecture and the representation of the object are more integrated than when using chemical and biological data, and labels can be assigned with less ambiguity. In the chemical domain, the best representation of an object is generally unknown (different aspects of a chemical are responsible for different types of effect; some might be related to functional groups, others to surface properties, etc.), whereas, in the biological domain, it is not clear which type of data is informative for which endpoint. Common to the chemical and biological domains is that labels depend to a large extent on the set-up of a particular experiment, even if the same thing is measured ‘in principle’.
Figure 2
The positive predictive values (PPV) of target–adverse event associations plotted against the hit rate or recall (i.e., the fraction of drugs associated with the adverse event that are also active at an individual protein target). Activity calls were made based on the ratio of the in vitro bioactivity to the unbound plasma concentration. Target–adverse event pairs with a high PPV tend to have a low hit rate, meaning that only a small share of all drugs associated with the adverse event would be picked up by the bioactivity at the target. Conversely, a high hit rate is associated with a low PPV, indicating a high false positive rate for that target–adverse event combination. Thus, overall, there exists no clear 1:1 relationship between on-target activity and observed adverse events after compound administration. Abbreviations: ADRA1B, α1b adrenergic receptor; ACE, angiotensin-converting enzyme; CHRM1/2/3, muscarinic acetylcholine receptor M1/2/3; PTGS1, cyclooxygenase-1; DRD2, dopamine D2 receptor; FAERS, US Food and Drug Administration Adverse Event Reporting System; HTR2A, serotonin 2a (5-HT2a) receptor; HTR2C, serotonin 2c (5-HT2c) receptor; KCNH2, hERG; SIDER, SIDe Effect Resource.
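The two quantities in this caption can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' pipeline: the drug values are invented toy data, the function names `is_active` and `ppv_and_hit_rate` are hypothetical, and the cut-off `max_ratio=1.0` is an assumption, since the caption does not state which ratio threshold was used for the activity calls.

```python
def is_active(ic50_um, unbound_conc_um, max_ratio=1.0):
    """Hypothetical activity call: a drug counts as active at a target when
    in vitro IC50 / unbound plasma concentration <= max_ratio (assumed cut-off)."""
    return ic50_um / unbound_conc_um <= max_ratio


def ppv_and_hit_rate(active, has_event):
    """Return (PPV, hit rate) for one target-adverse event pair.

    PPV      = drugs active AND with event / all active drugs
    hit rate = drugs active AND with event / all drugs with event (recall)
    """
    true_pos = sum(1 for a, e in zip(active, has_event) if a and e)
    n_active = sum(active)
    n_event = sum(has_event)
    ppv = true_pos / n_active if n_active else float("nan")
    hit_rate = true_pos / n_event if n_event else float("nan")
    return ppv, hit_rate


# Invented toy data: (IC50 in uM, unbound plasma concentration in uM, event reported?)
toy_drugs = [
    (0.1, 1.0, True),    # active, event      -> true positive
    (0.5, 1.0, True),    # active, event      -> true positive
    (5.0, 1.0, True),    # inactive, event    -> missed by the target
    (20.0, 2.0, True),   # inactive, event    -> missed by the target
    (0.2, 0.4, False),   # active, no event   -> false positive
    (50.0, 1.0, False),  # inactive, no event -> true negative
]

active = [is_active(ic50, conc) for ic50, conc, _ in toy_drugs]
has_event = [event for _, _, event in toy_drugs]
ppv, hit_rate = ppv_and_hit_rate(active, has_event)
print(f"PPV = {ppv:.2f}, hit rate = {hit_rate:.2f}")
```

In this toy set, tightening the assumed cut-off would remove the false positive and raise the PPV, but would also tend to lower the hit rate, which is the trade-off the caption describes.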
Figure 3
Illustration of a scientific question, or hypothesis, at the basis of data generation. The hypothesis leads to the generation of relevant data for a given question, which are represented in a signal-preserving manner, and which are then analyzed using a method that is able to handle the signal in the data. A method cannot rescue an unsuitable representation, a representation cannot remedy irrelevant data, and data cannot compensate for an ill-thought-through question. This principle needs to be at the basis of data generation for making true use of ‘artificial intelligence’ in drug discovery.
