Int J Mol Sci. 2023 Mar 31;24(7):6573.
doi: 10.3390/ijms24076573.

A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation

Nikoletta-Maria Koutroumpa et al. Int J Mol Sci.

Abstract

The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has had a positive impact on the drug development pipeline. Numerous challenges have been addressed through the growing exploitation of drug-related data and advances in deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development, since computational results may not be confirmed experimentally. This systematic review summarizes the deep learning architectures that have been used in the drug discovery process and validated by subsequent in vivo experiments. In each study presented, the molecule or peptide generated or identified by the deep learning model was biologically evaluated in animal models. These state-of-the-art studies highlight that, although artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Across the reviewed articles, seven algorithms were identified: recurrent neural networks, specifically long short-term memory networks (LSTM-RNNs); Autoencoders (AEs) and their Wasserstein Autoencoder (WAE) and Variational Autoencoder (VAE) variants; Convolutional Neural Networks (CNNs); Directed Message Passing Neural Networks (D-MPNNs); and Multitask Deep Neural Networks (MTDNNs). LSTM-RNNs were the most frequently used architecture, with molecules or peptide sequences as inputs.

Keywords: animal model; artificial intelligence; biological evaluation; deep learning; drug design; drug discovery; in vivo; machine learning.

Conflict of interest statement

N.-M.K., K.D.P., A.G.P. and A.A. are employed by NovaMechanics Ltd., a cheminformatics company.

Figures

Figure 1
The workflow followed by most studies presented in this review. Input molecules are encoded and passed to a deep learning model; virtual screening and/or molecular docking then reduce the candidates to a final set of compounds, which are synthesized and tested for activity in vitro and in vivo.
Figure 2
A summary of the papers considered at each stage of the review process. Studies combining early-stage drug discovery with preclinical studies are very limited, so only 12 studies were included in the review.
Figure 3
An autoencoder consists of an encoder, which maps an input into a latent space, and a decoder, which maps the latent representation back to the original input space. The goal of the autoencoder is to compute a reconstruction x’ with minimal error relative to the original input x.
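The encoder/decoder round trip described in this caption can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear maps, the hand-picked weight matrices `W_enc` and `W_dec`, and the use of mean squared error as the reconstruction error. A real autoencoder would learn its weights by minimizing this error and would normally include nonlinearities.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def encode(x, W_enc):
    """Encoder: map the input x to a lower-dimensional latent vector z."""
    return matvec(W_enc, x)

def decode(z, W_dec):
    """Decoder: map the latent vector z back to the input space."""
    return matvec(W_dec, z)

def reconstruction_error(x, x_prime):
    """Mean squared error between the input x and its reconstruction x'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_prime)) / len(x)

# Hypothetical hand-picked weights: 4-dimensional input, 2-dimensional latent
# space. The encoder keeps the first two coordinates; the decoder restores
# them and fills the remaining coordinates with zeros.
W_enc = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
W_dec = [[1.0, 0.0],
         [0.0, 1.0],
         [0.0, 0.0],
         [0.0, 0.0]]

# An input that fits in the latent space is reconstructed exactly.
x_easy = [2.0, -1.0, 0.0, 0.0]
error_easy = reconstruction_error(x_easy, decode(encode(x_easy, W_enc), W_dec))

# An input with information outside the latent space loses that information.
x_hard = [2.0, -1.0, 3.0, 0.0]
error_hard = reconstruction_error(x_hard, decode(encode(x_hard, W_enc), W_dec))
```

In this toy setup `error_easy` is zero while `error_hard` is positive, which shows why the latent dimension limits how faithfully x’ can match x.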
Figure 4
Generative Adversarial Network (GAN): Two independent competing networks are trained simultaneously: the Generator (G), which takes an input z from probability distribution p(z) and generates data G(z); and the Discriminator (D), which receives as input the training data or the output from the generator G(z) and tries to predict whether the input is real or generated.
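The competition between the two networks can be made concrete through their training objectives. The caption does not state a loss function, so the sketch below assumes one common formulation: binary cross-entropy for the Discriminator and the non-saturating loss for the Generator. The function names and the example probabilities are hypothetical, and the reviewed studies may use other objectives.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D (assumed formulation).

    d_real: D's predicted probabilities of "real" on training data.
    d_fake: D's predicted probabilities of "real" on generated data G(z).
    D is rewarded for outputs near 1 on real data and near 0 on G(z).
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss for G (assumed formulation).

    G is rewarded when D assigns a high "real" probability to G(z).
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs on a batch of two real samples.
d_real = [0.9, 0.8]

# Case 1: D confidently rejects the generated samples.
d_fake_rejected = [0.1, 0.2]
# Case 2: D is fooled and rates the generated samples as real.
d_fake_fooled = [0.9, 0.8]

d_loss_when_rejecting = discriminator_loss(d_real, d_fake_rejected)
d_loss_when_fooled = discriminator_loss(d_real, d_fake_fooled)
g_loss_when_rejected = generator_loss(d_fake_rejected)
g_loss_when_fooled = generator_loss(d_fake_fooled)
```

The two losses move in opposite directions: when the Discriminator is fooled its loss rises while the Generator's loss falls, which is the adversarial dynamic the figure depicts.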
Figure 5
Architecture of recurrent neural networks. The inputs are represented by $x_t$. For the standard RNN, the hidden state at time step $t$ is represented as $s_t$; it is the “memory” of the network. At time step $t$, $s_t$ is calculated from the previous hidden state and the input at the current step: $s_t = f(U x_t + W s_{t-1})$. The function $f$ is usually a nonlinearity, such as tanh or the Rectified Linear Unit (ReLU).
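The recurrence in this caption can be unrolled by hand in a few lines. The sketch below is illustrative only: it takes f to be tanh, starts from a zero hidden state, and uses small hypothetical weight matrices `U` and `W` with one-hot input vectors. It implements the standard RNN update, not the gated LSTM variant.

```python
import math

def rnn_step(x_t, s_prev, U, W):
    """One update of the hidden state: s_t = tanh(U x_t + W s_{t-1})."""
    s_t = []
    for u_row, w_row in zip(U, W):
        pre_activation = (sum(u * x for u, x in zip(u_row, x_t))
                          + sum(w * s for w, s in zip(w_row, s_prev)))
        s_t.append(math.tanh(pre_activation))
    return s_t

def rnn_forward(inputs, U, W):
    """Run the recurrence over a sequence, starting from a zero hidden state."""
    s = [0.0] * len(U)
    states = []
    for x_t in inputs:
        s = rnn_step(x_t, s, U, W)
        states.append(s)
    return states

# Hypothetical weights: input size 3 (one-hot tokens), hidden size 2.
U = [[0.5, -0.3, 0.8],
     [0.1, 0.4, -0.6]]
W = [[0.2, -0.1],
     [0.0, 0.3]]

# The same token presented twice in a row.
repeated_token = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
states = rnn_forward(repeated_token, U, W)
```

Although both inputs are identical, `states[0]` and `states[1]` differ, because the second step also sees the hidden state left behind by the first. That dependence on history is the sense in which the hidden state acts as memory.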
Figure 6
Schematic diagram of a CNN. A convolutional layer followed by a pooling layer forms a convolutional module. Each module learns to identify features while preserving spatial relationships. A fully connected layer follows; it takes the output of the convolutional modules and, in a classification problem, predicts the class from the features extracted in the previous stages.
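A single convolutional module followed by a fully connected layer can be sketched on a one-dimensional signal. This is a toy illustration under stated assumptions: the edge-detecting kernel, the ReLU between convolution and pooling, the pooling size, and the identity-like dense weights are all hypothetical and fixed by hand, whereas a trained CNN would learn them from data.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: slide the kernel along the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def dense(features, weights, bias):
    """Fully connected layer: one score per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def predict_class(signal, kernel, weights, bias):
    """Convolution -> ReLU -> pooling -> fully connected -> arg max."""
    features = max_pool(relu(conv1d(signal, kernel)), size=2)
    scores = dense(features, weights, bias)
    return scores.index(max(scores))

# Hypothetical kernel that responds to a rising edge (0 followed by 1).
kernel = [-1.0, 1.0]
# Hypothetical dense layer: class 0 = edge in the first half, class 1 = second half.
weights = [[1.0, 0.0],
           [0.0, 1.0]]
bias = [0.0, 0.0]

early_edge = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
late_edge = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

class_early = predict_class(early_edge, kernel, weights, bias)
class_late = predict_class(late_edge, kernel, weights, bias)
```

Pooling shrinks the feature map but keeps the coarse position of the detected edge, so the dense layer can still tell the two signals apart. This is a small instance of extracting features while preserving spatial relationships.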
Figure 7
The relative frequencies per year of the deep learning models described in the present review.
