J Cheminform. 2022 Jun 26;14(1):40.
doi: 10.1186/s13321-022-00623-6.

Designing optimized drug candidates with Generative Adversarial Network

Maryam Abbasi et al. J Cheminform. 2022.

Erratum in

Abstract

Drug design is an important area of study for pharmaceutical businesses. However, low efficacy, off-target delivery, long development times, and high costs create barriers to this process. Deep learning models are emerging as a promising solution for de novo drug design, i.e., for generating drug-like molecules tailored to specific needs. However, stereochemistry has not been explicitly considered in the generated molecules, even though it cannot be ignored in target-oriented molecules. This paper proposes a framework based on a Feedback Generative Adversarial Network (GAN) that includes an optimization strategy by incorporating Encoder-Decoder, GAN, and Predictor deep models interconnected with a feedback loop. The Encoder-Decoder converts the string notations of molecules into latent space vectors, effectively creating a new type of molecular representation. At the same time, the GAN can learn and replicate the training data distribution and, therefore, generate new compounds. The feedback loop is designed to incorporate and evaluate the generated molecules according to the desired multiobjective properties at every epoch of training, ensuring a steady shift of the generated distribution towards the space of the targeted properties. Moreover, to develop a more precise set of molecules, we also incorporate a multiobjective optimization selection technique based on a non-dominated sorting genetic algorithm. The results demonstrate that the proposed framework can generate realistic, novel molecules that span the chemical space. The proposed Encoder-Decoder model correctly reconstructs 99% of the datasets, including stereochemical information. The model's ability to find uncharted regions of the chemical space was successfully shown by optimizing the unbiased GAN to generate molecules with a high binding affinity to the Kappa Opioid and Adenosine A2A receptors. Furthermore, the generated compounds exhibit high internal and external diversity (0.88 and 0.94, respectively) as well as high uniqueness.
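
The abstract does not state how the internal and external diversity values are computed. A common convention in the molecular generation literature is one minus the mean pairwise Tanimoto similarity between fingerprints, taken within the generated set (internal) or between the generated set and the training set (external). The following is a minimal sketch under that assumption only. Fingerprints are represented as hypothetical sets of on-bit indices (in practice they would come from a cheminformatics toolkit), and all function names and toy values are illustrative rather than taken from the paper.

```python
from itertools import combinations, product
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def internal_diversity(fps: Sequence[Set[int]]) -> float:
    """One minus the mean pairwise similarity within a single set (assumed definition)."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def external_diversity(generated: Sequence[Set[int]],
                       reference: Sequence[Set[int]]) -> float:
    """One minus the mean similarity between generated and reference sets (assumed definition)."""
    pairs = list(product(generated, reference))
    if not pairs:
        raise ValueError("both sets must be non-empty")
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints, not data from the paper.
toy_generated = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
toy_training = [{1, 2, 3}, {9}]

if __name__ == "__main__":
    print(internal_diversity(toy_generated))
    print(external_diversity(toy_generated, toy_training))
```

Values close to 1 indicate that the molecules in a set share few fingerprint bits with one another (internal) or with the training data (external).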

Keywords: Drug design; GAN; Generative Adversarial Network; Multiobjective optimization; NSGA; QSAR; RNN; SMILES.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The general workflow. This model is composed of an Encoder–Decoder (A and B) that converts SMILES into latent space vectors that are then used as real data in the training of a WGAN-GP network that comprises a Generator (D) and Critic (E). The feedback-loop, Predictor (F), and selecting Pareto optimal molecules by NSGA-II algorithm (G) are only active during the optimization step
Fig. 2
Data preprocessing of the SMILES string. (A) Acetylsalicylic acid using one-hot encoding and (B) a sample Adenosine Receptor BDBM21220 through the embedding method
Fig. 3
The detailed structure of the Encoder (A) and Decoder (B). This model is used to convert the SMILES strings into vectors in the latent space [context vector (C)]
Fig. 4
General schema of the LSTM-based Predictor architecture. This regression model aims to predict the affinity of a molecule given as a SMILES string
Fig. 5
Comparison of the predicted pIC50 distributions for the original data and the data generated by the WGAN-GP model
Fig. 6
Evaluation of the WGAN-GP model for the original training data and the generated data. (A) Evaluation of the QED and SAS. (B) Evaluation of the logP and MW
Fig. 7
Scatter plots from applying the Predictor for the binding affinity. The plots show the pIC50 predicted by the model versus the true pIC50, with the regression line, for the test sets of the different datasets
Fig. 8
Distribution of generated molecules and the predicted pIC50. The plot shows the distribution of the predicted pIC50 values for the unbiased model and the biased model (feedbackGAN) at every 100 epochs from the KOR dataset
Fig. 9
Evaluation of the logP versus MW (left) and the QED versus SAS (right) for the biased model (feedbackGAN) at 500 epochs
Fig. 10
Distribution of the predicted pIC50 values for different sampling methods from the ADORA2A dataset
Fig. 11
Determination of the set of selected molecules. Pareto diagram containing the approximated Pareto front in 4 layers, with the non-dominated scores (pIC50(m), -SAS(m)) in red
Fig. 12
Distribution of the predicted pIC50 values for different sampling methods from the ADORA2A dataset
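
The Pareto layers described in the caption of Fig. 11 correspond to non-dominated sorting, the ranking step of NSGA-II. The following is a minimal sketch of that ranking step only (no crowding distance, crossover, or mutation), assuming both objectives are maximized, as suggested by the (pIC50, -SAS) score pairs. The function names and the toy scores are hypothetical and are not values from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every objective and strictly better in at least one.

    All objectives are assumed to be maximized.
    """
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def non_dominated_sort(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Peel off successive Pareto fronts; each front is a sorted list of indices into points."""
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        # A point belongs to the current front if no other remaining point dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (pIC50, -SAS) scores for five molecules; higher is better in both.
toy_scores: List[Tuple[float, float]] = [
    (8.1, -2.5),
    (7.4, -2.0),
    (7.0, -3.0),
    (6.5, -2.2),
    (8.1, -3.5),
]

if __name__ == "__main__":
    print(non_dominated_sort(toy_scores))  # -> [[0, 1], [2, 3, 4]]
```

The first front holds the molecules that no other molecule beats on both objectives at once; later fronts are obtained by removing the earlier ones and repeating the test. This simple peeling loop is quadratic per front and is chosen here for readability rather than speed.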
