J Cheminform. 2020 Sep 4;12(1):53.
doi: 10.1186/s13321-020-00454-3.

DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach

Yash Khemchandani et al. J Cheminform.

Abstract

We address the problem of generating novel molecules with desired interaction properties as a multi-objective optimization problem. Interaction binding models are learned from binding data using graph convolution networks (GCNs). Since the experimentally obtained property scores are recognised as having potentially gross errors, we adopted a robust loss for the model. Combinations of these terms, including drug likeness and synthetic accessibility, are then optimized using reinforcement learning based on a graph convolution policy approach. Some of the molecules generated, while legitimate chemically, can have excellent drug-likeness scores but appear unusual. We provide an example based on the binding potency of small molecules to dopamine transporters. We extend our method successfully to use a multi-objective reward function, in this case for generating novel molecules that bind with dopamine transporters but not with those for norepinephrine. Our method should be generally applicable to the generation in silico of molecules with desirable properties.
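The abstract does not say which robust loss was adopted. As a minimal sketch, assuming the Huber loss as a stand-in, the following shows why a robust loss helps when some experimental scores carry gross errors: the penalty grows linearly rather than quadratically for large residuals. The function names and the `delta` threshold are illustrative assumptions, not taken from the paper.

```python
def squared_loss(pred, target):
    """Ordinary squared-error loss; one gross outlier can dominate the mean."""
    return 0.5 * (pred - target) ** 2


def huber_loss(pred, target, delta=1.0):
    """Huber loss (an assumed stand-in for the paper's robust loss).

    Quadratic for residuals up to `delta`, linear beyond it, so a badly
    mis-measured label contributes far less than under squared error.
    """
    residual = abs(pred - target)
    if residual <= delta:
        return 0.5 * residual * residual
    return delta * (residual - 0.5 * delta)


def mean_loss(preds, targets, loss_fn):
    """Average a per-sample loss over a batch."""
    return sum(loss_fn(p, t) for p, t in zip(preds, targets)) / len(preds)


# Invented toy batch: the last label is a gross error.
preds = [6.0, 7.0, 5.5]
targets = [6.2, 6.9, 15.5]
print(mean_loss(preds, targets, squared_loss))
print(mean_loss(preds, targets, huber_loss))
```

On this toy batch the outlier dominates the squared-error mean, while its contribution to the Huber mean is bounded by a linear term.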

Keywords: Cheminformatics; Deep learning; Generative methods; QSAR; Reinforcement learning.

Conflict of interest statement

The authors have no conflicts of interest to report.

Figures

Fig. 1
Block diagram of our basic system. A molecule is generated by the Reinforcement Learning (RL) pathway using a Graph Convolutional Policy Network. This molecule is then used as input to the property prediction module, which outputs a predicted property score. This score is then used as the reward feedback for the RL pathway and the cycle restarts
Fig. 2
The property prediction pipeline for our method. The steps in green represent the feature extraction using Graph Convolution and the steps in orange represent regression of property scores. a The molecule is represented as a feature vector, with features as described in Sect. “Molecular property prediction”. b The feature vector is passed through a linear layer to get the Depth-0 message. c Through repeated graph convolution (message passing), each followed by a linear layer, we get the Depth N-1 message. d Each atom’s final message is calculated by summing the messages of the neighbouring atoms (also a graph convolution). e The resultant message is passed through a linear layer and the mean over all the atoms is taken to get the final embedding. f The property score is regressed from the graph embedding by a feed-forward neural network. g The loss between the predicted property and the ground-truth property is then backpropagated to update the weights
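Steps b to f of the Fig. 2 caption can be sketched in plain Python. This is a simplified atom-level illustration under stated assumptions: the ReLU activations, the identity weight matrices, the one-hot features, and the single linear readout in `predict_score` are all hypothetical choices made for the example, and the paper's actual featurisation and layer sizes are not reproduced here.

```python
def linear(vec, weight):
    """Apply a linear layer; `weight` is a list of rows (one per output unit)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def relu(vec):
    return [max(0.0, x) for x in vec]


def sum_neighbours(messages, adjacency, atom):
    """Sum the current messages of all atoms bonded to `atom`."""
    total = [0.0] * len(messages[atom])
    for nb in adjacency[atom]:
        for k, value in enumerate(messages[nb]):
            total[k] += value
    return total


def embed_molecule(features, adjacency, w_in, w_msg, w_out, depth):
    atoms = range(len(features))
    # (b) Depth-0 message from the raw atom features.
    messages = [relu(linear(f, w_in)) for f in features]
    # (c) Repeated message passing, each round followed by a linear layer.
    for _ in range(depth - 1):
        messages = [relu(linear(sum_neighbours(messages, adjacency, a), w_msg))
                    for a in atoms]
    # (d) Final per-atom message: sum over neighbouring atoms.
    final = [sum_neighbours(messages, adjacency, a) for a in atoms]
    # (e) Linear layer, then mean over atoms gives the graph embedding.
    atom_out = [relu(linear(m, w_out)) for m in final]
    dim = len(atom_out[0])
    return [sum(v[k] for v in atom_out) / len(atom_out) for k in range(dim)]


def predict_score(embedding, w_readout, bias=0.0):
    """(f) Stand-in for the feed-forward regressor: a single linear unit."""
    return sum(w * x for w, x in zip(w_readout, embedding)) + bias


# Hypothetical 3-atom chain with one-hot features [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
embedding = embed_molecule(features, adjacency, identity, identity, identity, depth=2)
print(embedding, predict_score(embedding, [1.0, -1.0]))
```

In a trained model the weight matrices would be learned by backpropagating the loss of step g; here they are fixed to the identity so the arithmetic can be followed by hand.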
Fig. 3
The reinforcement learning pathway for systemic generation of molecules (Redrawn from You et al. [34]). a The state is defined as the current graph G_t and the possible atom types C. b The GCPN conducts message passing to encode the state as node embeddings and estimates the policy function. c The action to be performed (a_t) is sampled from the policy function. The environment performs a chemical valency check on the intermediate state and returns (d) the next state G_{t+1} and (e) the associated reward (r_t)
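The environment side of this loop (valency check, next state, reward) can be illustrated with a toy step function. This is a sketch under assumptions, not the GCPN environment itself: the action format, the valence table limited to C, N and O, and the step reward values of 0.1 and -1.0 are all hypothetical.

```python
# Assumed maximum valences for a tiny atom vocabulary.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}


def used_valence(bonds, index):
    """Total bond order currently attached to the atom at `index`."""
    return sum(order for i, j, order in bonds if index in (i, j))


def step(atoms, bonds, action):
    """Apply one action and return (next_atoms, next_bonds, reward).

    `action` is (new_atom_type, attach_to_index, bond_order). If the
    intermediate graph violates a valence limit, the action is rejected:
    the state is returned unchanged with a negative reward.
    """
    atom_type, target, order = action
    new_atoms = atoms + [atom_type]
    new_bonds = bonds + [(target, len(atoms), order)]
    valid = (
        used_valence(new_bonds, target) <= MAX_VALENCE[atoms[target]]
        and order <= MAX_VALENCE[atom_type]
    )
    if not valid:
        return atoms, bonds, -1.0   # hypothetical penalty for an invalid action
    return new_atoms, new_bonds, 0.1  # hypothetical small reward for a valid step


# Start from a single oxygen and add a double-bonded carbon (valid).
atoms, bonds, reward = step(["O"], [], ("C", 0, 2))
# Attaching anything else to the oxygen would exceed its valence (rejected).
atoms2, bonds2, reward2 = step(atoms, bonds, ("N", 0, 1))
print(atoms, bonds, reward)
print(atoms2, bonds2, reward2)
```

In the full method the action would be sampled from the learned policy and the final reward would come from the property prediction module; here the actions are hard-coded to show the accept and reject paths.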
Fig. 4
Predicted and experimental values for the test sets of the dopamine (a) and norepinephrine (b) transporters. Lines are lines of best fit (a: y = 0.44 + 0.79x, r2 = 0.79; b: y = 0.49 + 0.74x, r2 = 0.68)
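For readers who want to reproduce this kind of summary on their own predictions, a line of best fit in the caption's form y = intercept + slope·x, together with r2, can be computed by ordinary least squares. The sketch below assumes simple linear regression; the data points are invented for illustration and are not the paper's test set.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return intercept, slope, r_squared


# Invented experimental (x) and predicted (y) scores.
experimental = [5.0, 6.0, 7.0, 8.0, 9.0]
predicted = [4.6, 5.1, 6.2, 6.5, 7.6]
intercept, slope, r2 = fit_line(experimental, predicted)
print(f"y = {intercept:.2f} + {slope:.2f}x, r2 = {r2:.2f}")
```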
Fig. 5
In silico generation by DeepGraphMolGen of novel molecules with predicted binding capacity to the dopamine transporter. Molecules were generated as described in the text. a Top 10 molecules as predicted by DeepGraphMolGen versus the closest molecule in the BindingDB dataset and the Tanimoto similarity thereto (encoded using the RDKit patterned fingerprint). b Distribution of Tanimoto similarities to the closest molecule in the BindingDB dataset for the top 500 molecules
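The Tanimoto similarity used in these comparisons is the size of the intersection of two fingerprints' on-bits divided by the size of their union. The sketch below works on plain sets of bit indices so that it runs without a cheminformatics toolkit; the fingerprints and molecule names are invented, and in practice the bits would come from an RDKit fingerprint, with `DataStructs.TanimotoSimilarity` doing the comparison.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def closest_in_library(query_bits, library):
    """Return (name, similarity) of the library entry most similar to the query."""
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical fingerprints: each set lists the indices of bits that are on.
library = {
    "known_ligand_1": {1, 4, 9, 12, 20},
    "known_ligand_2": {2, 4, 9, 33},
    "known_ligand_3": {7, 8, 40},
}
generated = {1, 4, 9, 12, 21}
name, similarity = closest_in_library(generated, library)
print(name, round(similarity, 3))
```

Repeating `closest_in_library` over the top-ranked generated molecules and collecting the similarities gives the kind of distribution described in panel b.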
Fig. 6
In silico generation by DeepGraphMolGen of novel molecules with predicted binding capacity to the dopamine transporter. Molecules were generated as described in the text. a Top 10 molecules as predicted by DeepGraphMolGen versus the closest molecule in the BindingDB dataset and the Tanimoto similarity thereto (encoded using the RDKit patterned fingerprint). b Distribution of Tanimoto similarities to the closest molecule in the BindingDB dataset for the top 500 molecules
Fig. 7
In silico generation by DeepGraphMolGen of novel molecules with predicted binding capacity to the dopamine transporter using a generative method in which the number of heavy atoms is constrained to be lower than 25. Molecules were generated as described in the text. a Top 10 molecules as predicted by DeepGraphMolGen versus the closest molecule in the BindingDB dataset and the Tanimoto similarity (TS) thereto (encoded using the RDKit patterned fingerprint). b Distribution of Tanimoto similarities (RDKit patterned encoding) to the closest molecule in the BindingDB dataset for the top 500 molecules
Fig. 8
In silico generation by DeepGraphMolGen of novel molecules with predicted binding capacity to the dopamine transporter using a generative method in which the number of heavy atoms is constrained to be lower than 15. Molecules were generated as described in the text. a Top 10 molecules as predicted by DeepGraphMolGen versus the closest molecule in the BindingDB dataset and the Tanimoto similarity (TS) thereto (encoded using the RDKit patterned fingerprint). b Distribution of Tanimoto similarities (RDKit patterned encoding) to the closest molecule in the BindingDB dataset for the top 500 molecules
Fig. 9
In silico generation by DeepGraphMolGen of novel molecules with predicted binding capacity to the dopamine transporter using a generative method in which the number of heavy atoms is constrained to be lower than 25. Molecules were generated as described in the text. a Top 10 molecules as predicted by DeepGraphMolGen versus the closest molecule in the BindingDB dataset and the Tanimoto similarity (TS) thereto (encoded using the RDKit patterned fingerprint). b Distribution of Tanimoto similarities (RDKit patterned encoding) to the closest molecule in the BindingDB dataset for the top 500 molecules. c Plot of those molecules with differential affinities for the dopamine and norepinephrine transporters
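A multi-objective reward of the kind the abstract describes (bind the dopamine transporter, avoid the norepinephrine transporter, stay drug-like and synthetically accessible) can be sketched as a weighted sum with a size constraint. Everything numeric here is an assumption made for illustration: the weights, the penalty value, the sign convention that a higher synthetic accessibility score is worse, and the way the heavy-atom limit is enforced are hypothetical and not taken from the paper.

```python
def multi_objective_reward(dat_score, net_score, drug_likeness, sa_score,
                           n_heavy_atoms, max_heavy_atoms=25,
                           weights=(1.0, 1.0, 0.5, 0.1)):
    """Combine several predicted properties into one scalar reward.

    dat_score      predicted binding score for the dopamine transporter (higher is better)
    net_score      predicted binding score for the norepinephrine transporter (to be avoided)
    drug_likeness  drug-likeness score (higher is better)
    sa_score       synthetic accessibility score (assumed: higher is harder to make)
    """
    # Assumed handling of the size constraint: reject molecules at or above the limit.
    if n_heavy_atoms >= max_heavy_atoms:
        return -1.0
    w_dat, w_net, w_dl, w_sa = weights
    return (w_dat * dat_score
            - w_net * net_score
            + w_dl * drug_likeness
            - w_sa * sa_score)


# Invented candidate scores: a selective binder and a non-selective one.
selective = multi_objective_reward(8.0, 5.0, 0.8, 3.0, n_heavy_atoms=20)
non_selective = multi_objective_reward(8.0, 8.0, 0.8, 3.0, n_heavy_atoms=20)
too_large = multi_objective_reward(9.0, 4.0, 0.9, 2.0, n_heavy_atoms=30)
print(selective, non_selective, too_large)
```

With these weights the selective candidate scores higher than the non-selective one, which is the behaviour a reward for differential affinity needs; the relative weights would have to be tuned for any real use.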

References

    1. Yang X, Zhang J, Yoshizoe K, Terayama K, Tsuda K. ChemTS: an efficient python library for de novo molecular generation. Sci Technol Adv Mater. 2017;18(1):972–976.
    2. Gómez-Bombarelli R, Aguilera-Iparraguirre J, Hirzel TD, Duvenaud D, Maclaurin D, Blood-Forsythe MA, Chae HS, Einzinger M, Ha DG, Wu T, et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat Mater. 2016;15(10):1120.
    3. Gómez-Bombarelli R, Wei JN, Duvenaud D, Hernández-Lobato JM, Sánchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci. 2018;4(2):268–276.
    4. Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: generative models for matter engineering. Science. 2018;361(6400):360–365.
    5. Kadurin A, Nikolenko S, Khrabrov K, Aliper A, Zhavoronkov A. druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol Pharm. 2017;14(9):3098–3104.
