J Chem Inf Model. 2023 Dec 11;63(23):7392-7400.
doi: 10.1021/acs.jcim.3c01220. Epub 2023 Nov 22.

VGAE-MCTS: A New Molecular Generative Model Combining the Variational Graph Auto-Encoder and Monte Carlo Tree Search

Hiroaki Iwata et al. J Chem Inf Model. 2023.

Abstract

Molecular generation is crucial for advancing drug discovery, materials science, and chemical exploration. It expedites the search for new drug candidates, facilitates tailored material creation, and enhances our understanding of molecular diversity. By employing artificial intelligence techniques such as molecular generative models based on molecular graphs, researchers have tackled the challenge of identifying efficient molecules with desired properties. Here, we propose a new molecular generative model combining a graph-based deep neural network and a reinforcement learning technique. We evaluated the validity, novelty, and optimized physicochemical properties of the generated molecules. Importantly, the model explored uncharted regions of chemical space, allowing for the efficient discovery and design of new molecules. This innovative approach has considerable potential to revolutionize drug discovery, materials science, and chemical research for accelerating scientific innovation. By leveraging advanced techniques and exploring previously unexplored chemical spaces, this study offers promising prospects for the efficient discovery and design of new molecules in the field of drug development.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Workflow of our proposed model (VGAE-MCTS). VGAE-MCTS consists of three parts: (i) Converting the molecules of the training data into feature maps (preparation of input data), (ii) Training the distribution of molecules in the training data using the VGAE (training of Variational Graph Auto-Encoder, VGAE), and (iii) Generating molecules by connecting atoms and bonds one by one based on the feature map output from the learned VGAE decoder using MCTS (molecular generation using Monte Carlo Tree Search, MCTS).
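Step (iii) of the workflow can be illustrated with a generic Monte Carlo Tree Search loop. The sketch below is not the authors' implementation: it builds a linear chain of atoms rather than a molecular graph with bonds, and the names ATOMS, TARGET, MAX_ATOMS, and toy_reward are hypothetical stand-ins. In particular, toy_reward replaces both the VGAE decoder output and the property score that the real model would use. It only shows the four standard MCTS phases (selection, expansion, simulation, backpropagation) under those assumptions.

```python
import math
import random

ATOMS = ("C", "N", "O")   # hypothetical action set: atom to append next
TARGET = "CCNCO"          # hypothetical pattern that the toy reward favors
MAX_ATOMS = len(TARGET)   # a chain is finished at this length


def toy_reward(chain):
    """Placeholder score in [0, 1]; stands in for a property such as QED."""
    return sum(a == t for a, t in zip(chain, TARGET)) / len(TARGET)


class Node:
    def __init__(self, chain, parent=None):
        self.chain = chain      # tuple of atom symbols placed so far
        self.parent = parent
        self.children = {}      # atom symbol -> child Node
        self.visits = 0
        self.total = 0.0        # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.chain) >= MAX_ATOMS

    def untried(self):
        return [a for a in ATOMS if a not in self.children]


def uct_child(node, c=1.4):
    """Pick the child with the highest UCT score (mean reward + exploration)."""
    def score(child):
        mean = child.total / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    best_chain, best_reward = (), -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
        # 2. Expansion: add one untried atom as a new child.
        if not node.is_terminal():
            atom = rng.choice(node.untried())
            child = Node(node.chain + (atom,), parent=node)
            node.children[atom] = child
            node = child
        # 3. Simulation: complete the chain with random atoms.
        chain = list(node.chain)
        while len(chain) < MAX_ATOMS:
            chain.append(rng.choice(ATOMS))
        reward = toy_reward(chain)
        if reward > best_reward:
            best_chain, best_reward = tuple(chain), reward
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return "".join(best_chain), best_reward


if __name__ == "__main__":
    print(search())
```

In a setting closer to the one described in the caption, the random rollout and the toy reward would be replaced by choices guided by the decoder's feature map and by a chemistry-aware score.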
Figure 2
Results of QED-optimized generated molecules. (A) The vertical axis shows the QED value from 0 to 1, and the horizontal axis shows the molecules of the ZINC data set, previous models, and VGAE-MCTS. White dots represent the mean values, and the bulge represents the density. (B) Mean (standard deviation) and median QED values for each molecular group are shown. (C) Chemical structures of the top five molecules generated by VGAE-MCTS and their QED values are displayed.
Figure 3
Results of penalized log P-optimized generated molecules. (A) The vertical axis shows the scaled penalized log P value from 0 to 1, and the horizontal axis shows the molecules of the ZINC data set, previous models, and VGAE-MCTS. The white dots represent the mean scaled penalized log P values, and the bulge represents the density. (B) The mean (standard deviation) and median scaled penalized log P value for each molecular group are shown. (C) Chemical structures of the top five molecules generated by VGAE-MCTS and their scaled penalized log P and penalized log P values are displayed.
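For orientation, one commonly used definition of penalized log P subtracts a synthetic accessibility (SA) score and a penalty for rings larger than six atoms from log P. This record does not state which variant or which scaling the authors used, so the sketch below is an assumption throughout: the function names, the min-max scaling to the range 0 to 1, and the bounds low and high are all hypothetical. The inputs are taken as precomputed numbers, which a cheminformatics toolkit would normally supply.

```python
def penalized_logp(logp, sa_score, largest_ring_size):
    """One common variant: logP minus SA score minus a large-ring penalty.

    Assumption: the ring penalty is the number of atoms beyond six in the
    largest ring, and zero when no ring exceeds six atoms.
    """
    ring_penalty = max(largest_ring_size - 6, 0)
    return logp - sa_score - ring_penalty


def min_max_scale(value, low, high):
    """Hypothetical scaling to [0, 1]; values outside [low, high] are clipped."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


if __name__ == "__main__":
    raw = penalized_logp(logp=2.5, sa_score=1.5, largest_ring_size=6)
    print(raw, min_max_scale(raw, low=-10.0, high=10.0))
```

Other variants normalize each term by statistics of the training set before combining them, so values computed this way are not directly comparable to those in the figure.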
Figure 4
Visualization of the chemical spaces of QED-optimized generated molecules. Molecules generated by optimizing QED are plotted in two dimensions using ECFP. Molecules from the ZINC training data are shown in blue. The molecules generated by JT-VAE are shown in orange, MolDQN in green, and VGAE-MCTS in red. (A) Distribution of molecules in ZINC training data and molecules generated by the three models. (B) Distribution of molecules in ZINC training data and molecules generated by VGAE-MCTS. (C) Distribution of molecules in ZINC training data and molecules generated by JT-VAE. (D) Distribution of molecules in ZINC training data and molecules generated by MolDQN.
