Sci Technol Adv Mater. 2017 Nov 24;18(1):972-976.
doi: 10.1080/14686996.2017.1401424. eCollection 2017.

ChemTS: an efficient python library for de novo molecular generation

Xiufeng Yang et al. Sci Technol Adv Mater.

Abstract

Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) have been shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library, ChemTS, that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.

Keywords: 404 Materials informatics / Genomics; 60 New topics/Others; Molecular design; Monte Carlo tree search; python library; recurrent neural network.

Conflict of interest statement

No potential conflict of interest was reported by the authors.

Figures

Graphical abstract
Figure 1.
Monte Carlo tree search. (a) Selection step: the search tree is traversed from the root to a leaf by choosing the child with the largest UCB score. (b) Expansion step: 30 child nodes are created by sampling from the RNN. (c) Simulation step: paths to terminal nodes are created by the rollout procedure using the RNN, and the rewards of the corresponding molecules are computed. (d) Backpropagation step: the internal parameters of upstream nodes are updated.
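The four steps in the Figure 1 caption can be illustrated with a small, self-contained sketch. This is not the ChemTS implementation. It makes several assumptions for illustration only: a uniform random sampler (`sample_next`) stands in for the RNN, `toy_reward` is a hypothetical score that replaces the molecular reward, the alphabet and `MAX_LEN` are invented, and the selection rule is the textbook UCB1 formula, which may differ from the exact score used by the library.

```python
import math
import random

ALPHABET = ["C", "O", "N", "\n"]  # hypothetical symbols; "\n" marks end of string
MAX_LEN = 8                       # hypothetical cap on string length


def sample_next(prefix, rng):
    """Stand-in for the RNN (assumption): sample the next symbol uniformly."""
    return rng.choice(ALPHABET)


def toy_reward(s):
    """Hypothetical reward in [0, 1]: share of 'C' symbols relative to MAX_LEN."""
    return s.rstrip("\n").count("C") / MAX_LEN


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix
        self.parent = parent
        self.children = {}       # symbol -> Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return self.prefix.endswith("\n") or len(self.prefix) >= MAX_LEN

    def ucb(self, c=1.0):
        """Textbook UCB1 score: mean reward plus an exploration bonus."""
        if self.visits == 0:
            return math.inf
        mean = self.total_reward / self.visits
        bonus = c * math.sqrt(2 * math.log(self.parent.visits) / self.visits)
        return mean + bonus


def rollout(prefix, rng):
    """Complete a prefix symbol by symbol until it is terminal."""
    s = prefix
    while not (s.endswith("\n") or len(s) >= MAX_LEN):
        s += sample_next(s, rng)
    return s


def mcts(n_iter=200, n_expand=3, seed=0):
    rng = random.Random(seed)
    root = Node("")
    best_s, best_r = "", -1.0
    for _ in range(n_iter):
        # (a) Selection: descend to a leaf, always taking the highest-UCB child.
        node = root
        while node.children:
            node = max(node.children.values(), key=Node.ucb)
        # (b) Expansion: create children from sampled next symbols.
        if node.is_terminal():
            leaves = [node]
        else:
            for _ in range(n_expand):
                sym = sample_next(node.prefix, rng)
                if sym not in node.children:
                    node.children[sym] = Node(node.prefix + sym, parent=node)
            leaves = list(node.children.values())
        for leaf in leaves:
            # (c) Simulation: roll out to a terminal string and score it.
            s = rollout(leaf.prefix, rng)
            r = toy_reward(s)
            if r > best_r:
                best_s, best_r = s, r
            # (d) Backpropagation: update statistics on the path to the root.
            cur = leaf
            while cur is not None:
                cur.visits += 1
                cur.total_reward += r
                cur = cur.parent
    return best_s, best_r, root


if __name__ == "__main__":
    best_s, best_r, _ = mcts()
    print(repr(best_s), best_r)
```

In this sketch `n_expand` is a small illustrative value, whereas the caption describes 30 children per expansion. The node prefix corresponds to the part of a string built inside the search tree, and the remainder comes from the rollout.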
Figure 2.
Best 20 molecules by ChemTS. Blue parts in SMILES strings indicate prefixes made in the search tree. The remaining parts are made by the rollout procedure.
