J Cheminform. 2021 Mar 18;13(1):24.
doi: 10.1186/s13321-021-00501-7.

MolFinder: an evolutionary algorithm for the global optimization of molecular properties and the extensive exploration of chemical space using SMILES

Yongbeom Kwon et al. J Cheminform. 2021.

Abstract

Here, we introduce MolFinder, a new molecule optimization method based on an efficient global optimization algorithm, conformational space annealing, and the SMILES representation. MolFinder efficiently finds diverse molecules with desired properties without requiring any training or a large molecular database. Compared with recently proposed reinforcement-learning-based molecule optimization algorithms, MolFinder consistently performs better both in optimizing a given target property and in generating a set of diverse and novel molecules. The efficiency of MolFinder demonstrates that combinatorial optimization using the SMILES representation is a promising approach to molecule optimization, one that has not been well investigated despite its simplicity. We believe that our results shed light on new possibilities for advances in molecule optimization methods.
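
To make the idea of combinatorial optimization over SMILES strings concrete, the following is a minimal sketch of a toy evolutionary loop with string-level crossover and mutation. It is not MolFinder and it does not implement conformational space annealing. The objective `toy_score`, the syntactic filter `looks_valid` (a real pipeline would parse candidates with a cheminformatics toolkit instead), the atom alphabet, and every parameter value are assumptions made only for this illustration.

```python
import random

ATOMS = ["C", "N", "O"]  # hypothetical mutation alphabet for this sketch


def looks_valid(smiles: str) -> bool:
    """Crude syntactic filter (assumption): balanced, non-empty branches only.

    It does not check valence or any real chemistry.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0 and "()" not in smiles and "((" not in smiles


def toy_score(smiles: str) -> float:
    """Hypothetical objective: reward heteroatoms, penalize lengths far from 8."""
    return smiles.count("N") + smiles.count("O") - 0.5 * abs(len(smiles) - 8)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Join a non-empty prefix of `a` with a non-empty suffix of `b`."""
    i = rng.randint(1, len(a))
    j = rng.randint(0, len(b) - 1)
    return a[:i] + b[j:]


def mutate(smiles: str, rng: random.Random) -> str:
    """Replace one atom character with a random atom from ATOMS."""
    positions = [k for k, ch in enumerate(smiles) if ch in ATOMS]
    if not positions:
        return smiles
    k = rng.choice(positions)
    return smiles[:k] + rng.choice(ATOMS) + smiles[k + 1:]


def evolve(seeds, generations=30, children=20, seed=0):
    """Elitist loop; expects at least two distinct seed strings."""
    rng = random.Random(seed)
    population = list(seeds)
    for _ in range(generations):
        offspring = []
        for _ in range(children):
            a, b = rng.sample(population, 2)
            child = mutate(crossover(a, b, rng), rng)
            if looks_valid(child):
                offspring.append(child)
        # Keep the best unique strings so the population size stays fixed.
        pool = set(population) | set(offspring)
        population = sorted(pool, key=lambda s: (-toy_score(s), s))[:len(seeds)]
    return population


seeds = ["CCO", "CC(N)C", "OCCN"]
best = evolve(seeds)
print(best)
```

Because the current population is always part of the selection pool, the best score never decreases from one generation to the next. A plain elitist selection like this tends to collapse onto near-identical strings, so it says nothing about how MolFinder maintains diversity.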

Keywords: Chemical space; Evolutionary algorithm; Molecular optimization; SMILES.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. The workflow of MolFinder.
Fig. 2. The crossover and mutation operators. The crossover (a) and mutation (b) operations using SMILES strings.
Fig. 3. A comparison of the modified drug-likeness scores of the generated molecules. Violin plots of the modified drug-likeness scores of molecules generated by MolFinder, MolFinder-local, ReLeaSE, and MolDQN (top). Histograms of the QED (bottom left) and SA score (bottom right) values of the molecules generated by MolFinder (orange), ReLeaSE (green), and MolDQN (red), and of the initial ZINC15 database (blue).
Fig. 4. Top-12 molecules discovered by MolFinder. The modified drug-likeness scores (TARGET, Eq. 1), drug-likeness (QED), and synthetic accessibility scores (SA score) are presented.
Fig. 5. Pairwise similarities between generated molecules. Density plots of the pairwise similarities between molecules generated by MolFinder (blue), MolFinder-local (yellow), ReLeaSE (green), and MolDQN (red). The pairwise similarity was calculated using the RDKit fingerprint and the Tanimoto coefficient.
Fig. 6. An overview of the distribution of generated molecules in chemical space. The t-SNE plot of the top-1000 molecules generated by MolFinder (yellow), MolFinder-local (green), MolDQN (red), and ReLeaSE (purple). For comparison, the initial/seed molecules from ZINC15 (blue) are shown together. The sizes of the circles are proportional to the molecules' SmQED values. The best molecule generated by each method is emphasized with a black border.
Fig. 7. Assessment of generating molecules similar to a reference. Histograms of (a) objective values (Eq. 3), (b) similarities to the reference molecules, and (c) drug-likeness scores (QED) of molecules generated by MolFinder (orange) and MolDQN (blue).
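
The pairwise similarity reported in Fig. 5 is a Tanimoto coefficient over fingerprints. A minimal sketch of that coefficient follows, with each fingerprint represented as a set of on-bit indices. The bit sets are made-up placeholders rather than RDKit fingerprints of real molecules, and returning 0.0 for two empty fingerprints is a convention chosen for this example.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Shared on-bits divided by total distinct on-bits: |a & b| / |a | b|."""
    union = len(a | b)
    # Two empty fingerprints have no defined ratio; 0.0 is an assumed convention.
    return len(a & b) / union if union else 0.0


def pairwise_similarities(fingerprints):
    """Tanimoto coefficient for every unordered pair of fingerprints."""
    return [tanimoto(x, y) for x, y in combinations(fingerprints, 2)]


# Hypothetical on-bit sets standing in for molecular fingerprints.
fingerprints = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
sims = pairwise_similarities(fingerprints)
print(sims)
```

A distribution of such values concentrated near 0 indicates a diverse set of molecules, while values near 1 indicate near-duplicates.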
