Graph-based molecular Pareto optimisation

Jonas Verhellen¹

PMID: 35872811
PMCID: PMC9241971
DOI: 10.1039/d2sc00821a

Jonas Verhellen. Chem Sci. 2022 Jun 2;13(25):7526-7535. doi: 10.1039/d2sc00821a. eCollection 2022 Jun 29.

Affiliation

¹ Centre for Integrative Neuroplasticity, University of Oslo, N-0316 Oslo, Norway. jverhell@gmail.com

Abstract

Computer-assisted design of small molecules has experienced a resurgence in academic and industrial interest due to the widespread use of data-driven techniques such as deep generative models. While the ability to generate molecules that fulfil required chemical properties is encouraging, the use of deep learning models requires significant, if not prohibitive, amounts of data and computational power. At the same time, open-sourcing of more traditional techniques such as graph-based genetic algorithms for molecular optimisation [Jensen, Chem. Sci., 2019, 10, 3567-3572] has shown that simple and training-free algorithms can be efficient and robust alternatives. Further research alleviated the common genetic algorithm issue of evolutionary stagnation by enforcing molecular diversity during optimisation [Van den Abeele, Chem. Sci., 2020, 11, 11485-11491]. The crucial lesson distilled from the simultaneous development of deep generative models and advanced genetic algorithms has been the importance of chemical space exploration [Aspuru-Guzik, Chem. Sci., 2021, 12, 7079-7090]. For single-objective optimisation problems, chemical space exploration had to be discovered as a usable resource, but in multi-objective optimisation problems an exploration of trade-offs between conflicting objectives is inherently present. In this paper we provide state-of-the-art and open-source implementations of two generations of graph-based non-dominated sorting genetic algorithms (NSGA-II, NSGA-III) for molecular multi-objective optimisation. We provide the results of a series of benchmarks for the inverse design of small molecule drugs for both the NSGA-II and NSGA-III algorithms. In addition, we introduce the dominated hypervolume and extended fingerprint based internal similarity as novel metrics for these benchmarks. By design, NSGA-II and NSGA-III outperform a single-objective optimisation method baseline in terms of dominated hypervolume, but remarkably our results show they do so without relying on a greater internal chemical diversity.
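The dominated hypervolume mentioned in the abstract can be illustrated for the two-objective case. The sketch below is not taken from the paper's code: it assumes both objectives are maximised, that scores lie above a reference point at the origin, and the function name `dominated_hypervolume_2d` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominated_hypervolume_2d(points: List[Point],
                             reference: Point = (0.0, 0.0)) -> float:
    """Area dominated by `points` relative to `reference` (maximisation).

    Assumption: two objectives, both to be maximised; the reference
    point at the origin is an illustrative choice, not the paper's.
    """
    rx, ry = reference
    # Only points that beat the reference in both objectives add area.
    candidates = [(x, y) for x, y in points if x > rx and y > ry]
    # Sweep from the largest first objective downwards.
    candidates.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ry
    for x, y in candidates:
        if y > best_y:
            # New horizontal strip between best_y and y, spanning rx..x.
            area += (x - rx) * (y - best_y)
            best_y = y
    return area


# Made-up objective scores, for illustration only.
example_front = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
example_area = dominated_hypervolume_2d(example_front)
```

A dominated solution such as `(0.4, 0.4)` adds no area, so the value only grows when the Pareto front itself advances.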

This journal is © The Royal Society of Chemistry.

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1. Visualisation of a Pareto front (dark blue) and dominated solutions (light blue). Example molecules shown at the Pareto front were generated by NSGA-II for Tanimoto similarities with regard to lysergic acid diethylamide (objective 1) and psilocybin (objective 2).
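The split between Pareto-front and dominated solutions described in this caption can be sketched in a few lines. This is an illustrative brute-force filter, assuming all objectives are maximised; the names `dominates` and `pareto_front` and the score pairs are hypothetical, not values from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the entries in `scores` that no other entry dominates."""
    return [
        i for i, candidate in enumerate(scores)
        if not any(dominates(other, candidate)
                   for j, other in enumerate(scores) if j != i)
    ]


# Made-up (similarity to target 1, similarity to target 2) pairs.
scores = [(0.9, 0.1), (0.6, 0.6), (0.5, 0.5), (0.1, 0.9), (0.3, 0.2)]
front = pareto_front(scores)
```

The quadratic scan is fine for small populations; non-dominated sorting implementations typically use a faster bookkeeping scheme.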

Fig. 2. Examples of mutations (left) and a crossover (right) as generated by GB-EPI. Note that minor changes to chemical structure can be used to efficiently achieve optimisation even for challenging objectives.

Fig. 3. Pseudocode description of a generic non-dominated sorting genetic algorithm adapted to the setting of molecular optimisation.

Fig. 4. Visualisation of the splitting front procedure of non-dominated sorting genetic algorithms: (a) the Pareto dominant front is shown in dark blue, the splitting front is light blue, and the remaining solutions are white. For this example, the second front is chosen as the splitting front, and it is assumed that five more solutions need to be picked to complete the population. These solutions will be indicated with a dark blue circumference. (b) The selection procedure of NSGA-II calculates a distance in objective space to the nearest neighbours in the front. The outermost solutions are picked by default, the remaining solutions are chosen according to the furthest distance from neighbours. (c) The selection procedure of NSGA-III calculates the orthogonal distance to predefined reference directions in objective space and selects the closest solution for each axis. Note that the two objective axes are also used as reference directions so that the outermost solutions are picked by default.
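The two selection rules in panels (b) and (c) can be sketched as follows. This is a simplified illustration rather than the paper's implementation: `crowding_distances` follows the textbook NSGA-II crowding distance, while `perpendicular_distance` and `closest_per_direction` only show the geometric idea behind NSGA-III and omit the objective normalisation and niche counting a full implementation would need. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distances(front: List[Sequence[float]]) -> List[float]:
    """NSGA-II style crowding distance for each solution in `front`."""
    n = len(front)
    if n == 0:
        return []
    distances = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Outermost solutions are always kept: infinite distance.
        distances[order[0]] = distances[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distances[order[k]] += gap / (hi - lo)
    return distances


def perpendicular_distance(point: Sequence[float],
                           direction: Sequence[float]) -> float:
    """Orthogonal distance from `point` to the line through the origin
    along `direction` (the reference direction)."""
    norm_sq = sum(d * d for d in direction)
    scale = sum(p * d for p, d in zip(point, direction)) / norm_sq
    return math.sqrt(sum((p - scale * d) ** 2
                         for p, d in zip(point, direction)))


def closest_per_direction(front: List[Sequence[float]],
                          directions: List[Sequence[float]]) -> List[int]:
    """Index of the nearest solution for each reference direction."""
    return [min(range(len(front)),
                key=lambda i: perpendicular_distance(front[i], d))
            for d in directions]


# Made-up splitting front and reference directions (axes plus diagonal).
split_front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
crowding = crowding_distances(split_front)
picked = closest_per_direction(split_front, [(1, 0), (1, 1), (0, 1)])
```

Using the objective axes as reference directions reproduces the behaviour noted in the caption: the outermost solutions are selected by default.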

Fig. 5. Time-series plots with variance bands of the dominated hypervolume, the maximum geometric mean, and internal similarity for the cobimetinib (a–c) and fexofenadine (d–f) tasks as a function of generations of the evolutionary populations. The mean value (solid line) and the 95% confidence interval (variance bands) over twenty runs of NSGA-II (orange), NSGA-III (blue), and GB-EPI (green, optimising the geometric mean) are shown. Details of the experimental setup for these results, including hyperparameters, initial population, and chemical filters are discussed in Subsection 4.
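Two of the quantities tracked in these plots can be illustrated with a short sketch. The geometric mean below is the standard scalarisation of several objective scores. The internal similarity shown here is a plain average of pairwise Tanimoto similarities over sets of on-bits, used as a simple stand-in: it is an assumption for illustration and not the extended fingerprint based metric the paper introduces. The fingerprints are made-up integer sets and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set


def geometric_mean(scores: Sequence[float]) -> float:
    """Geometric mean of non-negative objective scores."""
    return math.prod(scores) ** (1.0 / len(scores))


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def internal_similarity(fingerprints: List[Set[int]]) -> float:
    """Average pairwise Tanimoto similarity within a population.

    Assumption: a simple pairwise average, not the paper's extended
    fingerprint based definition.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Made-up data, for illustration only.
fitness = geometric_mean([0.25, 1.0])
population = [{1, 2}, {1, 2}, {3}]
similarity = internal_similarity(population)
```

A lower internal similarity indicates a more chemically diverse population, which is how the plots relate diversity to the optimisation results.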
