J Cheminform. 2022 Dec 28;14(1):86.
doi: 10.1186/s13321-022-00667-8.

Human-in-the-loop assisted de novo molecular design

Iiris Sundin et al. J Cheminform. 2022.

Abstract

A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function by trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies: in the first, the parameters of an MPO are learned, and in the second, a non-parametric component of the scoring function is developed to capture human domain knowledge. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in fewer than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
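As a rough illustration of what "a probabilistic model that captures the user's idea and uncertainty" can mean, the sketch below keeps a discrete posterior over a single unknown threshold of a scoring function and updates it from approve/reject feedback. This is not the model used in the paper: the grid posterior, the logistic likelihood, and the names `approval_probability`, `update_posterior`, and `steepness` are all assumptions made for this example.

```python
import math

def approval_probability(value, threshold, steepness=10.0):
    """Assumed likelihood: the user approves values above the threshold more often."""
    return 1.0 / (1.0 + math.exp(-steepness * (value - threshold)))

def update_posterior(grid, prior, value, approved):
    """One Bayesian update of the belief over candidate thresholds in `grid`."""
    unnormalized = []
    for threshold, weight in zip(grid, prior):
        p = approval_probability(value, threshold)
        unnormalized.append(weight * (p if approved else 1.0 - p))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical session: property values in [0, 1] and the user's verdicts.
grid = [i / 10 for i in range(11)]
posterior = [1.0 / len(grid)] * len(grid)  # uniform prior
for value, approved in [(0.7, True), (0.5, False), (0.65, True), (0.55, False)]:
    posterior = update_posterior(grid, posterior, value, approved)

best_index = max(range(len(grid)), key=posterior.__getitem__)
```

The spread of `posterior` is what an active learning strategy would use to decide which molecule to ask about next.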

Keywords: AI-assisted design; De novo molecular design; Expert knowledge elicitation; Goal-oriented molecule generation; Human-in-the-loop; Interactive algorithms; Reward elicitation.

Conflict of interest statement

This work was financially supported by AstraZeneca. The authors declare no other competing interests.

Figures

Fig. 1
Human-in-the-loop de novo molecular design: an AI assistant helps a chemist to decide parameters of an MPO objective function $S_{r,t}(x)$ iteratively at round $r$ and iteration $t$, where $r$ indexes rounds of goal-directed molecule generation with a de novo design tool, and $t$ indexes online interactions with a chemist. The objective consists of $K$ molecular properties $c_k(x)$ with relative weights $w_k$. The utility of the $k$-th property is measured using a desirability function $\phi_{r,t,k}$ that defines the range of good property values. At each iteration, the method selects a molecule $x_{r,t}$ to query, which the chemist evaluates with feedback $y$. The method then adapts $S_{r,t}(x)$ based on the feedback by estimating the parameters of $\phi_{r,t,k}$ to match the chemist’s underlying goal
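To make the caption's notation concrete, here is a minimal sketch that combines K property values, per-property desirability functions, and relative weights into one score. The caption does not state the shape of the desirability function or how the components are aggregated, so the double-sigmoid shape and the weighted geometric mean below are assumptions, and the names `desirability`, `mpo_score`, `low`, `high`, and `steepness` are hypothetical.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def desirability(value, low, high, steepness=10.0):
    """Assumed double-sigmoid: near 1 inside [low, high], near 0 far outside."""
    rise = _sigmoid(steepness * (value - low))
    fall = _sigmoid(-steepness * (value - high))
    return rise * fall

def mpo_score(properties, bounds, weights):
    """Assumed aggregation: weighted geometric mean of the desirabilities."""
    log_score = 0.0
    for value, (low, high), weight in zip(properties, bounds, weights):
        phi = desirability(value, low, high)
        log_score += weight * math.log(max(phi, 1e-12))  # guard against log(0)
    return math.exp(log_score / sum(weights))
```

In this picture, adapting the objective to the chemist's goal amounts to re-estimating the `low` and `high` values of each desirability function from feedback.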
Fig. 2
Graphical user interface for giving feedback on molecules. The chemist evaluates the DRD2 activity of molecules on a scale from 1 to 5.
For initialization, we randomly sample 10 molecules and obtain their scores from the oracle. For the experiment with a human chemist, we randomly sample 10,000 molecules as the unlabeled pool $U$ to speed up the method. Over ten iterations, we sequentially query 100 molecules in batches of 10 from a chemist, who evaluates them on a scale from 1 to 5 (1 = very likely not active, 5 = very likely active). The scores are linearly scaled to the range [0, 1]. The order of the evaluated molecules is chosen using Thompson sampling, which performed best in the simulated experiments. To evaluate performance, the oracle model is used to score the molecules generated by REINVENT with the chemist’s component as a scoring function at iterations $t = 1, \dots, 10$.
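The two mechanical steps in this procedure, scaling ratings to [0, 1] and choosing a batch by Thompson sampling, can be sketched as follows. The sketch assumes an independent Gaussian belief (mean, standard deviation) about each candidate's score, which is a simplification rather than the model described in the article; `scale_rating`, `thompson_select`, and the `posterior` layout are hypothetical names.

```python
import random

def scale_rating(rating, lowest=1, highest=5):
    """Linearly map a rating in [lowest, highest] to [0, 1]."""
    return (rating - lowest) / (highest - lowest)

def thompson_select(posterior, batch_size, rng):
    """Pick distinct candidates to query, one Thompson draw per batch slot.

    `posterior` maps a candidate id to (mean, std) of the assumed Gaussian
    belief about that candidate's score.
    """
    remaining = dict(posterior)
    batch = []
    for _ in range(min(batch_size, len(remaining))):
        # Sample a plausible score for every candidate, then query the best one.
        draws = {c: rng.gauss(mean, std) for c, (mean, std) in remaining.items()}
        chosen = max(draws, key=draws.get)
        batch.append(chosen)
        del remaining[chosen]
    return batch

# Hypothetical usage with three candidate molecules.
beliefs = {"mol_a": (0.2, 0.3), "mol_b": (0.7, 0.1), "mol_c": (0.5, 0.4)}
queries = thompson_select(beliefs, batch_size=2, rng=random.Random(0))
```

Because each draw reflects the current uncertainty, candidates with a low mean but a wide belief still get queried occasionally, which is what balances exploration and exploitation.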
Fig. 3
The parameters of the MPO objective are better estimated with an increasing amount of feedback. The mean relative absolute error (MRAE) in the estimated parameters decreases with increasing human feedback, and fastest with Thompson sampling. Solid lines show the average MRAE over 10 random seeds, and the shaded areas show one standard error of the mean (SEM)
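For readers who want the two summary statistics spelled out, the sketch below computes an MRAE and a mean with its SEM. The caption does not define MRAE precisely, so the form used here (the average of |estimate − truth| / |truth| over parameters) is an assumption, as are the function names.

```python
import math
import statistics

def mrae(estimated, true):
    """Mean relative absolute error over paired parameter values (assumed form)."""
    errors = [abs(e - t) / abs(t) for e, t in zip(estimated, true)]
    return statistics.fmean(errors)

def mean_and_sem(values):
    """Mean and standard error of the mean across repeated runs (e.g. seeds)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

For example, `mrae` would be evaluated once per random seed, and `mean_and_sem` applied to the resulting per-seed values to obtain a solid line and its shaded band.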
Fig. 4
The average oracle score of the generated molecules increases at each round of adapting the MPO. At each round, a new batch of molecules is generated using an adapted scoring function after a total of 110 queries (round 1) and 220 queries (round 2) to a simulated chemist. For comparison, we show round 0, which is the performance with the initial guess $\theta_0$. The bars show the mean of the average oracle score of the generated molecules over 10 random seeds, and the error bars represent one SEM. The gray horizontal line shows the average oracle score of 5000 molecules sampled from REINVENT without an MPO objective, using its prior agent
Fig. 5
A non-parametric scoring component that represents the chemist’s knowledge improves REINVENT output even with a small number of queries (< 100) to a simulated chemist. The lines show the average oracle score in the REINVENT output and the shaded areas its variation over 10 repeated experiments (mean and SEM). The method is not very sensitive to Gaussian noise in the simulated chemist’s answers. (a) Noise level $\sigma_{\text{chemist}} = 0.0$, (b) $\sigma_{\text{chemist}} = 0.15$, (c) $\sigma_{\text{chemist}} = 0.30$
Fig. 6
A medicinal chemist’s feedback on the DRD2 activity of molecules improves the average activity of the generated molecules, measured using the activity prediction model described in the “Task 2: Learn human knowledge about a molecular property as a separate component” section. The dashed lines show performance in three repeated experiments. The performance is summarized as the mean (solid line) with one standard error of the mean (shaded area). The repetitions differ in their randomly sampled initial data and, consequently, in the actively selected queries. The queries are selected using Thompson sampling
