Impact of Applicability Domains to Generative Artificial Intelligence

Maxime Langevin et al.

ACS Omega. 2023 Jun 12;8(25):23148-23167. doi: 10.1021/acsomega.3c00883. eCollection 2023 Jun 27.

Abstract

Molecular generative artificial intelligence is drawing significant attention in the drug design community, with several experimentally validated proofs of concept already published. Nevertheless, generative models are known for sometimes generating unrealistic, unstable, unsynthesizable, or uninteresting structures. This calls for methods to constrain those algorithms to generate structures in drug-like portions of the chemical space. While the concept of applicability domains for predictive models is well studied, its counterpart for generative models is not yet well-defined. In this work, we empirically examine various possibilities and propose applicability domains suited for generative models. Using both public and internal data sets, we use generative methods to generate novel structures that are predicted to be active by a corresponding quantitative structure-activity relationship (QSAR) model while constraining the generative model to stay within a given applicability domain. Our work looks at several applicability domain definitions, combining various criteria such as structural similarity to the training set, similarity of physicochemical properties, unwanted substructures, and quantitative estimate of drug-likeness. We assess the generated structures from both qualitative and quantitative points of view and find that the applicability domain definitions have a strong influence on the drug-likeness of generated molecules. An extensive analysis of our results allows us to identify applicability domain definitions that are best suited for generating drug-like molecules with generative models. We anticipate that this work will help foster the adoption of generative models in an industrial context.


Conflict of interest statement

The authors declare the following competing financial interest(s): All authors are or have been employed by Sanofi and may hold shares and/or stock options in the company.

Figures

Figure 1
Overview of the workflow used for evaluating an AD. During generation, the AD is taken into account in the reward function as a multiplicative term that yields 0 if the generated molecule is outside the AD and 1 otherwise. The arrow colors correspond to the different subsets of the data set used at different stages of the process: green for the activity model training set, blue for the set used for AD definition and generative model pretraining, and orange for the test set.
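
As an illustration of how such a gated reward can be composed, here is a minimal sketch; the names used (reward, qsar_model.predict, in_applicability_domain) are hypothetical placeholders and not taken from the paper.

    # Minimal sketch of an AD-gated reward for a generative model (hypothetical names).
    def reward(molecule, qsar_model, in_applicability_domain):
        """Return the optimization reward for a generated molecule."""
        predicted_activity = qsar_model.predict(molecule)  # assumed QSAR model interface
        ad_term = 1.0 if in_applicability_domain(molecule) else 0.0
        # The AD acts as a multiplicative gate: out-of-domain molecules score 0.
        return predicted_activity * ad_term
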
Figure 2
From left to right: chemical structures of clopidogrel, medroxyprogesterone acetate (a steroid), and ivermectin (a drug derived from a natural product). The diversity in chemical features of these three drugs (e.g., presence of macrocycles, number of cycles, and number of chiral centers) shows how much drug-likeness is context-dependent.
Figure 3
Limitations of binary fingerprints in discriminating unusual chemical moieties. (Top) Binary fingerprints. (Bottom) Count fingerprints.
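
The binary/count distinction can be illustrated with RDKit; a minimal sketch, using an arbitrary example molecule rather than the structures from the figure:

    # Binary vs. count-based Morgan (ECFP4-like) fingerprints with RDKit.
    # Repeated occurrences of the same moiety set a bit only once in the binary
    # representation, while the count representation keeps the multiplicity.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles("OCCOCCOCCOCCO")  # illustrative repetitive polyether

    binary_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    count_fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=2048)

    print(binary_fp.GetNumOnBits())                     # distinct atom environments
    print(sum(count_fp.GetNonzeroElements().values()))  # total environment counts
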
Figure 4
JAK2: Projection of molecules generated with the LSTM-HC model on the original data set using the first two dimensions of the PCA of their Morgan fingerprints. Blue dots represent generated molecules, green dots represent actives from the test set, and red dots represent inactives. The three AD metrics in the top row lead to more diverse molecules than the ones in the middle and bottom rows.
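
A minimal sketch of such a projection, using small illustrative molecule lists in place of the actual data set and generated molecules:

    # Project Morgan fingerprints onto their first two principal components.
    import numpy as np
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from sklearn.decomposition import PCA

    def fingerprint_matrix(mols, radius=2, n_bits=2048):
        """Stack binary Morgan fingerprints into an (n_molecules, n_bits) array."""
        return np.array([list(AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits=n_bits))
                         for m in mols])

    # Illustrative placeholders for the data set and the generated molecules.
    dataset_mols = [Chem.MolFromSmiles(s) for s in ("c1ccccc1O", "CCN(CC)CC", "CC(=O)Nc1ccc(O)cc1")]
    generated_mols = [Chem.MolFromSmiles(s) for s in ("c1ccccc1N", "CCOC(C)=O")]

    pca = PCA(n_components=2).fit(fingerprint_matrix(dataset_mols))
    generated_2d = pca.transform(fingerprint_matrix(generated_mols))
    print(generated_2d)
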
Figure 5
Entropy as a measure of the coverage of the training set diversity. The more evenly the generated molecules are distributed across the clusters, the higher the entropy. Generated molecules too far from the training set are set apart. This example displays molecules from the JAK2 data set.
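
A minimal sketch of the entropy computation, assuming generated molecules have already been assigned to clusters of the training set (the clustering and assignment steps are omitted):

    # Shannon entropy of the distribution of generated molecules over clusters;
    # a more even spread across clusters gives a higher entropy.
    import numpy as np

    def cluster_entropy(cluster_labels):
        """Entropy (in nats) of the cluster assignments of generated molecules."""
        _, counts = np.unique(cluster_labels, return_counts=True)
        probabilities = counts / counts.sum()
        return float(-(probabilities * np.log(probabilities)).sum())

    print(cluster_entropy([0, 0, 0, 0, 1, 2]))  # uneven coverage -> lower entropy
    print(cluster_entropy([0, 1, 2, 0, 1, 2]))  # even coverage -> higher entropy
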
Figure 6
QED distribution of generated molecules for different applicability domain definitions on the JAK2 data set. The vertical green lines mark the minimum and maximum QED values found in the training set; higher QED is better.
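
QED values of the kind plotted here can be computed with RDKit; a minimal sketch on arbitrary example molecules:

    # Quantitative estimate of drug-likeness (QED) with RDKit.
    from rdkit import Chem
    from rdkit.Chem import QED

    smiles = ["CC(=O)Nc1ccc(O)cc1", "O=C(O)c1ccccc1OC(C)=O"]  # illustrative molecules
    qed_values = [QED.qed(Chem.MolFromSmiles(s)) for s in smiles]
    print(qed_values)  # each value lies between 0 (least) and 1 (most drug-like)
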
Figure 7
Comparison of data set and generated molecules on the JAK2 test case (here with the maximum similarity on atom-pair descriptors) with the distributions for properties identified as important through qualitative analysis.
Figure 8
Comparison of scores reached by different algorithms when optimizing JAK2 predicted activity while staying in the “range physchem + range ECFP4 counts” AD. LSTM-HC reaches higher optimization scores and recovers slightly more active compounds than Graph GA and SMILES GA.
Figure 9
Comparison of scores reached by the LSTM-HC under the constraint of different AD definitions on the ChEMBL 11βHSD data set.
Figure 10
Results of the molecular Turing test for each of four different AD definitions (“range QED”, “maxsim ECFP4”, “range physchem + maxsim ECFP4”, and “range physchem + range ECFP4 counts”) and for the JAK2 training set. The black bar denotes the mean, and the box denotes an interval with 90% of the values. Results were obtained with 15 different participants.
Figure 11
Tree map plot of the Renin test set (in green), molecules generated with a good applicability domain (“range physchem + range ECFP4 counts”, in blue), and molecules generated by an applicability domain showing poor results (“maxsim ECFP4”, in purple). The molecules from the good AD and the test data set (blue and green, respectively) mainly fall on the same trees and share connections, suggesting that the two sets could be similar and the generated molecules relevant. In contrast, the molecules from the bad AD are all located on a separate tree, suggesting that they are dissimilar from the test set and probably irrelevant. The molecule generated with the bad AD that is closest to the test data set (purple arrow) is still dissimilar to the closest molecules from the test data set (green arrow) or from those generated with a good AD (blue arrow). The tree map is generated using the TMAP library with default settings.
Figure 12
Fraction of actives and inactives among the Renin training set molecules close to the generated molecules at different similarity thresholds. Results are shown for the “range physchem + range ECFP4 counts” and “maxsim ECFP4” applicability domains. They show that good applicability domains generate molecules closer to the actual training set, with a clear enrichment toward active molecules.
Figure 13
Renin data set: evolution of generated molecules’ scores throughout the optimization epochs for a good applicability domain (“range physchem + range ECFP4 counts”) and for an applicability domain showing poor results (“maxsim ECFP4”). The scores of the molecules generated with the good applicability domain are more spread out and lower than those generated with the other applicability domain (while most are still in the correct range, between the predicted active threshold and the maximum score among the test set molecules). This illustrates that poorly performing ADs leave room for reward hacking by the generator.
Figure 14
Average Tanimoto similarities (computed on ECFP4 fingerprints) for generated sets of molecules using a good applicability domain definition on the JAK2 data set. Different applicability domains can lead to the exploration of different portions of chemical space.
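
A minimal sketch of the underlying diversity measure, using a small illustrative generated set; averaging over all fingerprint pairs is one common convention and may differ in detail from the paper's exact procedure.

    # Average pairwise Tanimoto similarity on ECFP4-like Morgan fingerprints.
    from itertools import combinations
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    generated = [Chem.MolFromSmiles(s) for s in ("c1ccccc1O", "c1ccccc1N", "CCN(CC)CC")]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in generated]

    similarities = [DataStructs.TanimotoSimilarity(a, b) for a, b in combinations(fps, 2)]
    print(sum(similarities) / len(similarities))  # lower average -> more diverse set
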
