Brief Bioinform. 2024 Nov 22;26(1):bbae690.
doi: 10.1093/bib/bbae690.

STNGS: a deep scaffold learning-driven generation and screening framework for discovering potential novel psychoactive substances

Dongping Liu et al. Brief Bioinform.

Abstract

The supervision of novel psychoactive substances (NPSs) is a global problem, and their regulation has relied heavily on identifying structural matches in established NPS databases. However, violators can circumvent legal oversight by altering the side-chain structure of recognized NPSs, and existing methods cannot overcome the resulting inaccuracy and lag in supervision. In this study, we propose a scaffold and transformer-based NPS generation and screening (STNGS) framework to systematically identify and evaluate potential NPSs. The framework comprises a scaffold-based generative model and a four-part rank function. Chemical space analysis and property distribution analysis show that the generative model performs well in designing and optimizing both general molecules and NPS-like molecules. The rank function combines a synthetic accessibility score and a frequency score with a confidence score and an affinity score predicted by neural network models, which allows potential NPSs to be pinpointed. Applying the STNGS framework together with molecular docking and a G protein-coupled receptor (GPCR) activation-based (GRAB) sensor, we identified three novel synthetic cannabinoids with activity. By constraining the chemical space, STNGS generates a diverse and novel database of NPS-like molecules, which assists in the ex-ante regulation of NPSs.
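The abstract names the four parts of the rank function but not how they are combined. The following is a minimal sketch, assuming each score is min-max normalized across the candidate pool and then summed with equal weights. The `Candidate` class, the `rank` function, the weights, and the convention that a higher affinity value means stronger binding are all illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one generated molecule and its four scores."""
    smiles: str        # identifier of the molecule (placeholder strings are fine)
    sa_score: float    # synthetic accessibility score, 1 (easy) to 10 (hard)
    frequency: int     # how often the generator sampled this molecule
    confidence: float  # discriminator output in [0, 1]
    affinity: float    # predicted affinity; higher = stronger (assumption)


def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates, weights=(0.25, 0.25, 0.25, 0.25)):
    """Return (total_score, candidate) pairs sorted from best to worst.

    The equal-weight sum is an assumption made for illustration only.
    """
    # A lower SA score means easier synthesis, so negate it before scaling.
    sa = min_max([-c.sa_score for c in candidates])
    freq = min_max([c.frequency for c in candidates])
    conf = min_max([c.confidence for c in candidates])
    aff = min_max([c.affinity for c in candidates])

    scored = []
    for i, cand in enumerate(candidates):
        total = (weights[0] * sa[i] + weights[1] * freq[i]
                 + weights[2] * conf[i] + weights[3] * aff[i])
        scored.append((total, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Normalizing before summing keeps any single score from dominating simply because of its numeric range; a different weighting or a rank-based aggregation would fit the same interface.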

Keywords: deep scaffold learning; ensemble learning; generative framework; novel psychoactive substance; synthetic cannabinoids.

Conflict of interest statement

There are no conflicts to declare.

Figures

Figure 1
Framework overview. (a) STNGS: data augmentation block, generative block, and ranking block. (b) Architecture of the generative model. The molecular scaffold SMILES is processed by the T-scaffold self-encoder-decoder to extract the latent vector of scaffolds, which is then passed to the Mol-GPT module for molecular generation training. (c) Architecture of the NPS discriminator. This model is used as a sub-model and trained 100 times to obtain 100 models, which are then integrated to form the final NPS discriminator. (d) Architecture of the affinity prediction model. The bond feature matrix is concatenated with the atomic feature matrix after passing through the bond message passing layers. The result is input to the regression layer together with the adjacency feature matrix, the distance feature matrix, the Coulomb feature matrix, and the molecular description vector to calculate the affinity score.
Alt text: Overview of the STNGS framework (a) with three blocks: data augmentation, generation, and ranking. (b) Generative model: T-scaffold extracts scaffold latent vectors for Mol-GPT molecular generation. (c) NPS discriminator: 100 two-layer LSTM sub-models integrated for discrimination. (d) Affinity prediction: bond and atomic features combined for regression-based affinity scoring.
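Panel (c) states that 100 sub-models are integrated into the final discriminator, but the caption does not say how. A minimal sketch of one common choice, averaging the sub-model probabilities, is shown here. The function `ensemble_confidence`, the callable interface of the sub-models, and the 0.5 voting threshold are assumptions for illustration.

```python
def ensemble_confidence(molecule, sub_models, threshold=0.5):
    """Combine sub-model outputs into one confidence score.

    `sub_models` is a list of callables, each mapping a molecule to a
    probability in [0, 1] (hypothetical interface). Returns the mean
    probability and the fraction of sub-models voting "NPS-like".
    """
    if not sub_models:
        raise ValueError("at least one sub-model is required")
    probs = [model(molecule) for model in sub_models]
    mean_prob = sum(probs) / len(probs)
    vote_fraction = sum(p >= threshold for p in probs) / len(probs)
    return mean_prob, vote_fraction
```

Averaging many independently trained sub-models tends to reduce the variance of any single model; the vote fraction gives a second, coarser view of agreement among them.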
Figure 2
Effect of data augmentation multiplier, scaffold type, and sampling temperature on model performance. (a) Association of the data augmentation multiplier with validity and scaffold ratio. (b) Two compounds and their corresponding two scaffolds. (c) Impact of the two scaffolds on validity, uniqueness, and novelty. (d) Impact of sampling temperature on validity, uniqueness, and novelty. (e) Molecules generated by four scaffolds at different sampling temperatures.
Alt text: Impact of data augmentation, scaffold type, and sampling temperature on model performance. (a) The data augmentation multiplier affects validity and scaffold ratio. (b) Two compounds and their scaffolds. (c) BM scaffold improves uniqueness. (d) Sampling temperature influences validity, uniqueness, and novelty. (e) Molecules generated by four scaffolds at different temperatures.
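To make the quantities in panels (c) and (d) concrete, here is a minimal sketch of temperature-scaled sampling probabilities and of validity, uniqueness, and novelty. The metric definitions follow the convention commonly used in molecular generation benchmarks and are assumed here, since the caption does not define them. The `is_valid` callback stands in for a real SMILES parser, and molecules are assumed to be canonicalized strings so that set comparison is meaningful.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generation_metrics(generated, training_set, is_valid):
    """Return (validity, uniqueness, novelty) for a list of generated strings.

    validity   = valid / generated
    uniqueness = unique valid / valid
    novelty    = unique valid not in the training set / unique valid
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    validity = len(valid) / len(generated) if generated else 0.0
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty
```

A higher temperature flattens the token distribution, which typically raises uniqueness and novelty at some cost in validity; that is the trade-off panel (d) examines.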
Figure 3
Chemical space and property distribution of generated molecules. (a) UMAP visualization of 5000 randomly selected molecules from the MOSES dataset and 5000 randomly selected molecules from the trained generative model. (b) Property distributions of 30 000 randomly selected molecules from the MOSES dataset and 30 000 randomly selected molecules from the trained generative model. (c) UMAP visualization of 2154 molecules from the NPS dataset and all generated NPS-like molecules from the trained generative model. (d) Property distributions of 2154 molecules from the NPS dataset and all generated NPS-like molecules from the trained generative model.
Alt text: Chemical space and property distribution of generated molecules. (a) UMAP of 5000 MOSES and 5000 model-generated molecules. (b) Property distributions of 30 000 MOSES and model-generated molecules. (c) UMAP of 2154 NPS dataset molecules and generated NPS-like molecules. (d) Property distributions of NPS dataset molecules and generated NPS-like molecules.
Figure 4
Effect of sampling frequency on the rank function. (a) Relationship between sampling frequency and number of molecules. (b) Nearest-neighbor Tanimoto coefficients to known NPS for all molecules in different sampling frequency groups. (c) Number of emerging NPS molecules hit by different scoring functions. Alt text: Effect of sampling frequency on rank function. (a) Sampling frequency versus number of molecules. (b) Nearest-neighbor Tanimoto coefficients for molecules in different sampling frequency groups. (c) Number of emerging NPS molecules identified by different scoring functions.
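The nearest-neighbor Tanimoto coefficient in panel (b) can be illustrated with a minimal sketch. Fingerprints are represented here as Python sets of on-bit indices, which is an assumption made to keep the example dependency-free; the fingerprint type used by the authors is not stated in this caption, and the function names are illustrative.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def nearest_neighbor_similarity(query_fp, reference_fps):
    """Highest Tanimoto coefficient between a query and any reference fingerprint."""
    return max((tanimoto(query_fp, ref) for ref in reference_fps), default=0.0)
```

A value near 1 means the generated molecule is almost identical to some known NPS, while a low value indicates a structure far from every entry in the reference set.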
Figure 5
Experimental validation of the generated molecular activity. (a) The binding poses and interaction modes of the target CB1 with three compounds. The dashed line represents the π-π stacking interaction. (b) Expression and fluorescence change in response to three compounds in cells. (c) Effectiveness curves for JWH-018 and three compounds. The effectiveness on cells is shown as the fluorescence expression intensity. (d) Affinity assay and KD values of JWH-018 and three compounds. Alt text: Experimental validation of generated molecular activity. (a) Binding poses and interactions with CB1 receptor. (b) Fluorescence response in cells. (c) Effectiveness curves for JWH-018 and three compounds. (d) Affinity assay results and KD values.
