PNAS Nexus. 2025 Oct 14;4(11):pgaf329.
doi: 10.1093/pnasnexus/pgaf329. eCollection 2025 Nov.

Stereochemistry-aware string-based molecular generation


Gary Tom et al. PNAS Nexus.

Abstract

This study investigates the impact of incorporating stereochemical information, a crucial aspect of computational drug discovery and materials design, into molecular generative modeling. We present a detailed comparison of stereochemistry-aware and conventional, stereochemistry-unaware string-based generative approaches, using both genetic algorithms and reinforcement learning-based techniques. To evaluate these models, we introduce new benchmarks designed specifically to assess the importance of stereochemistry-aware generative modeling. Our results show that stereochemistry-aware models generally perform on par with or better than conventional algorithms across a range of stereochemistry-sensitive tasks. However, in scenarios where stereochemistry plays a less critical role, stereochemistry-aware models may struggle because of the increased complexity of the chemical space they must navigate. This work provides insight into the trade-offs of incorporating stereochemical information into molecular generative models and offers guidance for selecting an appropriate approach based on specific application requirements.
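As a worked illustration of why the chemical space grows (not taken from the paper): a molecule with n stereogenic units (tetrahedral stereocenters or stereogenic double bonds) has at most 2^n stereoisomers, and a stereochemistry-unaware string represents all of them with one sequence. The function name `max_stereoisomers` is hypothetical, and the result is only an upper bound; tartaric acid, for example, has two stereocenters but only three stereoisomers because one of them is a meso form.

```python
def max_stereoisomers(n_stereocenters: int, n_stereo_double_bonds: int = 0) -> int:
    """Hypothetical helper: upper bound on stereoisomers of one constitution.

    Each tetrahedral stereocenter and each stereogenic double bond has two
    configurations, so there are at most 2**n combinations. Symmetry
    (e.g. meso compounds) can make the true count smaller.
    """
    if n_stereocenters < 0 or n_stereo_double_bonds < 0:
        raise ValueError("counts must be non-negative")
    return 2 ** (n_stereocenters + n_stereo_double_bonds)


# A stereochemistry-unaware string stands in for all of these at once.
for n in range(5):
    print(f"{n} stereogenic units -> at most {max_stereoisomers(n)} stereoisomers")
```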

Keywords: drug design; generative modeling; machine learning; molecular generation; stereochemistry.


Figures

Fig. 1.
Example of isomeric molecule encoded with SMILES, SELFIES, and GroupSELFIES.
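A minimal sketch of what a stereochemistry-unaware SMILES string loses: deleting the stereo markers (`@`/`@@` for tetrahedral centers, `/` and `\` for double-bond geometry) maps distinct stereoisomers to the same string. `strip_stereo` is a hypothetical helper that edits raw SMILES text only; it does not canonicalize, and it is not a SELFIES or GroupSELFIES tool.

```python
import re

# SMILES stereo markers: '@' / '@@' (tetrahedral chirality) and
# '/' / '\' (double-bond geometry).
_STEREO_MARKERS = re.compile(r"[@/\\]")


def strip_stereo(smiles: str) -> str:
    """Hypothetical helper: remove stereo markers from an isomeric SMILES string.

    Plain text edit only, with no canonicalization; a bracket atom such as
    [C@@H] becomes [CH], which is still valid SMILES.
    """
    return _STEREO_MARKERS.sub("", smiles)


# The two enantiomers of alanine differ only in '@@' vs '@'.
ala_a = "C[C@@H](C(=O)O)N"
ala_b = "C[C@H](C(=O)O)N"
# trans- and cis-1,2-difluoroethene differ only in '/' vs '\'.
trans_dfe = "F/C=C/F"
cis_dfe = "F/C=C\\F"

print(strip_stereo(ala_a), strip_stereo(ala_b))        # C[CH](C(=O)O)N C[CH](C(=O)O)N
print(strip_stereo(trans_dfe), strip_stereo(cis_dfe))  # FC=CF FC=CF
```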
Fig. 2.
Optimization traces for rediscovery tasks. The cumulative top-1 similarity score to the target molecule as a function of optimization generation. Shaded regions indicate the 95% CI. The dashed line is the best score found in the starting dataset.
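The traces in Figs. 2, 4, and 5 plot a cumulative top-1 score, i.e. the best score found up to each generation, with a 95% CI across repeated runs. A minimal sketch of that bookkeeping, assuming one best score per generation per run, a mean over runs as the plotted line, and a normal-approximation interval (mean ± 1.96 standard errors); the source does not say how its intervals were computed, so a t-based or bootstrap interval may be what was actually used. All function names are hypothetical and the scores are made up.

```python
import math
from itertools import accumulate
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def cumulative_top1(per_generation_best: Sequence[float], maximize: bool = True) -> List[float]:
    """Running best score: entry g is the best value seen in generations 0..g."""
    return list(accumulate(per_generation_best, max if maximize else min))


def mean_ci95(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and half-width of a normal-approximation 95% CI (1.96 * standard error)."""
    m = mean(values)
    if len(values) < 2:
        return m, 0.0  # no spread can be estimated from a single run
    return m, 1.96 * stdev(values) / math.sqrt(len(values))


def trace_with_ci(runs: Sequence[Sequence[float]], maximize: bool = True) -> List[Tuple[float, float]]:
    """Per-generation (mean, CI half-width) of the cumulative top-1 trace across runs."""
    traces = [cumulative_top1(run, maximize) for run in runs]
    return [mean_ci95(gen_scores) for gen_scores in zip(*traces)]


# Made-up similarity scores for two runs of four generations each.
runs = [[0.30, 0.50, 0.40, 0.70], [0.20, 0.60, 0.55, 0.65]]
print(cumulative_top1(runs[0]))  # [0.3, 0.5, 0.5, 0.7]
for gen, (m, half) in enumerate(trace_with_ci(runs)):
    print(f"generation {gen}: {m:.3f} +/- {half:.3f}")
```

Keeping `maximize` as a flag lets the same code handle similarity scores (higher is better) and scores where lower values are better.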
Fig. 3.
Structures of proteins with native ligands. The structures are from the Protein Data Bank (73). The native ligand in the binding pocket is shown inside a bounding box.
Fig. 4.
Optimization traces for docking tasks. The cumulative top-1 docking score for protein targets as a function of optimization generation. Shaded regions indicate the 95% CI. The dashed line is the best score found in the starting dataset.
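The dashed line marks the best score already present in the starting dataset, so a run only shows real progress once its cumulative top-1 trace crosses it. A minimal sketch of that check; `first_improvement` is a hypothetical name, and the `maximize=False` example assumes a docking program whose scores are better when more negative (as with AutoDock Vina-style binding energies). The source does not state the sign convention used here, and the numbers are made up.

```python
from typing import Optional, Sequence


def first_improvement(trace: Sequence[float], baseline: float, maximize: bool = True) -> Optional[int]:
    """Index of the first generation whose cumulative top-1 score strictly beats
    the best score in the starting dataset, or None if it never does."""
    for gen, score in enumerate(trace):
        if (score > baseline) if maximize else (score < baseline):
            return gen
    return None


# Made-up cumulative top-1 docking trace (lower is better) and dataset baseline.
trace = [-8.1, -8.4, -9.2, -9.5]
print(first_improvement(trace, baseline=-9.0, maximize=False))  # 2
```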
Fig. 5.
Optimization traces for the circular dichroism (CD) task. The cumulative top-1 CD peak score as a function of optimization generation. Shaded regions indicate the 95% CI. The dashed line is the best score found in the starting dataset.

References

    1. Sanchez-Lengeling B, Aspuru-Guzik A. 2018. Inverse molecular design using machine learning: generative models for matter engineering. Science. 361(6400):360–365.
    2. Gómez-Bombarelli R, et al. 2018. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci. 4(2):268–276.
    3. Segler MHS, Kogej T, Tyrchan C, Waller MP. 2018. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci. 4(1):120–131.
    4. De Cao N, Kipf T. 2018. MolGAN: an implicit generative model for small molecular graphs [preprint]. arXiv:1805.11973. doi:10.48550/arXiv.1805.11973.
    5. Meyers J, Fabian B, Brown N. 2021. De novo molecular design and generative models. Drug Discov Today. 26(11):2707–2715.
