Bioinformatics. 2024 Jun 28;40(Suppl 1):i369-i380.
doi: 10.1093/bioinformatics/btae256.

MolPLA: a molecular pretraining framework for learning cores, R-groups and their linker joints


Mogan Gim et al. Bioinformatics.

Abstract

Motivation: Molecular core structures and R-groups are essential concepts in drug development, and integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to learn the underlying decomposable parts of molecules, namely their core structures and peripheral R-groups. Furthermore, we formulate an additional framework that enables MolPLA to help chemists find replaceable R-groups in lead optimization scenarios.

Results: On molecular property prediction, MolPLA achieves predictive performance comparable to current state-of-the-art models. Qualitative analyses indicate that MolPLA can distinguish core and R-group substructures, identify decomposable regions in molecules and support lead optimization by rationally suggesting R-group replacements for various query core templates.

Availability and implementation: The code for MolPLA and its pre-trained model checkpoint are available at https://github.com/dmis-lab/MolPLA.
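The idea of suggesting R-group replacements for a query core can be illustrated with a minimal Python sketch. It assumes that the query core and a library of candidate R-groups have already been embedded as vectors (for example by a pre-trained graph encoder) and that candidates are ranked by cosine similarity to the core. The function names, toy vectors and R-group labels are hypothetical and are not taken from the MolPLA code base; see the repository above for the actual implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_r_groups(query_core: Sequence[float],
                  candidates: Dict[str, Sequence[float]],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical retrieval step: score every candidate R-group embedding
    against the query core embedding and return the top_k best matches."""
    scored = [(name, cosine(query_core, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical values, for illustration only).
core = [1.0, 0.0, 0.5]
library = {
    "methyl": [0.9, 0.1, 0.4],
    "hydroxyl": [0.0, 1.0, 0.0],
    "chloro": [0.5, 0.5, 0.5],
}
print(rank_r_groups(core, library, top_k=2))  # "methyl" ranks first, then "chloro"
```

Ranking by nearest neighbours in a shared embedding space is only one plausible way to turn a contrastively trained encoder into a retrieval tool; the paper's own retrieval objective and inference procedure may differ.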


Conflict of interest statement

None declared.

Figures

Figure 1.
Schematic illustration of our proposed molecular graph decomposition method. Given a molecule represented by its graph structure $G_{M_1}$ and one of its putative cores $G_{C_{1,1}}$, identified by Naveja et al.'s framework, multiple decomposition results can be obtained. Since the molecule has three R-groups, there are seven decomposition results in total ($2^3 - 1 = 7$), including the one that decouples all R-groups from the putative core. Additional figures are available in Supplementary Materials S4 and S5.
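The count of seven decomposition results for three R-groups matches the number of non-empty subsets of R-groups that can be decoupled from the core, $2^3 - 1$. The short Python sketch below enumerates those subsets. It assumes that each decomposition result is defined only by which R-groups are detached; the labels "R1" to "R3" are hypothetical placeholders, and the exact procedure is described in the Supplementary Materials.

```python
from itertools import combinations


def decomposition_results(r_groups):
    """Enumerate every non-empty subset of R-groups that can be decoupled
    from a putative core; the remaining R-groups stay attached."""
    results = []
    for k in range(1, len(r_groups) + 1):
        results.extend(combinations(r_groups, k))
    return results


r_groups = ["R1", "R2", "R3"]  # hypothetical R-group labels
results = decomposition_results(r_groups)
for detached in results:
    print("detached:", detached)
print(len(results))  # 2**3 - 1 = 7
```

In general a core with n R-groups yields $2^n - 1$ such results under this assumption, so the number of views per molecule grows quickly with the number of R-groups.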
Figure 2.
Overview of MolPLA, consisting of Masked Graph Contrastive Learning ($\mathcal{L}_1$, $\mathcal{L}_2$) and the R-Group Retrieval Framework ($\mathcal{L}_3$). The total pre-training objective is $\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3$. All three objectives use the same dual InfoNCE loss with in-batch negatives; details are given in Supplementary Material S5. The shared graph encoder $f_\theta$ consists of five GIN layers and outputs atom (node) embeddings for the input molecular graphs.
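A dual InfoNCE loss with in-batch negatives can be sketched as follows. This is a minimal, dependency-free Python illustration, assuming that "dual" means the loss is computed in both directions (anchor to positive and positive to anchor) and averaged, that similarities are cosine similarities divided by a temperature, and that the temperature value of 0.1 is a hypothetical choice. It is not the MolPLA implementation. Under the caption's formulation, each of $\mathcal{L}_1$, $\mathcal{L}_2$ and $\mathcal{L}_3$ would apply such a loss to its own positive pairs, and the three values would be summed.

```python
import math


def _normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def dual_info_nce(z_a, z_b, tau=0.1):
    """Symmetric InfoNCE with in-batch negatives.

    z_a[i] and z_b[i] form a positive pair; every z_b[j] with j != i acts
    as a negative for z_a[i], and vice versa. tau is a hypothetical temperature.
    """
    a = [_normalize(v) for v in z_a]
    b = [_normalize(v) for v in z_b]
    n = len(a)
    # sim[i][j] = cosine(a_i, b_j) / tau
    sim = [[sum(x * y for x, y in zip(a[i], b[j])) / tau for j in range(n)]
           for i in range(n)]
    # Direction a -> b: softmax over each row, the target is the diagonal entry.
    loss_ab = sum(_logsumexp(sim[i]) - sim[i][i] for i in range(n)) / n
    # Direction b -> a: softmax over each column, the target is the diagonal entry.
    loss_ba = sum(_logsumexp([sim[j][i] for j in range(n)]) - sim[i][i]
                  for i in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)


identity = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(dual_info_nce(identity, identity))  # close to 0: each positive is the most similar pair
print(dual_info_nce(identity, swapped))   # roughly 10: each positive is the least similar pair
```

Using the other items in the batch as negatives avoids sampling explicit negatives, so larger batches supply more negatives per positive pair at no extra encoding cost.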
Figure 3.
Visualization of the node representations generated by MolPLA for the two reference molecules, Streptozocin and Capmatinib.
Figure 4.
Distributions of QED and SAscore for the molecules generated by MolPLA from the two reference molecules, Streptozocin and Capmatinib. The dashed line marks the scores of the original molecules, while the shaded region indicates a subset of values from the optimized molecules.
Figure 5.
Generated molecules for each reference molecule in the lead optimization scenario. The molecules shown were selected based on their calculated QED, SA Score and Docking Score.
