Bioinformatics. 2024 Jun 28;40(Suppl 1):i369-i380.

doi: 10.1093/bioinformatics/btae256.

MolPLA: a molecular pretraining framework for learning cores, R-groups and their linker joints

Mogan Gim¹, Jueon Park¹, Soyon Park¹, Sanghoon Lee¹ ², Seungheun Baek¹, Junhyun Lee¹, Ngoc-Quang Nguyen¹, Jaewoo Kang¹ ²

Affiliations

¹ Department of Computer Science, Korea University, Seoul 02841, Republic of Korea.
² AIGEN Sciences, Seoul 04778, Republic of Korea.

PMID: 38940143
PMCID: PMC11211832
DOI: 10.1093/bioinformatics/btae256

Mogan Gim et al. Bioinformatics. 2024.


Abstract

Motivation: Molecular core structures and R-groups are essential concepts in drug development. Integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to understand the underlying decomposable parts of molecules that correspond to their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that enables MolPLA to help chemists find replaceable R-groups in lead optimization scenarios.

Results: Experimental results on molecular property prediction show that MolPLA achieves predictive performance comparable to current state-of-the-art models. Qualitative analyses indicate that MolPLA is capable of distinguishing core and R-group substructures, identifying decomposable regions in molecules, and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates.

Availability and implementation: The code implementation for MolPLA and its pre-trained model checkpoint are available at https://github.com/dmis-lab/MolPLA.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Schematic illustration of our proposed molecular graph decomposition method. Given a molecule represented as its graph structure $G_{M_{1}}$ and one of its putative cores $G_{C_{1, 1}}$ identified by Naveja *et al.*’s framework, multiple decomposition results can be obtained. Since the molecule has three R-groups, the total number of decomposition results, including the one that decouples all R-groups from the putative core, is seven. Additional figures are available in Supplementary Materials S4 and S5.
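
The count of seven in Figure 1 is consistent with taking every non-empty subset of the R-groups attached to the putative core, since three R-groups give $2^{3} - 1 = 7$ such subsets. The sketch below enumerates them under that reading. The R-group labels and the function name `enumerate_decompositions` are hypothetical placeholders for illustration, not identifiers from the MolPLA code.

```python
from itertools import combinations


def enumerate_decompositions(r_groups):
    """Return every non-empty subset of R-groups that could be decoupled from a core.

    Assumption: one decomposition result corresponds to one non-empty subset,
    which matches the count of seven stated for three R-groups.
    """
    subsets = []
    for size in range(1, len(r_groups) + 1):
        subsets.extend(combinations(r_groups, size))
    return subsets


# Hypothetical labels for the three R-groups of the molecule in Figure 1.
results = enumerate_decompositions(["R1", "R2", "R3"])
for decoupled in results:
    print(decoupled)
print(len(results))
```

The last tuple produced, `("R1", "R2", "R3")`, corresponds to the case in which all R-groups are decoupled from the putative core.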

**Figure 2.**
Overview of *MolPLA*, consisting of Masked Graph Contrastive Learning ($L_{1}, L_{2}$) and the R-Group Retrieval Framework ($L_{3}$). The total loss objective for this pre-training framework is $L = L_{1} + L_{2} + L_{3}$. All loss objectives employ the same dual InfoNCE loss using in-batch negatives. Details of this method are available in Supplementary Material S5. The shared graph encoder $f_{\theta}$ is built from five GIN layers; given molecular graphs as input, it outputs atom node embeddings.
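
To make the loss in Figure 2 concrete, the following is a minimal pure-Python sketch of an InfoNCE loss with in-batch negatives. It assumes that "dual" means averaging the two matching directions (view A to view B and view B to view A). The exact formulation used by MolPLA is given in Supplementary Material S5 and may differ. The function names, the temperature value and the toy embeddings are all assumptions made for illustration.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """One-directional InfoNCE: row i of `anchors` should match row i of `positives`.

    All other rows of `positives` in the batch act as negatives.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Scaled cosine similarities between anchor i and every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # -log softmax at the positive index
    return total / len(a)


def dual_info_nce(view_a, view_b, temperature=0.1):
    """Assumed 'dual' form: average of the a->b and b->a InfoNCE terms."""
    return 0.5 * (info_nce(view_a, view_b, temperature)
                  + info_nce(view_b, view_a, temperature))


# Toy batch of three paired embeddings (made-up numbers for illustration).
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
aligned = dual_info_nce(view_a, view_b)
shuffled = dual_info_nce(view_a, view_b[1:] + view_b[:1])
print(aligned < shuffled)
```

Under this sketch, the total objective $L = L_{1} + L_{2} + L_{3}$ would be the sum of three such terms, each computed on a different pair of embedding views. Correctly paired views yield a lower loss than mismatched ones, which is the signal the contrastive objective optimizes.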

**Figure 3.**
Visualization results of node representations generated by *MolPLA* for the two reference molecules Streptozocin and Capmatinib.

**Figure 4.**
Distributions of QED and SAscore values for molecules generated by MolPLA from the two reference molecules Streptozocin and Capmatinib. The dashed line represents the scores of the original molecules, while the shaded region indicates a subset of values from the optimized molecules.

**Figure 5.**
List of generated molecules for each reference molecule in the lead optimization scenario. The generated molecules were selected based on their calculated drug-likeness scores, namely QED, SAscore and docking score.

