Cell Rep Methods. 2023 Nov 20;3(11):100621.
doi: 10.1016/j.crmeth.2023.100621. Epub 2023 Oct 23.

Molecular geometric deep learning

Cong Shen et al. Cell Rep Methods.

Abstract

Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs to represent molecular topology at the atomic level, and they ignore the non-covalent interactions within the molecule entirely. In this study, we propose a molecular geometric deep learning model for molecular property prediction that considers both the covalent and the non-covalent interactions of molecules. The essential idea is to incorporate a more general molecular representation into geometric deep learning (GDL) models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can outperform state-of-the-art (SOTA) methods. Extensive tests demonstrate the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.

Keywords: CP: Molecular biology; CP: Systems biology; geometric deep learning; graph neural network; molecular property prediction.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Illustration of different molecular graph representations and the performance of their GDLs on six commonly used datasets. The de facto standard of covalent-bond-based molecular graph representation has clear limitations and is inferior to models with only non-covalent interactions. (A) Molecular graph representations for the monobenzone molecule. The de facto standard covalent bond model is shown in orange, and the other four non-covalent-interaction-based graphs are shown in blue. (B) The performance of GDLs with five different molecular representations on the six most commonly used datasets. The color of each model's bar matches that of the corresponding molecular graph. For instance, the orange bars are for GDLs with covalent-bond-based molecular graphs.
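One way to read the graph representations in panel A is as adjacency matrices selected by interatomic distance range. The following is a minimal sketch under that assumption, not the authors' code: the function name `distance_graph`, the cutoff values, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
import numpy as np

def distance_graph(coords, lower, upper):
    """Return a 0/1 adjacency matrix linking atom pairs whose
    Euclidean distance lies in the half-open range [lower, upper)."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise difference vectors, shape (n_atoms, n_atoms, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist >= lower) & (dist < upper)
    np.fill_diagonal(adj, False)  # an atom is never its own neighbor
    return adj.astype(int)

# Toy coordinates (hypothetical, arbitrary length units): four atoms on a line.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0], [6.0, 0.0, 0.0]]

# Illustrative cutoffs only; the ranges used in the paper are not given here.
short_range = distance_graph(coords, 0.0, 2.0)  # bond-like contacts
long_range = distance_graph(coords, 2.0, 4.0)   # longer-range contacts
```

With these toy inputs, the short-range graph links only adjacent atoms, while the longer-range graph links pairs that a bond-only graph would leave unconnected.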
Figure 2
Illustration of geometric node features for Mol-GDL. Unlike all previous models, our geometric node features are determined solely by atom types and Euclidean distances between atoms. (A) A carbon atom (in pink) and its neighboring carbon atoms (in dark black) and hydrogen atoms (in light gray) from a molecular graph for monobenzone. (B) In our geometric node features, the neighboring atoms are grouped by atom type. Here, the two carbon atoms are classified into one group, and the four hydrogen atoms are classified into the other. For each group, the Euclidean distances from the neighboring atoms to the central carbon atom are sorted into several intervals. For each interval, we count the total number (or frequency) of distances within it. Here, three equal-sized intervals are considered. For carbon atoms, the frequencies in these intervals are (2, 1, 0), and for hydrogen atoms, they are (2, 0, 2). These frequencies are then concatenated (in a pre-defined order according to atom types) into a fixed-length vector, i.e., our geometric node feature.
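The counting procedure in panel B can be sketched as a per-atom histogram over neighbor distances, grouped by atom type. This is an illustrative sketch, not the authors' implementation: the function name `geometric_node_features`, the bin edges, and the toy geometry are hypothetical, and the toy counts are not the ones shown in the figure.

```python
import numpy as np

def geometric_node_features(coords, atom_types, adj, type_order, bin_edges):
    """For each atom, count neighbor distances per atom type and per
    distance interval, then concatenate the counts in type_order."""
    coords = np.asarray(coords, dtype=float)
    n_bins = len(bin_edges) - 1
    features = np.zeros((len(coords), len(type_order) * n_bins), dtype=int)
    for i in range(len(coords)):
        neighbors = np.flatnonzero(adj[i])
        for t, atom_type in enumerate(type_order):
            dists = [np.linalg.norm(coords[i] - coords[j])
                     for j in neighbors if atom_types[j] == atom_type]
            # np.histogram bins are half-open except the last, which
            # also includes its right edge; out-of-range values are dropped.
            counts, _ = np.histogram(dists, bins=bin_edges)
            features[i, t * n_bins:(t + 1) * n_bins] = counts
    return features

# Toy geometry (hypothetical): a central carbon with two carbon
# and four hydrogen neighbors at hand-picked distances.
coords = [[0, 0, 0], [0.5, 0, 0], [0, 1.5, 0],
          [0, 0, 0.5], [0, 0, -0.7], [2.5, 0, 0], [0, -2.8, 0]]
atom_types = ["C", "C", "C", "H", "H", "H", "H"]
adj = np.zeros((7, 7), dtype=int)
adj[0, 1:] = 1  # atom 0 is linked to every other atom
adj[1:, 0] = 1

features = geometric_node_features(
    coords, atom_types, adj, type_order=["C", "H"], bin_edges=[0.0, 1.0, 2.0, 3.0]
)
print(features[0])  # [1 1 0 2 0 2]
```

Because the vector length depends only on the number of atom types and intervals, every atom gets a feature of the same size regardless of how many neighbors it has.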
Figure 3
The flowchart of our Mol-GDL model. A set of molecular graphs is systematically constructed for each molecule. The geometric node features go through the same message-passing module. The two pooling operations, one done within graphs and the other between groups, are used to aggregate individual node features into a single molecular feature vector. A multi-layer perceptron (MLP) is employed on the molecular feature vector to generate the final prediction.
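The data flow in this caption can be sketched in a few lines of NumPy. The sketch makes several assumptions that the caption does not confirm: a single mean-aggregation layer stands in for the message-passing module, sum pooling is used within each graph, max pooling is used across the graphs of one molecule (one reading of the second pooling step), and all weights are random. The names `message_pass` and `mol_forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def message_pass(adj, x, weight):
    """One assumed message-passing step: average each node with its
    neighbors, apply a linear map, then a ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_norm @ x @ weight, 0.0)

def mol_forward(graphs, weight, mlp_w1, mlp_w2):
    """graphs: list of (adjacency, node_features) pairs for one molecule."""
    graph_vectors = []
    for adj, x in graphs:
        h = message_pass(adj, x, weight)     # same weights for every graph
        graph_vectors.append(h.sum(axis=0))  # pooling within a graph
    mol_vector = np.max(graph_vectors, axis=0)  # pooling across graphs
    hidden = np.maximum(mol_vector @ mlp_w1, 0.0)
    return hidden @ mlp_w2                   # MLP output: the prediction

# Toy molecule (hypothetical): 3 atoms, 2 graphs, 4 features per atom.
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
adj_b = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)
graphs = [(adj_a, rng.random((3, 4))), (adj_b, rng.random((3, 4)))]

weight = rng.normal(size=(4, 8))
mlp_w1 = rng.normal(size=(8, 8))
mlp_w2 = rng.normal(size=(8, 1))

prediction = mol_forward(graphs, weight, mlp_w1, mlp_w2)
```

Both pooling steps here are order-independent, so the output does not change when the atoms are renumbered, which is the property a molecular feature vector needs.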
Figure 4
Comparison of different methods of calculating node features
