Biomolecules. 2021 Nov 29;11(12):1783.

doi: 10.3390/biom11121783.

EmbedDTI: Enhancing the Molecular Representations via Sequence Embedding and Graph Convolutional Network for the Prediction of Drug-Target Interaction

Yuan Jin¹, Jiarui Lu², Runhan Shi¹, Yang Yang¹,³

Affiliations

¹ Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China.
² School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China.
³ Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China.

PMID: 34944427
PMCID: PMC8698792
DOI: 10.3390/biom11121783

Yuan Jin et al. Biomolecules. 2021.

Abstract

The identification of drug-target interactions (DTIs) plays a key role in drug discovery and development. Benefiting from large-scale drug databases and verified DTI relationships, many machine-learning methods have been developed to predict DTIs. However, because useful information is difficult to extract from molecules, the performance of these methods is limited by how drugs and target proteins are represented. This study proposes a new model, EmbedDTI, that enhances the representation of both drugs and target proteins and improves the performance of DTI prediction. For protein sequences, we use language modeling to pretrain the feature embeddings of amino acids and feed them to a convolutional neural network for further representation learning. For drugs, we build two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph, and adopt a graph convolutional network with an attention module to learn the embedding vectors for the graphs. We compare EmbedDTI with existing DTI predictors on two benchmark datasets. The experimental results show that EmbedDTI outperforms the state-of-the-art models and that the attention module can identify the components in compounds that are crucial for DTIs.

Keywords: drug-target interaction; graph convolutional network; molecular representation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Model architecture. For protein sequences, we use GloVe to pretrain the feature embeddings of amino acids and feed them to a CNN model for representation learning. For drugs, we construct two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph. Each graph level yields its own embedding vector through attention and several GCN layers. The three embedding vectors are concatenated and passed through several fully connected layers to output the binding affinity of the drug-target pair.
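The fusion step described in this caption can be illustrated with a minimal sketch. It assumes the three embeddings are plain lists of floats, and it uses a single hidden layer where the caption says "several fully connected layers". The function names and all weights are hypothetical placeholders, not the authors' implementation.

```python
def linear(vec, weights, bias):
    """Dense layer: one output per weight row, plus the matching bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def relu(vec):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vec]


def predict_affinity(protein_vec, atom_vec, sub_vec,
                     hidden_w, hidden_b, out_w, out_b):
    """Concatenate the three embeddings and regress a single affinity value.

    Assumption: one hidden layer stands in for the stack of fully
    connected layers mentioned in the caption.
    """
    fused = protein_vec + atom_vec + sub_vec  # list concatenation
    hidden = relu(linear(fused, hidden_w, hidden_b))
    return linear(hidden, out_w, out_b)[0]


# Toy usage with made-up one-dimensional embeddings and weights.
score = predict_affinity([1.0], [2.0], [3.0],
                         hidden_w=[[1.0, 1.0, 1.0]], hidden_b=[0.0],
                         out_w=[[2.0]], out_b=[1.0])
print(score)  # 13.0
```

Concatenation keeps the protein, atom-level, and substructure-level views separate until the dense layers, so the regressor can weight each view independently.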

**Figure 2**
Two different types of bonds. The bond marked in red is in a ring, while the bond marked in blue lies outside any ring.
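The distinction in this caption can be checked on a plain bond list without a chemistry toolkit. The sketch below relies on a standard graph fact: a bond lies in a ring exactly when its two atoms remain connected after that bond is removed. The function name and the toluene-like example are illustrative assumptions, not taken from the paper.

```python
def is_ring_bond(bonds, bond):
    """Return True if `bond` lies on a cycle of the molecular graph.

    `bonds` is a list of (atom_id, atom_id) pairs; `bond` is one of them.
    The bond is dropped, then a depth-first search checks whether its
    endpoints are still connected through the remaining bonds.
    """
    a, b = bond
    neighbors = {}
    for u, v in bonds:
        if {u, v} == {a, b}:
            continue  # skip the bond under test
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seen, stack = {a}, [a]
    while stack:
        node = stack.pop()
        if node == b:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Toluene-like skeleton: atoms 0-5 form a six-membered ring, atom 6 hangs off atom 0.
toluene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(is_ring_bond(toluene, (0, 1)))  # True
print(is_ring_bond(toluene, (0, 6)))  # False
```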

**Figure 3**
An example of substructure segmentation. The left graph is the atom-level graph, where substructures are marked by different colors. The right one is the substructure-level graph, where each substructure is denoted by a single node in the graph.
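Collapsing an atom-level graph into a substructure-level graph can be sketched as follows, given an assignment of atoms to substructures. The assignment is taken as an input here because the partition algorithm itself is not described on this page. Treating two substructures as connected whenever any bond crosses between them is an assumption made for illustration.

```python
def substructure_graph(bonds, atom_to_sub):
    """Build substructure-level edges from atom-level bonds.

    `bonds` is a list of (atom_id, atom_id) pairs and `atom_to_sub` maps
    each atom id to a substructure id. Bonds inside one substructure
    disappear; bonds between two substructures become a single edge.
    """
    edges = set()
    for u, v in bonds:
        su, sv = atom_to_sub[u], atom_to_sub[v]
        if su != sv:
            edges.add((min(su, sv), max(su, sv)))
    return sorted(edges)


# Toluene-like skeleton: the ring (atoms 0-5) is substructure 0, the methyl carbon is substructure 1.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
assignment = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1}
print(substructure_graph(bonds, assignment))  # [(0, 1)]
```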

**Figure 4**
Graph feature learning via GCN. Taking the adjacency matrix and feature matrix of a graph as input, the node-level representation is obtained after the convolution operation. The node-level representation is then passed through a max-pooling layer to obtain the graph-level representation. Finally, the graph-level representation matrix is expanded, and a 128-dimensional vector is obtained through several fully connected layers.
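The convolution and pooling steps in this caption can be sketched in plain Python. The caption does not state the propagation rule, so the sketch assumes the widely used symmetric normalization with self-loops, $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$. The final fully connected layers that produce the 128-dimensional vector are omitted, and all names and weights are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: normalize (A + I), propagate, apply ReLU."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def max_pool(node_repr):
    """Graph-level vector: the maximum over nodes in each feature column."""
    return [max(col) for col in zip(*node_repr)]


# Two bonded atoms with one-hot features and an identity weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
print(max_pool(gcn_layer(adj, feats, weight)))  # [0.5, 0.5]
```

Max pooling makes the graph-level vector independent of the number and ordering of nodes, which is why it can be fed to fixed-size dense layers afterwards.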

**Figure 5**
GCN forward layer with attention. The attention module considers each pair of nodes and assigns the pair an attention weight $\alpha_{ij}$, which indicates that node $j$ has an $\alpha_{ij}$-weighted influence on node $i$ during propagation.
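One way to picture the weights $\alpha_{ij}$ is as a softmax over each node's neighborhood. The caption does not give the scoring function, so the sketch below assumes dot-product scores between node features and restricts attention to bonded neighbors plus the node itself. This is an illustrative choice, not the scoring used in the paper.

```python
import math


def attention_weights(adj, feats):
    """Return alpha[i][j]: the influence of node j on node i.

    Assumption: scores are feature dot products, normalized with a
    softmax over node i's neighbors and node i itself.
    """
    n = len(adj)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or i == j]
        scores = [sum(a * b for a, b in zip(feats[i], feats[j]))
                  for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            alpha[i][j] = e / total
    return alpha


def propagate(alpha, feats):
    """Update each node as the alpha-weighted sum of node features."""
    n, dim = len(feats), len(feats[0])
    return [[sum(alpha[i][j] * feats[j][k] for j in range(n))
             for k in range(dim)] for i in range(n)]


# Three atoms in a chain: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = attention_weights(adj, feats)
updated = propagate(alpha, feats)
```

Each row of `alpha` sums to one, and entries for non-bonded pairs stay at zero, so the propagation step only mixes information along bonds.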

**Figure 6**
Predicted scores vs. real scores on the Davis test dataset.

**Figure 7**
Predicted scores vs. real scores on the KIBA test dataset.

**Figure 8**
Crystal structure of the ligand phosphoaminophosphonic acid-guanylate ester bound to chain A of K-Ras. The protein is shown as a grey ribbon, with its hydrophobic surface displayed around the ribbon.

**Figure 9**
A fused nitrogen heterocyclic compound with 29 atoms and 17 substructures (as produced by the partition algorithm). According to the attention output, the two atoms with the highest normalized attention scores, C (id = 13) and N (id = 14), with scores of $1.0$ and $0.958$, are highlighted in the figure (scores are min-max normalized). The substructure containing these two atoms is assigned an attention score of $0.945$.
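The min-max normalization mentioned in this caption rescales raw scores to the range [0, 1], so the highest-scoring atom always receives 1.0. A minimal sketch follows. The raw values in the usage example are made up for illustration and are not the paper's attention outputs.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal; return zeros to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw attention scores for three atoms.
print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```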
