Biomolecules. 2021 Nov 29;11(12):1783.
doi: 10.3390/biom11121783.

EmbedDTI: Enhancing the Molecular Representations via Sequence Embedding and Graph Convolutional Network for the Prediction of Drug-Target Interaction

Yuan Jin et al. Biomolecules.

Abstract

Identifying drug-target interactions (DTIs) plays a key role in drug discovery and development. Benefiting from large-scale drug databases and verified DTI relationships, many machine-learning methods have been developed to predict DTIs. However, because useful information is difficult to extract from molecules, the performance of these methods is limited by how drugs and target proteins are represented. This study proposes a new model, EmbedDTI, that enhances the representation of both drugs and target proteins and improves the performance of DTI prediction. For protein sequences, we leverage language modeling to pretrain the feature embeddings of amino acids and feed them to a convolutional neural network for further representation learning. For drugs, we build two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph, and adopt a graph convolutional network with an attention module to learn the embedding vectors for the graphs. We compare EmbedDTI with existing DTI predictors on two benchmark datasets. The experimental results show that EmbedDTI outperforms the state-of-the-art models and that the attention module can identify the components of compounds that are crucial for DTIs.

Keywords: drug-target interaction; graph convolutional network; molecular representation.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Model architecture. For protein sequences, we leverage GloVe to pretrain the feature embeddings of amino acids and feed them to a CNN model for representation learning. For drugs, we construct two levels of graphs to represent compound structural information, namely the atom graph and the substructure graph. Each graph level yields its own embedding vector through attention and several GCNs. The three embedding vectors are concatenated and passed through several fully connected layers to output the binding affinity of the drug-target pair.
Figure 2
Two different types of bonds. The red marked one is a bond in a ring, while the blue marked one is a bond outside any ring.
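The ring versus non-ring distinction in this caption can be illustrated with a small sketch. It assumes the molecule is given as a plain list of atom-index pairs, and it uses the graph-theoretic fact that a bond lies in a ring exactly when its two atoms stay connected after the bond is removed. The function name `is_ring_bond` and the toy molecule are hypothetical, not taken from the paper.

```python
from collections import deque

def is_ring_bond(bonds, bond):
    """Return True if `bond` lies on a cycle of the molecular graph."""
    u, v = bond
    # Build an adjacency map that leaves out the bond under test.
    graph = {}
    for a, b in bonds:
        if {a, b} == {u, v}:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    # Breadth-first search from u: reaching v means another path exists,
    # so the removed bond closed a ring.
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Toy molecule (hypothetical): a six-membered ring 0..5 with atom 6 attached to atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
ring_flags = {bond: is_ring_bond(bonds, bond) for bond in bonds}
```

In this toy case the six ring bonds are flagged and the bond to the substituent atom is not. A cheminformatics toolkit would normally provide this information directly; the sketch only makes the definition concrete.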
Figure 3
An example of substructure segmentation. The left graph is the atom-level graph, where substructures are marked by different colors. The right one is the substructure-level graph, where each substructure is denoted by a single node in the graph.
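As a rough illustration of how an atom-level graph collapses into a substructure-level graph, the sketch below assumes the segmentation is already available as a mapping from atom index to substructure id. The partition algorithm itself is not reproduced here, and `build_substructure_graph` and the toy data are hypothetical names and values.

```python
def build_substructure_graph(bonds, atom_to_sub):
    """Collapse an atom graph into a graph with one node per substructure."""
    edges = set()
    for a, b in bonds:
        sa, sb = atom_to_sub[a], atom_to_sub[b]
        # A bond inside one substructure disappears; a bond between two
        # substructures becomes an undirected edge between their nodes.
        if sa != sb:
            edges.add((min(sa, sb), max(sa, sb)))
    nodes = sorted(set(atom_to_sub.values()))
    return nodes, sorted(edges)

# Toy molecule (hypothetical): ring atoms 0..5 form substructure 0,
# the substituent atom 6 forms substructure 1.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
atom_to_sub = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1}
nodes, edges = build_substructure_graph(bonds, atom_to_sub)
```

Here the seven-atom graph reduces to two substructure nodes joined by a single edge, mirroring the left-to-right transformation described in the caption.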
Figure 4
Graph feature learning via GCN. Taking the adjacency matrix and feature matrix of a graph as input, the convolution operations produce the node-level representation, which is then passed through a max-pooling layer to obtain the graph-level representation. Finally, the graph-level representation matrix is expanded, and a 128-dimensional vector is obtained through several fully connected layers.
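The pipeline in this caption (adjacency and feature matrices in, node-level representation after convolution, graph-level representation after max pooling) can be sketched in plain Python. The symmetric normalization with self-loops used below is the commonly used GCN propagation rule and is an assumption here, since the caption does not state the exact rule; the weights, sizes, and function names are hypothetical, and the final fully connected layers are omitted.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add self-loops and scale entry (i, j) by 1 / sqrt(deg_i * deg_j) (assumed rule)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weight):
    """One graph convolution: aggregate neighbors, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]

def max_pool(node_repr):
    """Graph-level vector: element-wise maximum over all nodes."""
    return [max(col) for col in zip(*node_repr)]

# Toy graph (hypothetical): a three-atom chain 0-1-2 with 2 input features per atom.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]]  # 2 inputs -> 3 hidden units

node_repr = gcn_layer(adj, features, weight)   # one row per atom
graph_repr = max_pool(node_repr)               # one vector for the whole graph
```

Max pooling makes the graph-level vector independent of the number of atoms, which is what allows molecules of different sizes to feed the same fully connected layers.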
Figure 5
GCN forward layer with attention. The attention module considers each pair of nodes and assigns it an attention weight αij, which indicates that node j has an αij-weighted influence on node i during propagation.
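A minimal sketch of attention-weighted propagation follows. It assumes a dot-product score between node features and a softmax over each node's neighborhood (including the node itself); the caption does not specify the scoring function or which pairs are scored, so both choices, along with the function names and toy values, are illustrative assumptions.

```python
import math

def attention_weights(adj, features):
    """Return alpha, where alpha[i][j] is the influence of node j on node i."""
    n = len(adj)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Assumption: only bonded atoms and the atom itself receive weight.
        neighbors = [j for j in range(n) if adj[i][j] or j == i]
        # Assumption: the raw score is a dot product of the two feature vectors.
        scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in neighbors]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        for j, e in zip(neighbors, exps):
            alpha[i][j] = e / total
    return alpha

def propagate(alpha, features):
    """New feature of node i is the alpha-weighted sum of its neighbors' features."""
    n, d = len(features), len(features[0])
    return [[sum(alpha[i][j] * features[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]

# Toy graph (hypothetical): a three-atom chain 0-1-2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = attention_weights(adj, features)
updated = propagate(alpha, features)
```

Each row of `alpha` sums to one, so every updated node feature is a convex combination of its neighborhood, and a large αij can be read as node j mattering strongly to node i.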
Figure 6
Predicted scores vs. real scores on the Davis test dataset.
Figure 7
Predicted scores vs. real scores on the KIBA test dataset.
Figure 8
Crystal structure of the ligand phosphoaminophosphonic acid-guanylate ester bound to chain A of K-Ras. The protein is shown as a grey ribbon, and its hydrophobic surface is shown around the ribbon.
Figure 9
A fused nitrogen heterocyclic compound with 29 atoms and 17 substructures (obtained by the partition algorithm). The two atoms with the highest attention scores after min-max normalization, C (id = 13) and N (id = 14), with scores of 1.0 and 0.958, are highlighted in the figure. The substructure containing these two atoms is assigned an attention score of 0.945.
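The min-max normalization mentioned in this caption rescales raw attention scores to the range [0, 1], so that the highest-scoring atom always receives 1.0. A short sketch, with hypothetical raw scores that are not taken from the paper:

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, so return zeros.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Hypothetical raw attention scores for four atoms.
raw_scores = [0.12, 0.40, 0.388, 0.05]
normalized = min_max_normalize(raw_scores)
best_atom = normalized.index(max(normalized))  # index of the atom with score 1.0
```

Because the scaling is relative to each molecule's own minimum and maximum, normalized scores rank atoms within one molecule and are not directly comparable across molecules.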

