Energy-based graph convolutional networks for scoring protein docking models

Yue Cao¹, Yang Shen¹,²

Affiliations

¹ Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas.
² TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas.

PMID: 32144844
PMCID: PMC7374013
DOI: 10.1002/prot.25888

Proteins. 2020 Aug;88(8):1091-1099. doi: 10.1002/prot.25888. Epub 2020 Mar 16.

Abstract

Structural information about protein-protein interactions, often missing at the interactome scale, is important for mechanistic understanding of cells and rational discovery of therapeutics. Protein docking provides a computational alternative for obtaining such information. However, ranking near-native docked models high among a large number of candidates, often known as the scoring problem, remains a critical challenge. Moreover, estimating model quality, also known as the quality assessment problem, is rarely addressed in protein docking. In this study, these two challenging problems in protein docking are regarded as relative and absolute scoring, respectively, and are addressed in one physics-inspired deep learning framework. We represent protein and complex structures as intra- and inter-molecular residue contact graphs with atom-resolution node and edge features. We propose a novel graph convolutional kernel that aggregates interacting nodes' features through edges so that generalized interaction energies can be learned directly from 3D data. The resulting energy-based graph convolutional networks (EGCN) with multihead attention are trained to predict intra- and inter-molecular energies, binding affinities, and quality measures (interface RMSD) for encounter complexes. Compared to a state-of-the-art scoring function for model ranking, EGCN significantly improves ranking for a Critical Assessment of PRedicted Interactions (CAPRI) test set involving homology docking, and is comparable or slightly better for Score_set, a CAPRI benchmark set generated by diverse community-wide docking protocols not represented in the training data. For Score_set quality assessment, EGCN shows about 27% improvement over our previous efforts. Learning directly from 3D structure data in graph representation, EGCN represents the first successful development of graph convolutional networks for protein docking.

Keywords: energy-based models; graph convolutional networks; machine learning; protein docking; protein-protein interactions; quality estimation; scoring function.
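The abstract describes representing structures as intra- and inter-molecular residue contact graphs. The following is a minimal sketch of how such a graph's edges might be enumerated, not the authors' procedure. It assumes one representative 3D point per residue and a hypothetical 12 Å distance cutoff; the abstract states neither, and the atom-resolution node and edge features it mentions are not modeled here. The name `contact_edges` is illustrative.

```python
import math

def contact_edges(coords_a, coords_b, cutoff=12.0):
    """Return (i, j, distance) for residue pairs closer than `cutoff`.

    coords_a, coords_b: lists of (x, y, z) tuples, one representative
    point per residue (an assumption made for illustration).
    cutoff: hypothetical distance threshold in angstroms.
    """
    edges = []
    for i, p in enumerate(coords_a):
        for j, q in enumerate(coords_b):
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < cutoff:
                edges.append((i, j, d))
    return edges

# Toy coordinates (made up): two receptor residues, two ligand residues.
receptor = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand = [(3.0, 4.0, 0.0), (50.0, 50.0, 50.0)]

# Inter-molecular graph: edges run between the two proteins.
inter_edges = contact_edges(receptor, ligand)
```

Passing the same protein twice, as in `contact_edges(receptor, receptor)`, would give an intra-molecular graph under the same assumptions; note that it then also returns each residue paired with itself at distance 0.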

Figures

**Figure 1.**
The architecture of the proposed graph convolutional network (GCN) models for intra- or inter-molecular energies. In our work, five such models together predict the encounter-complex binding energy: 4 intra-molecular models with shared parameters for the unbound or encountered receptor or ligand, and 1 inter-molecular model for the encounter complex. In each model, the inputs (to the left of the arrow) include a pair of node-feature matrices (*X_A* and *X_B*) for the individual protein(s) and an edge-feature tensor (*A*) for intra- or inter-molecular contacts. The inputs are fed through 3 energy-based graph convolution layers that learn from training data to aggregate and transform atomic interactions, followed by a multi-head attention module and fully connected layers that output the intra- or inter-molecular energy.
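To make the data flow in Figure 1 concrete, here is a minimal sketch of one generic edge-weighted aggregation step over node-feature matrices and an edge-feature tensor. It is not the authors' kernel, whose exact form is not given in this record. The function name `edge_weighted_conv`, the parameter tensor `weights`, and the ReLU nonlinearity are all assumptions; attention and fully connected layers are omitted.

```python
def edge_weighted_conv(x_a, x_b, adj, weights):
    """One illustrative layer: update protein A's nodes from protein B's nodes.

    x_a:     N_a x D node features (only its length is used in this sketch)
    x_b:     N_b x D node features
    adj:     N_a x N_b x K edge features (0.0 means no contact in channel k)
    weights: K x D x D_out hypothetical learnable parameters
    Returns an N_a x D_out list of features after ReLU.
    """
    n_k = len(weights)
    d_in = len(weights[0])
    d_out = len(weights[0][0])
    out = []
    for i in range(len(x_a)):
        row = [0.0] * d_out
        for j, x_j in enumerate(x_b):
            for k in range(n_k):
                e = adj[i][j][k]
                if e == 0.0:
                    continue  # no edge in this channel, nothing to aggregate
                for d in range(d_in):
                    for o in range(d_out):
                        # neighbor feature, scaled by the edge feature,
                        # then linearly transformed per edge channel
                        row[o] += e * x_j[d] * weights[k][d][o]
        out.append([max(0.0, v) for v in row])  # ReLU
    return out

# Toy inputs (made up): 1 node in A, 2 nodes in B, D = 2, K = 1, D_out = 2.
x_a = [[1.0, 0.0]]
x_b = [[1.0, 2.0], [0.5, -1.0]]
adj = [[[1.0], [0.0]]]  # A's node contacts only B's first node
weights = [[[1.0, -1.0], [0.5, 0.0]]]

h_a = edge_weighted_conv(x_a, x_b, adj, weights)
```

For an intra-molecular model, the same function could be called with `x_b` set to `x_a` and an intra-molecular edge tensor, which mirrors the caption's description of one architecture serving both cases.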

**Figure 2.**
Comparing relative scoring (ranking) performance among IRAD, RF, and EGCN. Reported are enrichment ratios of acceptable models among the top-ranked P percent of decoys for (a) the benchmark test set, (b) the CAPRI test set, and (c) Score_set, a CAPRI benchmark for scoring.
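The caption does not define the enrichment ratio. A common definition, assumed here, is the fraction of acceptable models among the top-ranked P percent of decoys divided by the fraction of acceptable models in the whole set. The sketch below also assumes that lower scores rank higher, as with energies; the function name and the toy data are hypothetical.

```python
import math

def enrichment_ratio(scores, acceptable, top_percent):
    """Enrichment of acceptable models among the top `top_percent` of decoys.

    scores:      one predicted score per decoy; lower is better (assumption)
    acceptable:  one boolean per decoy marking acceptable-quality models
    top_percent: P, in the range (0, 100]
    """
    n = len(scores)
    total = sum(acceptable)
    if total == 0:
        raise ValueError("no acceptable models in the set")
    n_top = max(1, math.ceil(n * top_percent / 100.0))
    order = sorted(range(n), key=lambda i: scores[i])  # best score first
    hits_top = sum(acceptable[i] for i in order[:n_top])
    return (hits_top / n_top) / (total / n)

# Toy example (made-up numbers): 10 decoys, 2 of them acceptable.
scores = [-5.0, -1.0, -4.0, 0.0, -3.0, -2.0, 1.0, 2.0, 3.0, 4.0]
acceptable = [True, False, True] + [False] * 7

top20 = enrichment_ratio(scores, acceptable, 20)    # both hits in the top 2
full = enrichment_ratio(scores, acceptable, 100)    # whole set, ratio of 1
```

Under this definition a ratio above 1 means the scoring function concentrates acceptable models near the top of the ranking more than random ordering would.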

**Figure 3.**
Comparing absolute scoring (quality estimation) performance between RF and EGCN. Reported are the RMSE values of iRMSD predictions for (a) the benchmark test set, (b) the CAPRI test set, and (c) Score_set, a CAPRI benchmark for scoring.
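The quality-estimation metric in Figure 3, the root-mean-square error between predicted and actual iRMSD values, can be sketched as follows. The function name and the numbers are hypothetical and are not results from the paper.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up iRMSD values in angstroms for three docked models.
predicted_irmsd = [2.0, 5.0, 9.0]
actual_irmsd = [1.0, 5.0, 11.0]

error = rmse(predicted_irmsd, actual_irmsd)  # sqrt((1 + 0 + 4) / 3)
```

Because the errors are squared before averaging, a single badly mispredicted model raises the RMSE more than several small errors of the same total size.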
