PLoS Comput Biol. 2025 Aug 1;21(8):e1013343. doi: 10.1371/journal.pcbi.1013343. eCollection 2025 Aug.

SuperEdgeGO: Edge-supervised graph representation learning for enhanced protein function prediction

Shugang Zhang et al. PLoS Comput Biol.

Abstract

Understanding the functions of proteins is of great importance for deciphering the mechanisms of life activities. To date, more than 200 million proteins are known, but only 0.2% of them have well-annotated functional terms. By measuring the contacts among residues, proteins can be described as graphs, so that graph learning approaches can be applied to learn protein representations. However, existing graph-based methods focus on enriching the residue node information and do not fully exploit the edge information, which leads to suboptimal representations given the strong association of residue contacts with protein structures and functions. In this article, we propose SuperEdgeGO, which introduces supervision of the edges in protein graphs to learn better graph representations for protein function prediction. Unlike common graph convolution methods that use edge information in a plain or unsupervised way, we introduce supervised attention to encode the residue contacts explicitly into the protein representation. Comprehensive experiments demonstrate that SuperEdgeGO achieves state-of-the-art performance on all three categories of protein functions, and ablation analysis further confirms the effectiveness of the devised edge supervision strategy. This strategy has broad application prospects in the study of protein function and related fields.
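The abstract describes proteins as graphs built from residue contacts. A minimal sketch of that preprocessing step, assuming C-alpha coordinates from a predicted structure and a hypothetical 10 Å contact cutoff (the abstract does not state the paper's exact threshold):

```python
import math

def contact_adjacency(coords, threshold=10.0):
    """Build a binary adjacency matrix from residue coordinates.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha
    positions from a predicted structure). Two residues are taken to
    be in contact if their Euclidean distance is below `threshold`
    angstroms; 10 A is a common choice, used here as an assumption.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                adj[i][j] = adj[j][i] = 1  # contacts are symmetric
    return adj
```

The resulting 0/1 matrix is the kind of edge information that, per the abstract, SuperEdgeGO supervises directly rather than leaving to an unsupervised attention mechanism.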


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. The overall architecture of SuperEdgeGO.
Stage I. The input protein sequence is first sent to the protein language model ESM-2 to generate the feature matrix, and to the protein structure model AlphaFold2 to predict the structure, which is then processed into the adjacency matrix. Stage II. The two matrices are fed into the model, which consists of three graph attention layers, a pooling layer, and a fully-connected classifier. In particular, each graph attention layer contains both unsupervised and supervised attention modules. Stage III. The model is optimized by minimizing two losses: the main task loss, arising from wrong predictions of GO terms, and the self-supervised edge loss, coming from the deviation of the attention scores from the binary labels indicating the presence of edges.
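The two-loss objective in Stage III can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: binary cross-entropy is assumed for both terms, and the edge-loss weight of 0.01 is taken from the hyperparameter sweep in Fig 3(a).

```python
import math

def bce(p, y, eps=1e-9):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(go_probs, go_labels, att_scores, edge_labels, lam_e=0.01):
    """Combined objective sketch: mean BCE over GO-term predictions
    (main task loss) plus lam_e times mean BCE between the supervised
    attention scores and the binary edge (contact) labels
    (self-supervised edge loss). lam_e=0.01 follows Fig 3(a)."""
    l_main = sum(bce(p, y) for p, y in zip(go_probs, go_labels)) / len(go_labels)
    l_edge = sum(bce(a, e) for a, e in zip(att_scores, edge_labels)) / len(edge_labels)
    return l_main + lam_e * l_edge
```

Because the edge term is weighted by a small coefficient, the GO-term loss dominates while the attention scores are still pushed toward the observed contact pattern.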
Fig 2
Fig 2. Pn-Pe three-dimensional diagram.
Fig 3
Fig 3. Model performance of different hyperparameter settings.
(a) The model achieves its optimal results when λE=0.01; (b) The model achieves its optimal results when the dropout rate is set to 0.2.
Fig 4
Fig 4. The execution time of SuperEdgeGO and other baseline methods on the (a) MF-GO terms, (b) BP-GO terms, and (c) CC-GO terms of the Human dataset.
Note that the evaluation was conducted based on NVIDIA GeForce RTX 4090 and may vary depending on the experimental settings.
Fig 5
Fig 5. Four strategies to generate the supervised attention score φij.
