PLoS Comput Biol. 2025 Aug 1;21(8):e1013343. doi: 10.1371/journal.pcbi.1013343. eCollection 2025 Aug.

SuperEdgeGO: Edge-supervised graph representation learning for enhanced protein function prediction

Shugang Zhang et al. PLoS Comput Biol.

Abstract

Understanding the functions of proteins is of great importance for deciphering the mechanisms of life activities. To date, more than 200 million proteins are known, but only 0.2% of them have well-annotated functional terms. By measuring the contacts among residues, proteins can be described as graphs, so that graph learning approaches can be applied to learn protein representations. However, existing graph-based methods focus on enriching the residue node information and do not fully exploit the edge information, which leads to suboptimal representations given the strong association of residue contacts with protein structures and functions. In this article, we propose SuperEdgeGO, which introduces supervision of the edges in protein graphs to learn better graph representations for protein function prediction. Unlike common graph convolution methods that use edge information in a plain or unsupervised way, we introduce supervised attention to encode the residue contacts explicitly into the protein representation. Comprehensive experiments demonstrate that SuperEdgeGO achieves state-of-the-art performance on all three categories of protein functions, and ablation analysis further confirms the effectiveness of the devised edge supervision strategy. This strategy has broad application prospects in the study of protein function and related fields.
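The abstract describes proteins as graphs built from residue contacts. A minimal sketch of that preprocessing step, assuming C-alpha coordinates from a predicted structure and a hypothetical 10 Å contact cutoff (the abstract does not state the paper's exact threshold):

```python
import math

def contact_adjacency(coords, threshold=10.0):
    """Build a binary adjacency matrix from residue coordinates.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha
    positions from a predicted structure). Two residues are taken to
    be in contact if their Euclidean distance is below `threshold`
    angstroms; 10 A is a common choice, used here as an assumption.
    """
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                adj[i][j] = adj[j][i] = 1  # contacts are symmetric
    return adj
```

The resulting 0/1 matrix is the kind of edge information that, per the abstract, SuperEdgeGO supervises directly rather than leaving to an unsupervised attention mechanism.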


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. The overall architecture of SuperEdgeGO.
Stage I. The input protein sequence is first sent to the protein language model ESM-2 to generate the feature matrix, and to the protein structure model AlphaFold2 to predict the structure, which is then processed into the adjacency matrix. Stage II. The two matrices are fed into the model, which consists of three graph attention layers, a pooling layer, and a fully-connected classifier. In particular, each graph attention layer contains both unsupervised and supervised attention modules. Stage III. The model is optimized by minimizing two losses: the main task loss, arising from wrong predictions of GO terms, and the self-supervised edge loss, coming from the deviation of the attention scores from the binary labels indicating the presence of edges.
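The two-loss objective in Stage III can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: binary cross-entropy is assumed for both terms, and the edge-loss weight of 0.01 is taken from the hyperparameter sweep in Fig 3(a).

```python
import math

def bce(p, y, eps=1e-9):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(go_probs, go_labels, att_scores, edge_labels, lam_e=0.01):
    """Combined objective sketch: mean BCE over GO-term predictions
    (main task loss) plus lam_e times mean BCE between the supervised
    attention scores and the binary edge (contact) labels
    (self-supervised edge loss). lam_e=0.01 follows Fig 3(a)."""
    l_main = sum(bce(p, y) for p, y in zip(go_probs, go_labels)) / len(go_labels)
    l_edge = sum(bce(a, e) for a, e in zip(att_scores, edge_labels)) / len(edge_labels)
    return l_main + lam_e * l_edge
```

Because the edge term is weighted by a small coefficient, the GO-term loss dominates while the attention scores are still pushed toward the observed contact pattern.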
Fig 2
Fig 2. Pn-Pe three-dimensional diagram.
Fig 3
Fig 3. Model performance of different hyperparameter settings.
(a) The model achieves its optimal results when λE=0.01; (b) The model achieves its optimal results when the dropout rate is set to 0.2.
Fig 4
Fig 4. The execution time of SuperEdgeGO and other baseline methods on the (a) MF-GO terms, (b) BP-GO terms, and (c) CC-GO terms of the Human dataset.
Note that the evaluation was conducted based on NVIDIA GeForce RTX 4090 and may vary depending on the experimental settings.
Fig 5
Fig 5. Four strategies to generate the supervised attention score φij.
