Deep representation learning of protein-protein interaction networks for enhanced pattern discovery

Rui Yan¹, Md Tauhidul Islam², Lei Xing¹,²,³

Affiliations

¹ Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA.
² Department of Radiation Oncology, Stanford University, Stanford, CA 94305, USA.
³ Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.

PMID: 39693438
PMCID: PMC11654695
DOI: 10.1126/sciadv.adq4324

Rui Yan et al. Sci Adv. 2024 Dec 20;10(51):eadq4324. doi: 10.1126/sciadv.adq4324. Epub 2024 Dec 18.

Abstract

Protein-protein interaction (PPI) networks, where nodes represent proteins and edges depict myriad interactions among them, are fundamental to understanding the dynamics within biological systems. Despite their pivotal role in modern biology, reliably discerning patterns from these intertwined networks remains a substantial challenge. The essence of the challenge lies in holistically characterizing the relationships of each node with others in the network and effectively using this information for accurate pattern discovery. In this work, we introduce a self-supervised network embedding framework termed discriminative network embedding (DNE). Unlike conventional methods that primarily focus on direct or limited-order node proximity, DNE characterizes a node both locally and globally by harnessing the contrast between representations from neighboring and distant nodes. Our experimental results demonstrate DNE's superior performance over existing techniques across various critical network analyses, including PPI inference and the identification of protein functional modules. DNE emerges as a robust strategy for node representation in PPI networks, offering promising avenues for diverse biomedical applications.

Figures

**Fig. 1. Overview of DNE.**
(A) DNE comprises three main steps: (i) initializing nodes using Laplacian eigenvectors (LEs) of the network’s adjacency matrix, optionally concatenated with node features when available; (ii) identifying node neighbors as positive nodes via stochastic neighbor selection and selecting nodes from other network regions as negative nodes, based on the distribution of node degrees; and (iii) embedding each node through a deep learning encoder, optimizing the encoder’s parameters so that the node embeddings preserve the discrimination between neighboring and nonlocal nodes. (B) Use of the pretrained encoder to generate node representations for versatile downstream analysis tasks.
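The three steps in (A) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the caption does not specify the Laplacian normalization, the sampling scheme, or the loss, so the symmetric normalized Laplacian, the short random walk for positives, the degree-to-the-power-0.75 weighting for negatives, and the InfoNCE-style loss are all assumptions. All function names are hypothetical, and the deep encoder is omitted (the loss is applied directly to the initial features).

```python
import numpy as np

def laplacian_eigenvector_init(adj, dim):
    """Step (i): eigenvectors of the symmetric normalized Laplacian for the
    `dim` smallest eigenvalues after the trivial one (assumed normalization)."""
    deg = adj.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)   # eigenvalues come back in ascending order
    return vecs[:, 1:dim + 1]       # drop the first (trivial) eigenvector

def sample_positive(adj, node, walk_length, rng):
    """Step (ii), positives: end point of a short random walk (assumed scheme)."""
    current = node
    for _ in range(walk_length):
        nbrs = np.flatnonzero(adj[current])
        if nbrs.size == 0:
            break
        current = rng.choice(nbrs)
    return int(current)

def sample_negatives(adj, node, k, rng, power=0.75):
    """Step (ii), negatives: non-neighbors drawn with probability
    proportional to degree**power (assumed weighting)."""
    weights = adj.sum(axis=1).astype(float) ** power
    weights[node] = 0.0             # never sample the anchor itself
    weights[adj[node] > 0] = 0.0    # never sample its direct neighbors
    if weights.sum() == 0:
        raise ValueError("node has no non-neighbors to sample from")
    return rng.choice(len(adj), size=k, replace=True, p=weights / weights.sum())

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Step (iii): InfoNCE-style loss on cosine similarities (assumed form)."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)
    a, p, n = unit(anchor), unit(positive), unit(negatives)
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits = logits - logits.max()  # numerical stability
    return float(-logits[0] + np.log(np.exp(logits).sum()))

# Toy network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
init = laplacian_eigenvector_init(adj, dim=2)
pos = sample_positive(adj, 0, walk_length=1, rng=rng)
negs = sample_negatives(adj, 0, k=4, rng=rng)
loss = contrastive_loss(init[0], init[pos], init[negs])
```

In a full pipeline the initial features would pass through a trainable encoder, and the loss would be minimized over its parameters; here the loss is only evaluated once to show how the anchor, positive, and negative nodes fit together.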

**Fig. 2. Performance of different methods for link prediction across four PPI benchmarks.**
(A) ROC and (B) PR curves of DNE compared with 11 other network embedding methods for PPI prediction on the *A. thaliana* dataset. Dashed lines represent level curves for accuracy and F1 score in (A) and (B), respectively. (C) Comparison of DNE with network embedding methods in four PPI benchmarks, presenting mean and SDs of ROC-AUC scores from 10 independent runs. (D) Comparison of DNE with similarity-based link prediction methods in four PPI benchmarks, presenting ROC-AUC scores from 10 runs. The central line within the box denotes the mean, the box edges represent the first and third quartiles, and the whiskers extend to ±1.5 times the interquartile range.
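As a rough illustration of how a ROC-AUC score for link prediction can be obtained from node embeddings, the sketch below scores candidate protein pairs by the dot product of their embeddings and computes the AUC as the fraction of (true edge, non-edge) pairs ranked correctly. The dot-product scorer, the function names, and the toy embeddings are assumptions for illustration; the caption does not state how DNE scores candidate links.

```python
import numpy as np

def edge_scores(emb, pairs):
    """Score each candidate pair (i, j) by the dot product of its embeddings."""
    pairs = np.asarray(pairs)
    return np.einsum("ij,ij->i", emb[pairs[:, 0]], emb[pairs[:, 1]])

def roc_auc(pos_scores, neg_scores):
    """ROC-AUC as the probability that a true edge outscores a non-edge,
    counting ties as one half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Hypothetical 2-D embeddings for four proteins forming two tight pairs.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
held_out_edges = [(0, 1), (2, 3)]   # true interactions hidden during training
sampled_non_edges = [(0, 2), (1, 3)]

auc = roc_auc(edge_scores(emb, held_out_edges),
              edge_scores(emb, sampled_non_edges))
```

The pairwise formulation is quadratic in the number of scored pairs, so for large benchmarks a rank-based implementation from a standard library would be the more practical choice.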

**Fig. 3. Performance of different network embedding methods for module identification.**
(A) AMI scores computed from 10 independent runs by using annotated complexes from IntAct, KEGG, and GOBP as reference standards. Mean values are reported, and error bars represent the SDs of the scores. (B) Comparison of per-module Jaccard scores between DNE and six representative baselines. Each point represents a protein complex. The x axis and y axis represent the per-module overlap (Jaccard) scores obtained by the specified baseline method and DNE, respectively. A score of 0 indicates that no members in the complex were captured, and 1 indicates that all members in the complex were captured. The color and size of each point indicate the difference in Jaccard scores between DNE and other baseline methods for the corresponding complex.

**Fig. 4. Evaluation of the overlap between the predicted complex and the standard Retromer complex.**
The Retromer complex, as annotated by IntAct, serves as a benchmark to assess the performance of various methods in module identification. This standard complex consists of five members: PEP8, VPS35, VPS29, VPS17, and VPS5. The degree of overlap between the predicted complexes and the standard complex is measured using the Jaccard index. Purple indicates that the predicted member is part of the standard complex, gray indicates that the predicted member is not part of the standard complex, and green denotes that a member from the standard complex has not been captured by the prediction.
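The Jaccard index used here is the size of the intersection of the two member sets divided by the size of their union. A minimal sketch follows, using the five Retromer members listed above as the standard; the predicted set, including the placeholder `PROTEIN_X`, is hypothetical and does not correspond to any method's actual output.

```python
def jaccard(predicted, standard):
    """Jaccard index: |intersection| / |union| of two member sets."""
    predicted, standard = set(predicted), set(standard)
    union = predicted | standard
    return len(predicted & standard) / len(union) if union else 0.0

# Standard Retromer complex as listed in the caption.
standard = {"PEP8", "VPS35", "VPS29", "VPS17", "VPS5"}
# Hypothetical prediction: four true members plus one unrelated protein.
predicted = {"PEP8", "VPS35", "VPS29", "VPS17", "PROTEIN_X"}

captured = predicted & standard   # shown in purple in the figure
spurious = predicted - standard   # shown in gray
missed = standard - predicted     # shown in green
score = jaccard(predicted, standard)  # 4 shared / 6 in the union
```

A score of 1 requires the predicted set to match the standard exactly, so both spurious and missed members lower it.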

**Fig. 5. Performance comparison of various methods in link prediction incorporating protein features.**
(A) Integrating protein features from PLMs as node features in PPI networks for network embedding learning. (B) ROC-AUC scores for DNE and other baseline methods on the *Saccharomyces cerevisiae* dataset, derived from 10 independent runs. The purple dashed line (ESM only) indicates scenarios using only protein features extracted from ESM-2. Gray boxes indicate cases considering only network structures, while red boxes depict cases incorporating both network structures and node features.
