MM-StackEns: A new deep multimodal stacked generalization approach for protein-protein interaction prediction

Alexandra-Ioana Albu¹, Maria-Iuliana Bocicor², Gabriela Czibula³

Affiliations

¹ Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania. Electronic address: alexandra.albu@ubbcluj.ro.
² Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania. Electronic address: maria.bocicor@ubbcluj.ro.
³ Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania. Electronic address: gabriela.czibula@ubbcluj.ro.

PMID: 36623437
DOI: 10.1016/j.compbiomed.2022.106526

Free article

Alexandra-Ioana Albu et al. Comput Biol Med. 2023 Feb;153:106526. doi: 10.1016/j.compbiomed.2022.106526. Epub 2023 Jan 3.

Abstract

Accurate in-silico identification of protein-protein interactions (PPIs) is a long-standing problem in biology, with important implications for protein function prediction and drug design. Current computational approaches predominantly use a single data modality to describe protein pairs, which may not fully capture the characteristics relevant for identifying PPIs. Another limitation of existing methods is their poor generalization to proteins outside the training graph. In this paper, we address these shortcomings by proposing a new ensemble approach for PPI prediction that learns from two modalities: the sequences of the two proteins in a pair, and the graph formed by the training proteins and their interactions. Our approach uses a Siamese neural network to process sequence information, while graph attention networks are employed for the network view. To capture the relationships between the proteins in a pair, we design a new feature fusion module based on computing the distance between the distributions corresponding to the two proteins. The final prediction is made through a stacked generalization procedure, in which the final classifier is a Logistic Regression model trained on the scores predicted by the sequence and graph models. Additionally, we show that protein sequence embeddings obtained from pretrained language models can significantly improve the generalization of PPI methods. The experimental results show that our approach performs well: it surpasses all related work on two Yeast data sets and outperforms the majority of literature approaches on two Human data sets and on independent multi-species data sets.

Keywords: Contextualized word embeddings; Feature fusion; Graph neural networks; Neural networks; Protein–protein interaction prediction.
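The stacked generalization step described in the abstract can be illustrated with a small sketch. The Python code below is not the authors' implementation. It assumes that every protein pair already has two interaction scores in [0, 1], one from a sequence model and one from a graph model; the names `s_seq`, `s_graph`, `fit_meta_lr`, `predict_meta` and the toy data are hypothetical. It fits a logistic regression final classifier on those two scores by plain batch gradient descent. In a proper stacking setup the final classifier is trained on scores the base models produced for pairs they were not trained on (for example, out-of-fold predictions), so that it learns how far to trust each model on unseen data.

```python
"""Sketch of stacked generalization with a logistic-regression final classifier.

Assumption (not from the paper's code): each protein pair already has two
interaction scores in [0, 1], one from a sequence model (s_seq) and one from a
graph model (s_graph), computed on pairs the base models were NOT trained on.
"""
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_lr(
    scores: Sequence[Tuple[float, float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> List[float]:
    """Fit weights [bias, w_seq, w_graph] by batch gradient descent on log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (s_seq, s_graph), y in zip(scores, labels):
            p = sigmoid(w[0] + w[1] * s_seq + w[2] * s_graph)
            err = p - y  # derivative of the log-loss w.r.t. the logit
            grad[0] += err
            grad[1] += err * s_seq
            grad[2] += err * s_graph
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict_meta(w: Sequence[float], s_seq: float, s_graph: float) -> float:
    """Final interaction probability computed from the two base-model scores."""
    return sigmoid(w[0] + w[1] * s_seq + w[2] * s_graph)


if __name__ == "__main__":
    # Toy, hypothetical held-out scores (s_seq, s_graph) with their true labels.
    held_out_scores = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.4),
                       (0.2, 0.3), (0.3, 0.1), (0.4, 0.2)]
    held_out_labels = [1, 1, 1, 0, 0, 0]
    weights = fit_meta_lr(held_out_scores, held_out_labels)
    print("learned [bias, w_seq, w_graph]:", [round(x, 3) for x in weights])
    print("P(interaction | 0.9, 0.9):", round(predict_meta(weights, 0.9, 0.9), 3))
```

The hand-written optimizer only keeps the sketch dependency-free; in practice a library estimator such as scikit-learn's `LogisticRegression` would typically play the role of the final classifier.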

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
