PLoS One. 2021 Feb 25;16(2):e0244430.

doi: 10.1371/journal.pone.0244430. eCollection 2021.

PFP-WGAN: Protein function prediction by discovering Gene Ontology term correlations with generative adversarial networks

Seyyede Fatemeh Seyyedsalehi¹ ², Mahdieh Soleymani¹, Hamid R Rabiee¹, Mohammad R K Mofrad²

Affiliations

¹ Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
² Department of Mechanical Engineering, University of California Berkeley, Berkeley, California, United States of America.

PMID: 33630862
PMCID: PMC7906332
DOI: 10.1371/journal.pone.0244430

Seyyede Fatemeh Seyyedsalehi et al. PLoS One. 2021.

Abstract

Understanding protein function has become a critical problem in recent years because of the significant roles these macromolecules play in biological mechanisms. However, laboratory techniques for determining protein function are far less efficient than the methods developed for protein sequencing. More than 70 million protein sequences are available today, but the function of only around one percent of them is known. This gap has encouraged researchers to develop computational methods that infer protein function from sequence. Gene Ontology (GO) is the most well-known database of protein functions; it has a hierarchical structure in which deeper terms are more determinative and specific. However, the lack of experimentally approved annotations for these specific terms limits the performance of computational methods applied to them. In this work, we propose a method that improves sequence-based protein function prediction by deeply extracting the relationships between Gene Ontology terms. To this end, we construct a conditional generative adversarial network that helps to discover term correlations and incorporate them into the annotation process. In addition to the baseline algorithms, we compare our method with two recently proposed deep learning techniques that attempt to exploit Gene Ontology term correlations. Our results show that the proposed method outperforms these previous works. Moreover, we demonstrate how our model can help to assign more specific terms to sequences.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. The proposed method for protein function prediction.**
Given an input sequence of amino acids, the generator first applies an embedding layer that converts one-hot vectors into more compact representations. One-dimensional convolution filters then search for meaningful sequential patterns, and biochemical and biophysical features of the input are extracted from the resulting activation maps. The last fully connected layer uses these features to predict GO annotations. A discriminator judges the validity of the predicted annotation for the sequence, having observed pairs of protein sequences and their experimentally approved annotations in SwissProt. By observing proteins that are annotated with a common set of GO terms, the discriminator can extract correlations between GO terms.
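
The data flow described in this caption can be illustrated with a minimal, dependency-free sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the layer sizes, the ReLU and global max pooling steps, the random weights, the linear critic, and the names `generator_forward` and `critic_score`. It shows only the forward pass (embedding, 1D convolution, feature pooling, fully connected output, and a score for a sequence and annotation pair), not the adversarial training.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def generator_forward(sequence, params):
    """Embedding -> 1D convolution -> global max pooling -> sigmoid outputs."""
    # Embedding: a row lookup is equivalent to multiplying a one-hot vector
    # by the embedding matrix.
    embedded = [params["embedding"][AA_INDEX[aa]] for aa in sequence]

    # 1D convolution: each filter slides over windows of `width` residues.
    # The sequence must be at least `width` residues long.
    features = []
    for filt in params["filters"]:  # filt has shape width x embed_dim
        width = len(filt)
        activations = []
        for start in range(len(embedded) - width + 1):
            total = 0.0
            for offset in range(width):
                for w, x in zip(filt[offset], embedded[start + offset]):
                    total += w * x
            activations.append(max(total, 0.0))  # ReLU (assumed)
        features.append(max(activations))  # global max pooling (assumed)

    # Fully connected layer with one sigmoid output per GO term.
    scores = []
    for row in params["dense"]:
        z = sum(w * f for w, f in zip(row, features))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return features, scores


def critic_score(features, annotation, critic_weights):
    """Unbounded scalar score for a (sequence features, annotation) pair.

    A linear critic on shared features is a simplifying assumption.
    """
    joint = features + annotation
    return sum(w * v for w, v in zip(critic_weights, joint))


rng = random.Random(0)
EMBED_DIM, N_FILTERS, WIDTH, N_TERMS = 8, 4, 3, 5  # hypothetical sizes
params = {
    "embedding": make_matrix(len(AMINO_ACIDS), EMBED_DIM, rng),
    "filters": [make_matrix(WIDTH, EMBED_DIM, rng) for _ in range(N_FILTERS)],
    "dense": make_matrix(N_TERMS, N_FILTERS, rng),
}
critic_weights = [rng.uniform(-0.1, 0.1) for _ in range(N_FILTERS + N_TERMS)]

features, predicted = generator_forward("MKTAYIAKQR", params)
score = critic_score(features, predicted, critic_weights)
```

Conditioning the critic on the sequence as well as the annotation is what makes the setup conditional: the same annotation vector can receive a different score depending on the protein it is paired with.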

**Fig 2. Comparison of BLAST, DeepGO-Seq and PFP-WGAN on dataset 1.**
The F_max measure shows the superiority of PFP-WGAN in all three parts of the GO.
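
For readers unfamiliar with the measure, the sketch below computes a protein-centric F_max in the form commonly used in CAFA-style evaluations: precision and recall are averaged over proteins at each score threshold, and the best resulting F-score is reported. This page does not state which exact variant the paper uses, so the definition, the threshold grid, and the function name `f_max` should be read as assumptions.

```python
def f_max(true_sets, predicted_scores, thresholds=None):
    """Protein-centric F_max over a grid of score thresholds.

    true_sets: list of sets of GO term IDs, one set per protein.
    predicted_scores: list of dicts {term ID: score in [0, 1]}, one per protein.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]  # assumed grid
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(true_sets, predicted_scores):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every protein that has true annotations.
            if truth:
                recalls.append(hits / len(truth))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data with hypothetical term IDs.
toy_truth = [{"GO:A", "GO:B"}, {"GO:C"}]
toy_scores = [
    {"GO:A": 0.9, "GO:B": 0.4, "GO:C": 0.2},
    {"GO:C": 0.8, "GO:A": 0.3},
]
toy_f_max = f_max(toy_truth, toy_scores)
```

In the toy data, any threshold between 0.3 and 0.4 keeps every true term and drops every false one, so the maximum is reached there.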

**Fig 3. Differences between the average F1 obtained by PFP-WGAN and DeepGO-Seq for the GO terms at each height (F̄_P and F̄_D).**
In the BP branch (the most important part of GO, with a large number of terms), the differences increase toward deeper terms. The same pattern can be observed in most parts of the charts for the CC and MF branches.
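
A comparison of this kind can be sketched as follows: compute a per-term F1 for each method, average it over the terms at each height, and subtract. The sketch assumes predictions have already been thresholded into sets of terms and that each term's height in the ontology is supplied as a lookup table. The helper names and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def term_f1(true_sets, predicted_sets, term):
    """F1 for a single GO term, counted across all proteins."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, predicted_sets):
        in_truth, in_pred = term in truth, term in pred
        tp += in_truth and in_pred
        fp += in_pred and not in_truth
        fn += in_truth and not in_pred
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1_by_height(true_sets, predicted_sets, term_height):
    """Average per-term F1 over the terms that share the same height."""
    buckets = defaultdict(list)
    for term, height in term_height.items():
        buckets[height].append(term_f1(true_sets, predicted_sets, term))
    return {h: sum(v) / len(v) for h, v in sorted(buckets.items())}


def f1_gap_by_height(true_sets, preds_a, preds_b, term_height):
    """Mean F1 of method A minus mean F1 of method B at each height."""
    a = mean_f1_by_height(true_sets, preds_a, term_height)
    b = mean_f1_by_height(true_sets, preds_b, term_height)
    return {h: a[h] - b[h] for h in a}


# Toy data: one term at height 1 and two terms at height 2.
term_height = {"t1": 1, "t2": 2, "t3": 2}
truth = [{"t1", "t2"}, {"t1", "t3"}, {"t1"}]
method_a = [{"t1", "t2"}, {"t1", "t3"}, {"t1"}]  # recovers every term
method_b = [{"t1"}, {"t1", "t3"}, {"t1"}]        # misses the deeper term t2
gap = f1_gap_by_height(truth, method_a, method_b, term_height)
```

Here the two methods tie at height 1 and differ only at height 2, which is the shape of result the caption describes for the BP branch.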

**Fig 4. F1 obtained by PFP-WGAN and DeepGO-Seq for GO terms as a function of the number of available positive training samples.**
The improvement obtained by PFP-WGAN is larger for rare terms than for terms with large numbers of positive samples.

**Fig 5. Comparison of BLAST, FFPRED, CSSAG, STDNN, MTDNN and PFP-WGAN on dataset 2.**
The F_max measure shows the superiority of PFP-WGAN in all three parts of the GO.

**Fig 6. Sensitivity of PFP-WGAN to the parameter λ₁.**
