Issues in performance evaluation for host-pathogen protein interaction prediction
- PMID: 26932275
- DOI: 10.1142/S0219720016500116
Abstract
The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine-learning-based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating how well HPI predictors generalize to proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between the performance it reports and the performance reported by an alternative evaluation scheme called leave-one-pathogen-protein-out (LOPO) cross-validation. LOPO is more effective in modeling the real-world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics, such as the areas under the precision-recall or receiver operating characteristic curves, are not intuitive to biologists, and we propose simpler and more directly interpretable metrics for this purpose.
Keywords: Performance evaluation; cross-validation; host–pathogen interactions; machine learning; protein–protein interactions.
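The difference between the two evaluation schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy pairs, the protein identifiers, and the function names (kfold_splits, lopo_splits, leaked_pathogens) are all hypothetical, and only the splitting logic is shown, with no classifier. The sketch assumes each example is a (pathogen protein, host protein, label) triple and checks whether a pathogen protein held out for testing also appears in the training set.

```python
import random
from collections import defaultdict

# Hypothetical toy examples: (pathogen protein, host protein, interacts?)
pairs = [
    ("P1", "H1", 1), ("P1", "H2", 0), ("P1", "H3", 1),
    ("P2", "H1", 0), ("P2", "H4", 1),
    ("P3", "H2", 1), ("P3", "H3", 0), ("P3", "H5", 0),
]

def kfold_splits(examples, k, seed=0):
    """Yield (train, test) lists from a shuffled K-fold partition of the pairs."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index groups
    for fold in folds:
        held = set(fold)
        train = [examples[i] for i in idx if i not in held]
        test = [examples[i] for i in fold]
        yield train, test

def lopo_splits(examples):
    """Yield (held_out_protein, train, test), one fold per pathogen protein."""
    by_pathogen = defaultdict(list)
    for ex in examples:
        by_pathogen[ex[0]].append(ex)
    for protein in sorted(by_pathogen):
        # Every pair involving the held-out protein is excluded from training.
        train = [ex for ex in examples if ex[0] != protein]
        yield protein, train, by_pathogen[protein]

def leaked_pathogens(train, test):
    """Pathogen proteins that occur in both the training and the test set."""
    return {ex[0] for ex in train} & {ex[0] for ex in test}

for train, test in kfold_splits(pairs, k=4):
    print("K-fold test fold shares pathogen proteins with training:",
          sorted(leaked_pathogens(train, test)))

for protein, train, test in lopo_splits(pairs):
    print("LOPO fold", protein, "shares pathogen proteins with training:",
          sorted(leaked_pathogens(train, test)))
```

In this toy setup, K-fold test folds usually contain pathogen proteins whose other pairs were seen during training, whereas every LOPO fold tests on a protein that is entirely absent from training, which is the scenario the abstract argues K-fold fails to model.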