PHIStruct: improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings
- PMID: 39804673
- PMCID: PMC11783280
- DOI: 10.1093/bioinformatics/btaf016
Abstract
Motivation: Recent computational approaches for predicting phage-host interaction have explored the use of sequence-only protein language models to produce embeddings of phage proteins without manual feature engineering. However, these embeddings do not directly capture protein structure information and structure-informed signals related to host specificity.
Results: We present PHIStruct, a multilayer perceptron that takes in structure-aware embeddings of receptor-binding proteins, generated via the structure-aware protein language model SaProt, and then predicts the host from among the ESKAPEE genera. Compared against recent tools, PHIStruct exhibits the best balance of precision and recall, with the highest and most stable F1 score across a wide range of confidence thresholds and sequence similarity settings. The margin in performance is most pronounced when the sequence similarity between the training and test sets drops below 40%, wherein, at a relatively high-confidence threshold of above 50%, PHIStruct presents a 7%-9% increase in class-averaged F1 over machine learning tools that do not directly incorporate structure information, as well as a 5%-6% increase over BLASTp.
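To make the evaluation terms in the Results concrete, the following is a minimal sketch of confidence-thresholded host prediction and class-averaged (macro) F1. It is not taken from the PHIStruct source code. The function names, the rule of withholding a prediction when the top probability does not exceed the threshold, and the toy probabilities are all assumptions made for illustration. Only the seven ESKAPEE genera and the 50% threshold come from the abstract.

```python
# Illustrative sketch only: the thresholding rule, helper names, and toy
# numbers are assumptions, not the authors' implementation.

ESKAPEE_GENERA = [
    "Enterococcus", "Staphylococcus", "Klebsiella", "Acinetobacter",
    "Pseudomonas", "Enterobacter", "Escherichia",
]


def predict_with_threshold(probabilities, threshold=0.5):
    """Return the top-scoring genus per sample, or None when its
    probability does not exceed the confidence threshold."""
    predictions = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        predictions.append(ESKAPEE_GENERA[best] if row[best] > threshold else None)
    return predictions


def macro_f1(y_true, y_pred, classes):
    """Class-averaged F1: compute F1 per class, then take the unweighted mean."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy class probabilities, one row per receptor-binding protein (hypothetical).
probs = [
    [0.05, 0.05, 0.70, 0.05, 0.05, 0.05, 0.05],  # confident: Klebsiella
    [0.30, 0.30, 0.10, 0.10, 0.10, 0.05, 0.05],  # below threshold: withheld
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.88],  # confident: Escherichia
]
y_true = ["Klebsiella", "Staphylococcus", "Escherichia"]

preds = predict_with_threshold(probs, threshold=0.5)
score = macro_f1(y_true, preds, classes=sorted(set(y_true)))
print(preds, round(score, 3))
```

In this toy case the withheld prediction counts as a miss for its true class, so raising the threshold trades recall for precision. That trade-off is what the abstract's comparison across confidence thresholds measures.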
Availability and implementation: The data and source code for our experiments and analyses are available at https://github.com/bioinfodlsu/PHIStruct.
© The Author(s) 2025. Published by Oxford University Press.