Brief Bioinform. 2025 Nov 1;26(6):bbaf656.
doi: 10.1093/bib/bbaf656.

PBIP: a deep learning framework for predicting phage-bacterium interactions at the strain level

Lijia Ma et al. Brief Bioinform.

Abstract

Phage therapy has received great attention as a promising antimicrobial treatment, and its core technique, namely predicting phage-bacterium interactions (PBIs), is crucial for understanding infection mechanisms and optimizing therapeutic strategies. However, existing computational methods mainly focus on the species or higher taxonomic levels, and usually neglect the potential of deep embedding representations, limiting their ability to capture complex biological patterns inherent in sequences. This hinders the discovery of rich sequence features, and restricts the clinical application of phage therapy. To address these limitations, we propose a novel deep learning framework (called PBIP) for strain-level PBI prediction. In PBIP, we first identify strain-level interactions through biological infection experiments and sequencing of Klebsiella pneumoniae isolated from the clinical environment of Xiangya Hospital. Then, we utilize a pretrained unified representation model to convert protein sequences of phages and bacteria into deep embeddings. Next, we apply the synthetic minority oversampling technique to generate positive interactions in the embedding space to address the data imbalance issue. Subsequently, we design a deep neural network that uses a convolutional neural network to extract local features, a bi-directional gated recurrent unit to capture global features, and an attention module to highlight significant features. Finally, a fully connected layer integrates this information for PBI prediction. Experimental results show the superiority of PBIP over the state-of-the-art methods in predicting PBIs. The code and datasets are available at https://github.com/a1678019300/PBIP.
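
The oversampling step described above can be illustrated with a minimal sketch of the standard SMOTE idea: pick a minority-class sample, choose one of its k nearest minority-class neighbours, and interpolate between the two. This is not the authors' implementation. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy 3-dimensional vectors are assumptions made for illustration; the sketch also assumes each positive phage-bacterium pair has already been reduced to a single embedding vector, which the abstract does not specify.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; not taken from the PBIP code base.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Take a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 3-dimensional "embeddings" of positive pairs (made-up numbers).
positives = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
new_samples = smote(positives, n_new=6, k=2)
```

Because every synthetic vector lies on a segment between two real positive samples, the new points stay inside the region of the embedding space already occupied by the minority class. In practice the real embeddings would have far more dimensions than this toy example.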

Keywords: attention mechanism; deep learning; phage–bacterium interactions; protein representation learning.

Figures

Figure 1. The strain-level interaction dataset, showing (A) the phages and bacteria in the training and test sets, respectively, and (B) the similarity between phages in the training and test sets.
Figure 2. The species-level interaction dataset, showing (A) the phages and bacteria in the training and test sets, respectively, and (B) the similarity between phages in the training and test sets.
Figure 3. Overview of PBIP framework for predicting strain-level PBIs. (A) Collecting used strain-level data. (B) Generating protein sequence embeddings using deep representation learning. (C) Utilizing SMOTE to alleviate data imbalance in the embedding space. (D) Developing a deep learning model for predicting PBIs.
Figure 4. Overview of generating a protein sequence embedding using UniRep.
Figure 5. Overview of generating a synthetic sample using SMOTE in the embedding space.
Figure 6. Overview of the proposed deep learning model architecture.
Figure 7. The architecture of the GRU cell.
Figure 8. Performance comparison between PBIP and baseline methods based on ROC and PR curves on strain-level and species-level test sets: (A) strain-level ROC, (B) strain-level PR, (C) species-level ROC, and (D) species-level PR.
Figure 9. Performance comparison between PBIP and baseline methods at different test set imbalance ratios on strain-level dataset.
Figure 10. Performance comparison between PBIP and baseline methods at different similarity intervals on (A) strain-level and (B) species-level datasets.
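
Since the figure images are not reproduced here, the GRU cell referred to in Figure 7 can be summarized by the commonly used textbook update equations below. This is the standard formulation, given as an assumption; the paper's own notation and gate convention may differ (some references swap the roles of $z_t$ and $1 - z_t$). Here $x_t$ is the input at step $t$, $h_{t-1}$ the previous hidden state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

A bi-directional GRU runs one such cell forward and another backward over the sequence and typically concatenates the two hidden states at each step, which is how it can take context from both directions into account.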
