Nat Commun. 2024 May 22;15(1):4355.
doi: 10.1038/s41467-024-48675-6.

Prediction of Klebsiella phage-host specificity at the strain level

Dimitri Boeckaerts et al. Nat Commun.

Abstract

Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages against bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% in silico and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to the machine-learning-guided development of phage therapeutics and diagnostics.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. PhageHostLearn overview and validation procedures.
a. Overview of the PhageHostLearn machine learning system. PhageHostLearn processes phage and bacterial genomes into phage RBPs and bacterial K-locus proteins, respectively. Phage RBPs belonging to the same phage and bacterial K-locus proteins belonging to the same bacterium are combined into separate multi-instance representations using ESM-2. These multi-instance representations are concatenated into combined representations of the phage-host pairs. Finally, these representations are given as input into an XGBoost model to make predictions and output a ranking of top candidate phages to test against a given bacterium. b. In silico validation of the PhageHostLearn system using a leave-one-group-out cross-validation (LOGOCV) scheme that measures the ROC AUC and mean hit ratio @ k as evaluation metrics. c. In vitro validation of the PhageHostLearn system using 28 high-risk K. pneumoniae clinical isolates in Spain. The PhageHostLearn system predicts a top-five ranking for each of the clinical isolates. For each ranking, the top five phage candidates are validated in the laboratory using phage spot tests.
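
The representation step in panel a can be illustrated with a minimal sketch. It assumes each protein has already been embedded as a fixed-length vector (for example by ESM-2) and that the embeddings of one phage or one bacterium are combined by simple averaging. The caption does not say how the multi-instance representations are built, so the averaging, the function names `pool` and `pair_features`, and the toy three-dimensional vectors are all illustrative assumptions, not the authors' implementation.

```python
from statistics import fmean

def pool(embeddings):
    """Average a non-empty list of equal-length vectors into one vector.

    Assumption: mean pooling stands in for whatever multi-instance
    aggregation the real system uses.
    """
    if not embeddings:
        raise ValueError("need at least one protein embedding")
    return [fmean(dim) for dim in zip(*embeddings)]

def pair_features(rbp_embeddings, klocus_embeddings):
    """Concatenate the pooled phage vector and the pooled bacterial vector."""
    return pool(rbp_embeddings) + pool(klocus_embeddings)

# Toy 3-dimensional vectors standing in for real protein embeddings.
phage_rbps = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]   # two RBPs from one phage
klocus_proteins = [[0.5, 0.5, 0.5]]               # one K-locus protein
features = pair_features(phage_rbps, klocus_proteins)
print(features)  # [2.0, 1.0, 1.0, 0.5, 0.5, 0.5]
```

A vector like `features` is the kind of per-pair input that a gradient-boosted classifier such as XGBoost could then score.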
Fig. 2. In silico validation results of PhageHostLearn.
a. Mean hit ratio @ k of the trained XGBoost model in a LOGOCV at decreasing thresholds for K-locus identity (blue-green curves) and of an informed microbiologist approach (red). At the 100% grouping threshold, identical K-locus sequences are grouped together, so that they all fall in either the training set or the test set. b. ROC curve with AUC of the trained XGBoost model in a LOGOCV at decreasing thresholds for K-locus identity. c. Histogram of the mean top-10 hit ratio against the number of KL-types for which that hit ratio was achieved. There is a contrast between KL-types that are perfectly predicted (hit ratio is 100%) and not at all predicted (hit ratio is 0%). d–f. Histograms of the number of confirmed interactions per bacterial strain related to the KL-types with a mean top-10 hit ratio of 0%, 50–80%, and 100%, respectively.
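
The grouped evaluation in panels a and b can be sketched as follows. The sketch assumes that group labels (for example, K-locus clusters at a chosen identity threshold) are already computed, and it assumes a common definition of hit ratio @ k: a strain counts as a hit when at least one laboratory-confirmed phage appears among its top k ranked phages. The caption does not define the metric, so this definition, the function names, and the toy data are assumptions for illustration only.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    for g in sorted(set(groups)):
        train = [i for i, x in enumerate(groups) if x != g]
        test = [i for i, x in enumerate(groups) if x == g]
        yield g, train, test

def hit_at_k(ranked_phages, confirmed, k):
    """Return 1.0 if any of the top-k ranked phages is confirmed, else 0.0."""
    return 1.0 if any(p in confirmed for p in ranked_phages[:k]) else 0.0

def mean_hit_ratio_at_k(rankings, confirmed_by_strain, k):
    """Average the per-strain hits over all strains that have a ranking."""
    hits = [hit_at_k(rankings[s], confirmed_by_strain[s], k) for s in rankings]
    return sum(hits) / len(hits)

# Hypothetical group labels: samples 0 and 1 share a K-locus group.
groups = ["KL1", "KL1", "KL2", "KL3"]
splits = list(leave_one_group_out(groups))

# Hypothetical rankings and confirmed interactions for two strains.
rankings = {"s1": ["p3", "p1", "p2"], "s2": ["p2", "p3", "p1"]}
confirmed = {"s1": {"p1"}, "s2": {"p1"}}
curve = [mean_hit_ratio_at_k(rankings, confirmed, k) for k in (1, 2, 3)]
print(curve)  # [0.0, 0.5, 1.0]
```

Holding out whole groups keeps identical or near-identical K-loci from appearing on both sides of a split, which is the point of grouping before cross-validation.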
Fig. 3. Comparison of in silico and in vitro validation results of PhageHostLearn.
a. Mean hit ratio @ k comparing the in silico validation and the in vitro validation of the XGBoost model. b. ROC curve with AUC comparing the in silico validation and the in vitro validation of the XGBoost model.
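
For readers unfamiliar with the metric in panel b, ROC AUC can be read as the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction. The sketch below computes it directly from that pairwise definition, counting ties as half. The labels and scores are made up, and the function is a plain-Python illustration rather than the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = confirmed interaction) and prediction scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

Applying the same function to the cross-validation predictions and to the laboratory-validated predictions would give the two numbers that such a comparison puts side by side.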
Fig. 4. PhageHostLearn can guide effective laboratory validation of clinical bacterial isolates that are sequenced.
The system produces prediction scores that are used to construct a ranking of phage candidates, which is an actionable format from which laboratory validation can be focused on the top-k ranked phages.
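
Turning prediction scores into a shortlist can be sketched in a few lines. The phage names, the scores, and the tie-breaking rule (alphabetical by name) are hypothetical; only the idea of ranking by score and keeping the top k comes from the caption.

```python
def top_k_phages(scores, k=5):
    """Return the k phage names with the highest scores, best first.

    Ties are broken alphabetically so the output is deterministic
    (an arbitrary choice made for this illustration).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phage for phage, _ in ranked[:k]]

# Hypothetical prediction scores for one bacterial isolate.
scores_for_isolate = {"phageA": 0.12, "phageB": 0.91, "phageC": 0.47, "phageD": 0.91}
shortlist = top_k_phages(scores_for_isolate, k=3)
print(shortlist)  # ['phageB', 'phageD', 'phageC']
```

Only the phages in `shortlist` would then need to be tested in the laboratory, for example with spot tests.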
