Prediction of Klebsiella phage-host specificity at the strain level

Dimitri Boeckaerts¹,², Michiel Stock², Celia Ferriol-González³, Jesús Oteo-Iglesias⁴,⁵, Rafael Sanjuán³, Pilar Domingo-Calap³, Bernard De Baets², Yves Briers⁶

Affiliations

¹ Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium.
² KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
³ Institute for Integrative Systems Biology (I2SysBio), Universitat de Valencia-CSIC, Paterna, Spain.
⁴ Laboratorio de Referencia e Investigación en Resistencia a Antibióticos e Infecciones Relacionadas con la Asistencia Sanitaria, Centro Nacional de Microbiología, Instituto de Salud Carlos III, Madrid, Spain.
⁵ CIBER de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III, Madrid, Spain.
⁶ Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium. Yves.Briers@UGent.be.

PMID: 38778023
PMCID: PMC11111740
DOI: 10.1038/s41467-024-48675-6

Nat Commun. 2024 May 22;15(1):4355. doi: 10.1038/s41467-024-48675-6.

Abstract

Phages are increasingly considered promising alternatives to target drug-resistant bacterial pathogens. However, their often-narrow host range can make it challenging to find matching phages against bacteria of interest. Current computational tools do not accurately predict interactions at the strain level in a way that is relevant and properly evaluated for practical use. We present PhageHostLearn, a machine learning system that predicts strain-level interactions between receptor-binding proteins and bacterial receptors for Klebsiella phage-bacteria pairs. We evaluate this system both in silico and in the laboratory, in the clinically relevant setting of finding matching phages against bacterial strains. PhageHostLearn reaches a cross-validated ROC AUC of up to 81.8% in silico and maintains this performance in laboratory validation. Our approach provides a framework for developing and evaluating phage-host prediction methods that are useful in practice, which we believe to be a meaningful contribution to the machine-learning-guided development of phage therapeutics and diagnostics.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. PhageHostLearn overview and validation procedures.**
a. Overview of the PhageHostLearn machine learning system. PhageHostLearn processes phage and bacterial genomes into phage RBPs and bacterial K-locus proteins, respectively. Phage RBPs belonging to the same phage and bacterial K-locus proteins belonging to the same bacterium are combined into separate multi-instance representations using ESM-2. These multi-instance representations are concatenated into combined representations of the phage-host pairs. Finally, these representations are given as input into an XGBoost model to make predictions and output a ranking of top candidate phages to test against a given bacterium. b. In silico validation of the PhageHostLearn system using a leave-one-group-out cross-validation (LOGOCV) scheme that measures the ROC AUC and mean hit ratio @ k as evaluation metrics. c. In vitro validation of the PhageHostLearn system using 28 high-risk *K. pneumoniae* clinical isolates in Spain. The PhageHostLearn system predicts a top-five ranking for each of the clinical isolates. For each ranking, the top five phage candidates are validated in the laboratory using phage spot tests.
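
The representation step in panel a can be illustrated with a minimal sketch. The caption does not say how the per-protein embeddings are aggregated, so mean pooling is an assumption here, and the function names `pool` and `pair_representation` are hypothetical. The toy two-dimensional vectors stand in for ESM-2 embeddings, which are much longer in practice.

```python
from statistics import fmean

def pool(embeddings):
    """Average a non-empty list of equal-length vectors into one vector.

    Mean pooling is an assumption; the caption only says that proteins of
    the same phage (or bacterium) are combined into one representation.
    """
    return [fmean(column) for column in zip(*embeddings)]

def pair_representation(rbp_embeddings, klocus_embeddings):
    """Concatenate the pooled phage vector with the pooled bacterial vector."""
    return pool(rbp_embeddings) + pool(klocus_embeddings)

# Toy embeddings: two RBPs for one phage, three K-locus proteins for one host.
phage_rbps = [[1.0, 3.0], [3.0, 5.0]]
host_klocus = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]

# One feature vector per phage-host pair; this is what a classifier
# such as XGBoost would receive as input.
features = pair_representation(phage_rbps, host_klocus)
```

Concatenation keeps the phage and host halves of the vector separate, which leaves it to the downstream model to learn which combinations of features indicate an interaction.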

**Fig. 2. In silico validation results of PhageHostLearn.**
a. Mean hit ratio @ k of the trained XGBoost model in a LOGOCV at decreasing thresholds for K-locus identity (blue-green curves) and of an informed microbiologist approach (red). At the 100% threshold for grouping, identical K-locus sequences are grouped together, either in the training set or test set. b. ROC curve with AUC of the trained XGBoost model in a LOGOCV at decreasing thresholds for K-locus identity. c. Histogram of the mean top-10 hit ratio against the number of KL-types for which that hit ratio was achieved. There is a contrast between KL-types that are perfectly predicted (hit ratio is 100%) and not at all predicted (hit ratio is 0%). d–f. Histograms of the number of confirmed interactions per bacterial strain related to the KL-types with a mean top-10 hit ratio of respectively 0%, 50–80%, and 100%.
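
The evaluation scheme behind these panels can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: groups are taken to be K-locus clusters at some identity threshold, and a "hit" is assumed to mean that at least one confirmed interaction appears among the k top-scored phages for a bacterium. The function names are hypothetical.

```python
from collections import defaultdict

def leave_one_group_out(groups):
    """Yield (train_indices, test_indices), holding out one group at a time.

    `groups` holds one label per sample, e.g. a K-locus cluster id (assumed).
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out, test in by_group.items():
        train = [i for g, idx in by_group.items() if g != held_out for i in idx]
        yield train, test

def mean_hit_ratio_at_k(scores, labels, k):
    """Fraction of bacteria with at least one true interaction in the top k.

    `scores` and `labels` map each bacterium to per-phage lists of
    prediction scores and 0/1 interaction labels, in the same phage order.
    """
    hits = []
    for bacterium, s in scores.items():
        top = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        hits.append(any(labels[bacterium][i] == 1 for i in top))
    return sum(hits) / len(hits)

# Toy data: three samples in two groups, and two bacteria scored on three phages.
splits = list(leave_one_group_out(["KL1", "KL2", "KL1"]))
toy_scores = {"A": [0.9, 0.1, 0.5], "B": [0.2, 0.8, 0.3]}
toy_labels = {"A": [0, 0, 1], "B": [1, 0, 0]}
ratio_at_2 = mean_hit_ratio_at_k(toy_scores, toy_labels, k=2)
```

Holding out whole groups rather than random pairs prevents near-identical K-loci from appearing in both training and test data, which is why the caption reports results at decreasing identity thresholds.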

**Fig. 3. Comparison of in silico and in vitro validation results of PhageHostLearn.**
a. Mean hit ratio @ k comparing the in silico validation and the in vitro validation of the XGBoost model. b. ROC curve with AUC comparing the in silico validation and the in vitro validation of the XGBoost model.

**Fig. 4. PhageHostLearn can guide effective laboratory validation of clinical bacterial isolates that are sequenced.**
The system produces prediction scores that are used to construct a ranking of phage candidates, which is an actionable format from which laboratory validation can be focused on the top-k ranked phages.
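
Turning prediction scores into a shortlist for the laboratory can be as simple as the sketch below. The phage names and scores are invented for illustration, and `top_k_phages` is a hypothetical helper, not part of PhageHostLearn.

```python
def top_k_phages(scores, k=5):
    """Return the k phage names with the highest prediction scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [phage for phage, _ in ranked[:k]]

# Hypothetical scores for one sequenced clinical isolate.
isolate_scores = {"phage_1": 0.20, "phage_2": 0.90, "phage_3": 0.50}
shortlist = top_k_phages(isolate_scores, k=2)
```

Only the phages in `shortlist` would then be tested, for example with spot tests as in Fig. 1c.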
