Sci Adv. 2025 Apr 18;11(16):eadt6432.
doi: 10.1126/sciadv.adt6432. Epub 2025 Apr 18.

Engineering bacteriophages through deep mining of metagenomic motifs

Phil Huss et al. Sci Adv.

Abstract

Bacteriophages can adapt to new hosts by altering sequence motifs through recombination or convergent evolution. Where these motifs exist and what fitness advantage they confer remain largely unknown. We report a new method, Metagenomic Sequence Informed Functional Scoring (Meta-SIFT), to find sequence motifs in metagenomic datasets to engineer phage activity. Meta-SIFT uses experimental deep mutational scanning data to create sequence profiles to mine metagenomes for functional motifs invisible to other searches. We experimentally tested ~17,000 Meta-SIFT-derived sequence motifs in the receptor binding protein of the T7 phage. The screen revealed thousands of T7 variants with novel host specificity with motifs sourced from distant families. Position, substitution, and location preferences dictated specificity across a panel of 20 hosts and conditions. To demonstrate therapeutic utility, we engineered active T7 variants against the foodborne pathogen Escherichia coli O121. Meta-SIFT is a powerful tool to unlock the potential encoded in phage metagenomes to engineer bacteriophages.
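The abstract does not spell out how a sequence profile is applied to a metagenomic protein, so the following is only a minimal sketch of one plausible reading: a per-position score table (standing in for deep mutational scanning results) is used to score every k-mer window of a candidate protein, and windows at or above a threshold are kept as hits. The names `score_window` and `find_hits`, the additive scoring rule, and all numeric values are assumptions made for illustration, not the published Meta-SIFT procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical seed scores: seed_scores[position][amino_acid] -> score.
# Real values would come from deep mutational scanning; these are invented.
SeedScores = Dict[int, Dict[str, float]]


def score_window(kmer: str, start: int, seed_scores: SeedScores) -> float:
    """Sum the seed scores of a k-mer placed at `start` in the target protein.

    Residues with no entry in the table contribute 0 (an assumption).
    """
    return sum(
        seed_scores.get(start + offset, {}).get(aa, 0.0)
        for offset, aa in enumerate(kmer)
    )


def find_hits(
    protein: str,
    k: int,
    start: int,
    seed_scores: SeedScores,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Slide a window of length k over `protein` and keep k-mers whose
    summed score, if substituted at `start`, is at or above `threshold`."""
    hits = []
    for i in range(len(protein) - k + 1):
        kmer = protein[i:i + k]
        total = score_window(kmer, start, seed_scores)
        if total >= threshold:
            hits.append((kmer, total))
    return hits


# Toy example with invented scores for three positions.
toy_scores = {475: {"G": 1.0}, 476: {"G": 0.5}, 477: {"I": 2.0}}
toy_hits = find_hits("MAGGIK", k=3, start=475, seed_scores=toy_scores, threshold=3.0)
```

In this toy run only the window `GGI` clears the threshold. A real search would repeat the scan for every allowed start position and motif length (the figures mention 6mer and 10mer motifs).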

Figures

Fig. 1. Meta-SIFT identifies diverse motifs from metagenomic phage sequences.
(A) Illustration of Meta-SIFT. Results from DMS of the T7 phage RBP tip domain (1) are used to seed scores for probable motifs (2), which are then mined in metagenomic databases (3). Motifs passing a hit threshold are substituted into phages in a phage library using ORACLE (4). The left image shows crystal structure [Protein Data Bank (PDB): 4A0T] and secondary structure topology color coded with interior loops as red, β-sheet as yellow, and exterior loops as blue. (B) Heatmap showing seed score (red gradient) for substitutions in the tip domain. A higher seed score indicates a more important substitution for infectivity. Substitutions are shown top to bottom, while position (residue numbering based on PDB: 4A0T), wild-type amino acid (WT AA), and secondary structure topology are shown left to right. Topology coloring and relevant position are labeled as seen left in (A). (C) Number of mutations per sourced motif for 6mer (light green, top) or 10mer (dark green, bottom) motifs. (D) Number of and relative position on the RBP of each mutation for 6mer (left, light green) and 10mer (right, dark green) motifs derived from the metagenomic dataset from positions 472 to 548. Topology is shown at the bottom for reference. (E) Number of motifs found passing hit threshold using Meta-SIFT and either metagenomic phage structural proteins or nonstructural proteins (left). Number of motifs found using Meta-SIFT with metagenomic structural proteins after fully randomizing seed scores or randomizing seed scores at each position (right, means ± SD of triplicate randomizations). 6mer motifs are shown in light green, and 10mer motifs are shown in dark green. (F) Number of evaluated motifs sourced from different phage families derived from proteins annotated in NCBI.
Fig. 2. Motifs direct specificity and activity across bacterial hosts.
(A) Schematic representation of evaluated host panel. Lipopolysaccharide structure for evaluated BW25113 bacterial hosts shown with the expected effect of the gene deletion on LPS structure denoted by a dashed line. (B) Efficiency of plating (EOP) results shown for wild-type T7 phage on all evaluated hosts (top, BW25113 knockouts labeled as missing gene) and NaCl gradient for E. coli O121 (bottom), normalized to E. coli 10G with helper plasmid (biological triplicates, ±SD). The 0.75, 1, and 2% NaCl supplement reached the limit of detection for the assay with no apparent plaques. (C) Schematic representation of passage and scoring scheme. Library variants are scored by comparing variant abundance of an unbiased library pre- and post-selection on different hosts. (D) Violin plots displaying the range of log2 FN score for each evaluated host. BW25113 deletion mutants are labeled by their gene deletion, and O121 by the percentage NaCl condition. (E) Average number of mutations (±SD) in each variant for 6mer (top, light purple) and 10mer (bottom, dark purple) motifs as activity (maximum log2 FN on any host or condition) increases. (F) Number of and relative position of each mutation for 6mer (top, light purple) and 10mer (bottom, dark purple) active motifs with a log2 FN > −3 after selection. Secondary structure topology is labeled left to right and color coded with interior loops as red, β-sheet as yellow, and exterior loops as blue.
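The scoring scheme in (C) compares variant abundance before and after selection. A minimal sketch of such a score follows, assuming that enrichment is the ratio of post-selection to pre-selection read frequency and that the normalized score divides by the wild-type enrichment, so that wild type has log2 FN = 0. The function name `log2_fn`, the pseudocount, and the read counts are hypothetical, and the exact normalization used in the paper is not given in this caption.

```python
import math
from typing import Dict


def log2_fn(
    pre: Dict[str, int],
    post: Dict[str, int],
    wt: str = "WT",
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Return log2 of each variant's enrichment relative to wild type.

    pre / post map variant name -> read count before / after selection.
    The pseudocount (an assumption) avoids division by zero for variants
    that drop out after selection.
    """
    pre_total = sum(pre.values())
    post_total = sum(post.values())

    def enrichment(variant: str) -> float:
        pre_freq = (pre.get(variant, 0) + pseudocount) / pre_total
        post_freq = (post.get(variant, 0) + pseudocount) / post_total
        return post_freq / pre_freq

    wt_enrichment = enrichment(wt)
    return {v: math.log2(enrichment(v) / wt_enrichment) for v in pre}


# Invented counts: variant A is enriched 4-fold and B depleted 4-fold versus WT.
pre_counts = {"WT": 100, "A": 100, "B": 100}
post_counts = {"WT": 100, "A": 400, "B": 25}
scores = log2_fn(pre_counts, post_counts, pseudocount=0.0)
```

With these counts the scores come out near 0 for wild type, 2 for A, and −2 for B. A threshold such as log2 FN > −3, as used in (F), would then be a simple filter on the returned dictionary.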
Fig. 3. Motifs are distributed across the tip domain.
(A) FN score and position in the T7 RBP of detectable phage variants across different hosts. Each point displays a substitution for a motif, with the start of the motif shaded dark blue and the final substitution light red. (B and C) Substitution frequency plots for (B) positions 536 to 544 for variants on BW25113 with a substitution at D540 (left) or without a substitution at D540 (right) and (C) positions 516 to 521 for variants on BL21 with a substitution at D520 (left) or without a substitution at D520 (right). The wild-type (WT) sequence and topology are shown bottom, and amino acids are colored based on similarity of physicochemical properties as shown left in (B).
Fig. 4. Hierarchical clustering reveals patterns in motif selection.
(A) Total number of phage variants (inclusive of all 6mer and 10mer motifs) and the associated number of expected susceptible hosts (FN > −3) for each variant. (B) Spearman correlation of phage library activity between each pair of hosts, shown as a gradient from red (1, perfectly correlated) to white (0, no correlation) to blue (−1, perfectly negatively correlated). (C) Hierarchical clustering of normalized functional scores (log2 FN, blue to red gradient, wild-type FN = 0) for active phage variants on 20 E. coli hosts or conditions (listed left, identified by the strain name, LPS deletion, or for O121 the relevant salt concentration). FN clustered is the average of two biological replicates, spliced into 50 clusters. (D to G) Substitution frequency plots comparing substitution frequency for (D) clusters 15 and 2 at positions 475 to 490, (E) clusters 22 and 25 at positions 475 to 490, (F) clusters 5 and 32 at positions 516 to 521, and (G) clusters 44 and 50 at positions 538 to 544. The wild-type sequence and topology are shown bottom, and amino acids are colored based on similarity of physicochemical properties as shown left in (D).
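For readers unfamiliar with the statistic in (B), the sketch below computes a Spearman correlation between the scores of the same variants on two hosts, as the Pearson correlation of their ranks with ties given average ranks. It is a plain-Python illustration; the function names and the score vectors are hypothetical and are not taken from the study's data or code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


# Invented log2 FN scores for four variants on two hypothetical hosts.
host_a = [-2.0, -0.5, 0.3, 1.8]
host_b = [-3.1, -1.0, 0.1, 0.9]
rho = spearman(host_a, host_b)
```

Here the two hosts rank the four variants identically, so the correlation is 1. Computing this value for every pair of hosts would give a matrix of the kind shown in (B).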
Fig. 5. Meta-SIFT identifies active motifs on foodborne pathogen E. coli O121.
(A) FN score and position in the T7 RBP of detectable phage variants across different salt conditions (2 to 0%). Each point displays a substitution for a motif, with the start of the motif shaded dark blue and the final substitution shaded light red. (B) Substitution frequency plots at positions S475 (top left), S477 (bottom left), G479 (top right), and V482 (bottom right) in different salt conditions for variants in each condition where FN > −3. Amino acids are colored on the basis of similarity of physicochemical properties as shown at left. (C) Substitution frequency plots from positions 475 to 481 for 2% (top, variants with an FN > 3) and 0% (bottom, variants with an FN > 0). The wild-type sequence and topology are shown at the bottom, and amino acids are colored by physicochemical properties as shown in (B). (D) EOP (log10) on E. coli O121 for wild-type T7 (top) for reference, T7 with motif GGIARA from positions 475 to 480 (middle), and T7 variant with motif IARTGS from positions 477 to 482 (bottom). EOP is normalized to E. coli 10G with a wild-type helper plasmid. Values are the average of three replicates ±SD.
