Review
Viruses. 2024 Aug 10;16(8):1278.
doi: 10.3390/v16081278.

A Bioinformatic Ecosystem for Bacteriophage Genomics: PhaMMSeqs, Phamerator, pdm_utils, PhagesDB, DEPhT, and PhamClust

Christian H Gauthier et al. Viruses. 2024.

Abstract

The last thirty years have seen a meteoric rise in the number of sequenced bacteriophage genomes, spurred on both by the rise and success of groups working to isolate and characterize phages and by the rapid technological improvements and reduced costs associated with sequencing their genomes. Over these decades, the tools used to glean evolutionary insights from these sequences have grown more complex and sophisticated. We describe here the suite of computational and bioinformatic tools used extensively by integrated research-education communities such as SEA-PHAGES and PHIRE, which are jointly responsible for 25% of all complete phage genomes in the RefSeq database. These tools are used to integrate and analyze phage genome data from different sources, to identify and precisely extract prophages from bacterial genomes, to compute "phamilies" of related genes, and to display the complex nucleotide and amino acid level mosaicism of these genomes. While over 50,000 SEA-PHAGES students have been the primary beneficiaries of these tools, they are freely available to the phage community at large.

Keywords: Actinobacteria; Actinobacteriophage; Bacteriophage; genomics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
pdm_utils integrates data from remote and local resources to build and manage a PhageGenomicsDB [23]. Flow diagram depicting how pdm_utils pipelines are used to integrate, process, and output phage data in the SEA-PHAGES program. Phage data are retrieved from PhagesDB (manual annotations and genome metadata), PECAAN (draft auto-annotations of new genomes deposited at PhagesDB), and GenBank (final, published annotations) with the ‘get_data’ (D) pipeline. These data are evaluated and inserted into a MySQL relational database (PhageGenomicsDB) with the ‘import’ (I) and ‘update’ (U) pipelines. Conserved domains are identified with the ‘find_domains’ (FD) pipeline, which references a local copy of the NCBI Conserved Domain Database. Transmembrane domains are identified with the ‘find_transmembrane’ (FT) pipeline, which leverages DeepTMHMM. PhaMMseqs is used to place gene products into groups according to sequence similarity. Static copies of a PhageGenomicsDB can be created for publication and archiving with the ‘freeze’ (F) pipeline. Databases can be converted between schema versions with the ‘convert’ (C) pipeline, to ensure compatibility with downstream tools. Data can be exported in various formats using the ‘export’ (E) pipeline and uploaded to a remote data server with the ‘push’ (P) pipeline. These and other databases can be retrieved from a remote data server with the ‘get_db’ (DB) pipeline. Tools such as Phamerator, Starterator, PECAAN, and PhamClust ingest and utilize the contents of these databases to perform their functions. Finally, data from PhagesDB, GenBank, and a local instance of PhageGenomicsDB can be evaluated using the ‘compare’ (CM), ‘review’ (RW), and ‘revise’ (RS) pipelines, to identify and address discrepancies between data sources.
Figure 2
Genome maps of phages Carcharodon and MichelleMyBell. (A) Phages Carcharodon and MichelleMyBell both belong to Cluster N and share 60% of their genes (as determined using the ‘Shared Gene Content’ function at phagesdb.org (accessed on 1 July 2024), such that 60% of the genes in each phage are in the same phamily, as calculated using PhaMMseqs). Each genome is shown as a ruler with each kbp indicated and with markers spaced at 100 bp intervals. The predicted genes are shown as colored boxes, either above (rightwards transcribed) or below (leftwards transcribed) the genome. Gene names are shown within the boxes, and the phamily number of each gene is shown above it, with the number of phamily members in parentheses. Genes are colored according to their phamily, and white genes represent orphams (phams with only a single member). Pairwise DNA sequence similarity between the two genomes is displayed by spectrum-colored shading between the genome rulers, with violet being the most similar and red the least similar above a threshold BlastN E-value of 10^-4. Putative gene functions are indicated. Maps were constructed with Phamerator using the ‘Actino_Draft’ database version 565. (B) A zoomed-in view of the extreme left end of the Carcharodon genome, illustrating the genome map features as described for panel (A).
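The shared gene content figure quoted in this caption can be illustrated with a minimal sketch. It assumes each genome is represented as a list holding the phamily number of every gene, computes the fraction of genes in each genome whose phamily also occurs in the other genome, and averages the two fractions. The function names and the pham numbers are hypothetical, and this is not necessarily how the ‘Shared Gene Content’ function at phagesdb.org is implemented.

```python
def shared_gene_fraction(genes_a, genes_b):
    """Fraction of genes in genes_a whose phamily also occurs in genes_b."""
    if not genes_a:
        return 0.0
    phams_b = set(genes_b)
    shared = sum(1 for pham in genes_a if pham in phams_b)
    return shared / len(genes_a)


def shared_gene_content(genes_a, genes_b):
    """Average of the two directional shared-gene fractions (an assumption)."""
    forward = shared_gene_fraction(genes_a, genes_b)
    reverse = shared_gene_fraction(genes_b, genes_a)
    return (forward + reverse) / 2


# Hypothetical pham assignments, one entry per gene (not real data).
phage_x = [101, 102, 103, 104, 105]
phage_y = [101, 102, 103, 201, 202]

print(f"{shared_gene_content(phage_x, phage_y):.0%}")
```

Averaging the two directions keeps the measure symmetric when the genomes carry different numbers of genes.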
Figure 3
Proteomic equivalence quotient (PEQ) closely approximates genome-wide BlastN nucleotide identity (gNI). Scatterplot showing the gNI (x-axis) and PEQ (y-axis) for each pair of phage genomes retrieved from RefSeq. The black dashed line shows the y = x line, and the red dashed line shows the best-fit line for PEQ versus gNI, with the number of points, best-fit line equation, and Pearson R² shown in the top left corner.
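The best-fit line and R² annotated on such a scatterplot can be reproduced with an ordinary least-squares fit. The sketch below assumes paired gNI and PEQ values are already available as two equal-length lists; the function name and the sample values are illustrative only and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus squared Pearson r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Illustrative values only (hypothetical genome pairs).
gni_values = [0.10, 0.35, 0.60, 0.85, 0.95]
peq_values = [0.12, 0.33, 0.62, 0.83, 0.96]

slope, intercept, r_squared = fit_line(gni_values, peq_values)
print(f"n = {len(gni_values)}, y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.3f}")
```

A slope near 1 and an intercept near 0 would place the fitted line close to the y = x reference line.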
Figure 4
Analysis of in-group gNI at the species, genus, subfamily, and family taxonomic ranks. Histograms showing the range and distribution of intra-species (A), intra-genus (B), intra-subfamily (C), and intra-family (D) gNI values observed using ICTV groupings. For each panel, analysis was limited to genome pairs in the same grouping at that taxonomic rank.
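Restricting the analysis to pairs in the same grouping amounts to a simple filter over all genome pairs. The following sketch assumes a dictionary mapping each genome to its taxonomic assignments and a dictionary of pairwise gNI values keyed by alphabetically sorted name pairs; both data structures, the function name, and all values are hypothetical.

```python
from itertools import combinations


def in_group_values(taxonomy, gni, rank):
    """Collect gNI values for genome pairs that share a group at the given rank."""
    values = []
    for a, b in combinations(sorted(taxonomy), 2):
        group_a = taxonomy[a].get(rank)
        group_b = taxonomy[b].get(rank)
        # Skip genomes with no assignment at this rank.
        if group_a is None or group_a != group_b:
            continue
        if (a, b) in gni:
            values.append(gni[(a, b)])
    return values


# Hypothetical taxonomy and pairwise gNI values (not real data).
taxonomy = {
    "phage1": {"genus": "GenusA", "family": "FamilyX"},
    "phage2": {"genus": "GenusA", "family": "FamilyX"},
    "phage3": {"genus": "GenusB", "family": "FamilyX"},
}
gni = {
    ("phage1", "phage2"): 0.82,
    ("phage1", "phage3"): 0.35,
    ("phage2", "phage3"): 0.31,
}

for rank in ("genus", "family"):
    print(rank, in_group_values(taxonomy, gni, rank))
```

The resulting lists are what a histogram per rank would be drawn from.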
