BMC Genomics. 2016 Aug 26;17(1):679.
doi: 10.1186/s12864-016-3018-2.

Software-based analysis of bacteriophage genomes, physical ends, and packaging strategies

Bryan D Merrill et al. BMC Genomics. 2016.

Abstract

Background: Phage genome analysis is a rapidly growing field. Recurrent obstacles include software access and usability, as well as genome sequences that vary in sequence orientation and/or start position. Here we describe modifications to the phage comparative genomics software program, Phamerator, provide public access to the code, and include instructions for creating custom Phamerator databases. We further report genomic analysis techniques to determine phage packaging strategies and identification of the physical ends of phage genomes.

Results: The original Phamerator code can be successfully modified and custom databases can be generated using the instructions we provide. Results of genome map comparisons within a custom database reveal obstacles in performing the comparisons if a published genome has an incorrect complementarity or an incorrect location of the first base of the genome, which are common issues in GenBank-downloaded sequence files. To address these issues, we review phage packaging strategies and provide results that demonstrate identification of the genome start location and orientation using raw sequencing data and software programs such as PAUSE and Consed to establish the location of the physical ends of the genome. These results include determination of exact direct terminal repeats (DTRs) or cohesive ends, or whether phages may use a headful packaging strategy. Phylogenetic analysis using ClustalO and phamily circles in Phamerator demonstrate that the large terminase gene can be used to identify the phage packaging strategy and thereby aid in identifying the physical ends of the genome.

Conclusions: Using available online code, the Phamerator program can be customized and utilized to generate databases with individually selected genomes. These databases can then provide fruitful information in the comparative analysis of phages. Researchers can identify packaging strategies and physical ends of phage genomes using raw data from high-throughput sequencing in conjunction with phylogenetic analyses of large terminase proteins and the use of custom Phamerator databases. We promote publication of phage genomes in an orientation consistent with the physical structure of the phage chromosome and provide guidance for determining this structure.

Keywords: Comparative genomics; DNA packaging; Phage; Phamerator; Phylogenetic tree; Sequencing; Terminase.

Figures

Fig. 1
Total Caudovirales sequenced since 2000. This figure includes all complete genomes of Caudovirales sequenced and deposited in the “Nucleotide” NCBI database since 2000
Fig. 2
Phamerator genome map comparison. This linear genome map includes two similar phages published in a similar orientation. Colored lines connecting the genomes indicate the level of nucleotide similarity from purple (low E-value, high percent identity) to red (high E-value, lower percent identity). Horizontal yellow bars inside gene product boxes indicate conserved domains and represent the length of that domain relative to the length of the gene. When the mouse is hovered over one of the yellow conserved domains, a popup box will appear describing that domain (e.g., tail assembly protein, indicated by a dotted outline). When the mouse is hovered over a pham label, a popup box will appear (indicated by a dotted outline) which identifies the clusters and phages that contain a protein in a pham. Using these features, researchers can quickly identify conserved domains in any protein and which other phages in the database contain a homologous protein
Fig. 3
Phamily circle of pham 271, a Lambda family phage holin. This phamily circle displays the relationships of nine proteins that belong to pham 271. Conserved domains indicate these proteins are phage holins in the Lambda family. Cluster designations which reflect experimentally determined packaging strategies (see Additional file 1) are indicated inside the circle. Gene products connected by red lines are included in the pham because they have an E-value of less than 1e-50. Gene products connected by blue lines are included in the pham because they share more than 32.5 % identity
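The two thresholds in this caption can be read as a simple pairwise rule. The sketch below is illustrative only: the `PairwiseHit` record and both function names are hypothetical, and the single-linkage grouping (a protein joins a group if it meets either threshold with at least one member) is an assumption rather than a description of Phamerator's actual code. Only the two cutoff values come from the caption.

```python
from dataclasses import dataclass

# Cutoffs quoted in the caption; everything else here is a hypothetical sketch.
EVALUE_CUTOFF = 1e-50
IDENTITY_CUTOFF = 32.5  # percent identity


@dataclass(frozen=True)
class PairwiseHit:
    """Hypothetical record of one pairwise protein comparison."""
    query: str
    subject: str
    evalue: float
    percent_identity: float


def edge_colors(hit):
    """Return the line colors the caption associates with this comparison."""
    colors = []
    if hit.evalue < EVALUE_CUTOFF:
        colors.append("red")
    if hit.percent_identity > IDENTITY_CUTOFF:
        colors.append("blue")
    return colors


def group_into_phams(hits):
    """Assumed single-linkage grouping: any qualifying hit joins two proteins."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for hit in hits:
        a, b = find(hit.query), find(hit.subject)
        if edge_colors(hit) and a != b:
            parent[a] = b

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in groups.values())
```

With this rule, a protein can belong to a group without being directly connected to every other member, which is consistent with the caption's distinction between red and blue lines.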
Fig. 4
Excerpts of pham table exported from Phamerator. a The pham table is sorted by gene number in Bacillus phage Basilisk. Conserved domains and phamily members are identified for each gene. b Excerpts displaying only genes found in T3 and T7 (the T3/T7 conserved core genome). c A pham table filtered for conserved domains containing the word “terminase”. All phams containing gene products that are terminases are displayed
Fig. 5
Linear genome map of three circularly permuted phages from the E4 cluster, which package chromosomes via the headful strategy. a Only Sf6 is arranged correctly. The large terminase protein is outlined in orange. Relative to phage Sf6, APSE-1 and CUS-3 are arranged incorrectly and CUS-3 is also reverse-complemented. Lines connecting CUS-3 and Sf6 indicate nucleotide homology. b Using DNA Master, APSE-1 and CUS-3 were rearranged and reverse-complemented, and these new files were reanalyzed using Phamerator for comparison. Original gene numbers were preserved
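The rearrangement described in panel b amounts to two string operations on a circularly permuted sequence: an optional reverse complement and a rotation to a new first base. The authors used DNA Master for this; the sketch below is not that tool's method, and all function names are hypothetical. It assumes a plain nucleotide string and a 0-based start index, and it does not update gene coordinates.

```python
# Complement table for unambiguous bases; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_start(seq, new_start):
    """Re-origin a circularly permuted sequence.

    new_start is the 0-based index of the base that should become base one.
    """
    if not 0 <= new_start < len(seq):
        raise ValueError("new_start must lie within the sequence")
    return seq[new_start:] + seq[:new_start]


def reorient(seq, new_start, flip=False):
    """Optionally reverse-complement, then rotate.

    new_start is interpreted in the coordinates of the sequence after flipping.
    """
    if flip:
        seq = reverse_complement(seq)
    return rotate_to_start(seq, new_start)
```

Rotation of this kind is only meaningful for genomes whose published start position is arbitrary, such as the circularly permuted headful packagers in this figure; for genomes with defined physical ends, base one should come from the end analysis instead.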
Fig. 6
Neighbor-joining tree of large terminase proteins. This tree was generated by ClustalX [13], displayed in Mega6 [40], and contains large terminase sequences from phages with experimentally determined packaging mechanisms and physical ends (see Additional file 1). Bootstrap values are for 1000 trials. The scale bar shows 0.1 amino acid substitutions per site. We manually assigned clusters in Phamerator that correspond to packaging strategies. For example, phages that use 3’ cos ends (HK97) are assigned to cluster A1. This phylogenetic tree indicates that large terminase proteins sharing phamilies and packaging strategies also clade together
Fig. 7
Physical structure, circularization, and packaging mechanism of a phage with exact direct terminal repeats (DTR) at each end. a The DNA inside the phage virion before infection has the same sequence at both ends. These ends are identical in each virion. b After infection, the ends undergo homologous recombination to form a circular DNA molecule. c A linear concatemer is generated via rolling circle replication. The repeated ends are duplicated while the DNA is being packaged. Each virion has identical repeats at each end
Fig. 8
Analysis of exact DTRs in Bacillus phage Basilisk. a PAUSE analysis graphs the number of reads mapped to the Basilisk genome. The region between the sense and antisense starts and ends indicates the location of the short exact DTR in Bacillus phage Basilisk, which was used to call base one [6]. b Consed shows a sharp increase in coverage near the left end (sense start) of the exact DTR in Bacillus phage Basilisk. This location corresponds to the sense start which is marked by a tall read spike in Fig. 8a
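The read spikes mentioned in this caption can be illustrated by counting where reads begin on each strand and flagging positions that stand far above the typical count. This is a minimal sketch and not PAUSE's algorithm: the alignment tuple format, the function names, and the `fold` threshold relative to the median are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def five_prime_start_counts(alignments):
    """Count read 5' ends per genome position, separately for each strand.

    alignments: iterable of (start, end, strand) with 1-based inclusive
    coordinates and strand '+' or '-' (a hypothetical input format).
    """
    sense, antisense = Counter(), Counter()
    for start, end, strand in alignments:
        if strand == "+":
            sense[start] += 1      # 5' end of a forward read is its start
        elif strand == "-":
            antisense[end] += 1    # 5' end of a reverse read is its end
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return sense, antisense


def spikes(counts, fold=10.0):
    """Return positions whose count is at least `fold` times the median count."""
    if not counts:
        return []
    threshold = fold * median(counts.values())
    return sorted(pos for pos, n in counts.items() if n >= threshold)
```

A sharp sense-strand spike paired with an antisense-strand spike is the kind of signal the caption describes; interpreting the region between them as a DTR still requires inspecting the coverage, as done with Consed in panel b.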
Fig. 9
Physical structure, circularization, and packaging mechanism of a phage with cohesive ends. a Structure of DNA inside phage virion before infection. Phages with cohesive ends can have 3’ or 5’ overhangs. b Shortly after infection, the sticky ends are ligated. The chromosome is replicated via rolling circle replication during the lytic phase. c Exactly one genome length is packaged into each phage capsid. The terminase protein cuts at the cos site, leaving 5’ or 3’ overhangs
Fig. 10
Consed visualization of cos overhang sequence. Consed shows a sharp drop in coverage over the 3’ overhang in Mycobacterium phage Atkinbua
Fig. 11
Consed visualization of wrap-around reads. The assembled contig for Mycobacterium phage Girly (http://phagesdb.org) contains reads that wrap around the ends of the genome. The highlighted sequence to the left of the genome start a is the same as the last few base pairs at the end of the genome b
Fig. 12
Physical structure, circularization, and packaging mechanism of a phage that uses headful packaging. a This figure represents the first phage chromosome packaged from a linear concatemer. The DNA inside the phage virion before infection has a similar DNA sequence at both ends. The repeat sequences at the ends of each chromosome vary from phage to phage. The bracket indicates exactly one genome-length (from one pac sequence to the next). b After infection, the ends undergo homologous recombination to form a circular DNA molecule that contains exactly one genome-length and one pac site. A linear concatemer is generated via rolling circle replication. c Beginning at the pac site, the terminase inserts the DNA into the capsid. The terminase creates imprecise cuts after slightly more than one genome length is packaged into the capsid, generating a repeated sequence at each end. Thus, the position of the pac site varies in each subsequent virion
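The consequence of packaging slightly more than one genome length can be shown with a toy simulation. This sketch is a deliberate simplification: it assumes a fixed headful length (real cuts are imprecise, as the caption notes), treats the concatemer as an endless repeat of the genome, and uses a hypothetical function name and parameters.

```python
def package_headful(genome, headful_length, n_virions, pac_site=0):
    """Simulate sequential headful packaging from a concatemer.

    genome: one genome length as a string.
    headful_length: bases packaged per capsid; must exceed len(genome).
    pac_site: 0-based position where the first packaging event begins.
    Returns a list of (start_position, chromosome) tuples.
    """
    size = len(genome)
    if headful_length <= size:
        raise ValueError("headful_length must exceed one genome length")
    chromosomes = []
    start = pac_site
    for _ in range(n_virions):
        chromosome = "".join(
            genome[(start + i) % size] for i in range(headful_length)
        )
        chromosomes.append((start % size, chromosome))
        start += headful_length  # the next capsid begins where this one ended
    return chromosomes


# Letters stand in for genome positions to make the permutation visible.
for start, chromosome in package_headful("ABCDEFGHIJ", 12, 3):
    print(start, chromosome)
```

Each simulated chromosome carries the same bases at both ends, and the start position shifts with every packaging event, which is why genomes packaged this way appear circularly permuted in sequencing data.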
Fig. 13
Phamily circles indicate relationships of large terminase proteins. Clusters (A1-F2) were intentionally set to group phages with similar packaging strategies together. a Pham 323 contains only three large terminase proteins, indicated by bolded gp designations. The three phages that encode these terminases belong to cluster E4, which includes phages that use headful packaging (Sf6) [41, 42]. b Pham 2966 contains only three large terminases, indicated by bolded gp designations. The three phages that contain these terminases belong to cluster C3, which includes phages that have short exact DTRs (C-st). These proteins meet the cutoff parameters to be included in pham 2966, but do not meet the parameters required to draw connecting lines (see Fig. 13a). c An overlay of 15 pham circles represents large terminase proteins for every phage in the database. This circle indicates that large terminases grouped into the same pham belong to phages that use the same packaging strategy. In this database, no terminases were grouped with terminases belonging to phages that use a different packaging strategy. Gene products connected by red lines have an E-value of less than 1e-50. Gene products connected by blue lines share more than 32.5 % identity

References

    1. McAuliffe O, Ross RP, Fitzgerald GF. The new phage biology: from genomics to applications. In: McGrath S, Van Sinderen D, editors. Bacteriophage: Genetics and Molecular Biology. Norfolk, England: Caister Academic Press; 2007.
    2. Jordan TC, Burnett SH, Carson S, Caruso SM, Clase K, DeJong RJ, Dennehy JJ, Denver DR, Dunbar D, Elgin SC, Findley AM, Gissendanner CR, Golebiewska UP, Guild N, Hartzog GA, Grillo WH, Hollowell GP, Hughes LE, Johnson A, King RA, Lewis LO, Li W, Rosenzweig F, Rubin MR, Saha MS, Sandoz J, Shaffer CD, Taylor B, Temple L, Vazquez E, et al. A broadly implementable research course in phage discovery and genomics for first-year undergraduate students. mBio. 2014;5(1):e01051–01013. doi: 10.1128/mBio.01051-13.
    3. Cresawn SG, Bogel M, Day N, Jacobs-Sera D, Hendrix RW, Hatfull GF. Phamerator: a bioinformatic tool for comparative bacteriophage genomics. BMC Bioinformatics. 2011;12:395. doi: 10.1186/1471-2105-12-395.
    4. Jacobs-Sera D, Marinelli LJ, Bowman C, Broussard GW, Guerrero Bustamante C, Boyle MM, Petrova ZO, Dedrick RM, Pope WH, SEA-PHAGES Program, Modlin RL, Hendrix RW, Hatfull GF. On the nature of mycobacteriophage diversity and host preference. Virology. 2012;434(2):187–201. doi: 10.1016/j.virol.2012.09.026.
    5. Lorenz L, Lins B, Barrett J, Montgomery A, Trapani S, Schindler A, Christie GE, Cresawn SG, Temple L. Genomic characterization of six novel Bacillus pumilus bacteriophages. Virology. 2013;444(1–2):374–383. doi: 10.1016/j.virol.2013.07.004.
