BMC Genomics. 2017 May 4;18(1):350.
doi: 10.1186/s12864-017-3744-0.

Predicting genome terminus sequences of Bacillus cereus-group bacteriophage using next generation sequencing data

Cheng-Han Chung et al. BMC Genomics.

Abstract

Background: Most tailed bacteriophages (phages) feature linear dsDNA genomes. Characterizing novel phages requires an understanding of complete genome sequences, including the definition of genome physical ends.

Results: We sequenced 48 Bacillus cereus phage isolates and analyzed next-generation sequencing (NGS) data to resolve the genome configuration of these novel phages. Most assembled contigs featured reads that mapped to both contig ends and formed circularized contigs. Independent assemblies of 31 nearly identical I48-like Bacillus phage isolates showed that the assembly programs tended to break circularized contigs at random positions. Currently available assemblers, however, were not capable of reporting the underlying phage genome configuration from sequence data. To identify the genome configuration of a sequenced phage in silico, we developed a terminus prediction method based on 'neighboring coverage ratios' and 'read edge frequencies' derived from read alignment files. Termini were confirmed by primer walking and supported by phylogenetic inference of large DNA terminase protein sequences.

Conclusions: The Terminus package, using phage NGS data together with contig circularity, could efficiently identify the approximate positions of phage genome termini. Complete phage genome sequences allow the potential packaging mechanisms to be proposed and the genome to be annotated more precisely.

Keywords: Bacteriophage; Direct terminal repeat; Genome packaging mechanisms; Neighboring coverage ratio; Phage genome configuration; Read edge frequency; Terminus prediction.

Figures

Fig. 1
Illustration of two major characteristics of phage genome sequencing data used for terminus prediction: neighboring coverage ratio (NCR) and read edge frequency. Phage I12 is used as an example of how NCRs are selected as potential boundaries of terminal repeats. Each dot represents the log-transformed NCR at a given nucleotide position, computed with a 100-nucleotide window. The two horizontal dashed lines mark the NCR thresholds of 1.8 and the reciprocal of 1.8. NCRs greater than 1.8 or less than the reciprocal of 1.8 are collected into a subset of hits (green dots). Within this subset, hits for which at least one of the windows used to compute the NCR has a coverage more than 1.8 times the genome coverage are considered significant hits (blue dots). Finally, the local highest and local lowest significant hits are considered potential boundaries of terminal repeats (red dots). a The whole-contig NCR of the I12 isolate. b The NCR between nucleotide positions 68,500 and 72,000. c Every mapped read has one coordinate at its 5′ end (5′ read edge position) and one at its 3′ end (3′ read edge position). The counts at every read edge position were used as one of the indicators for terminus prediction.
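The selection steps in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the Terminus package itself. It assumes that the NCR at a position is the mean coverage of the 100-nucleotide window to its right divided by the mean coverage of the window to its left, and it simplifies "local highest and local lowest" to the single highest and lowest significant hits on the contig. All function names are hypothetical.

```python
import math

def neighboring_coverage_ratios(coverage, window=100):
    """Map each position to (log2 NCR, left window mean, right window mean).

    Assumption: NCR = mean coverage of the window starting at the position
    divided by the mean coverage of the window ending just before it.
    """
    prefix = [0]
    for depth in coverage:
        prefix.append(prefix[-1] + depth)
    ratios = {}
    for pos in range(window, len(coverage) - window + 1):
        left = (prefix[pos] - prefix[pos - window]) / window
        right = (prefix[pos + window] - prefix[pos]) / window
        if left > 0 and right > 0:  # skip windows without coverage
            ratios[pos] = (math.log2(right / left), left, right)
    return ratios

def significant_hits(coverage, window=100, ratio=1.8):
    """Keep positions whose NCR passes the 1.8 cut-off (or its reciprocal)
    and where at least one window exceeds 1.8 times the mean coverage."""
    cutoff = math.log2(ratio)  # about 0.848
    genome_mean = sum(coverage) / len(coverage)
    hits = {}
    for pos, (log_ncr, left, right) in neighboring_coverage_ratios(coverage, window).items():
        if abs(log_ncr) > cutoff and max(left, right) > ratio * genome_mean:
            hits[pos] = log_ncr
    return hits

def boundary_candidates(hits):
    """Simplification: return the positions of the highest and lowest hit."""
    if not hits:
        return None
    return max(hits, key=hits.get), min(hits, key=hits.get)

# Synthetic per-base coverage: a 200-nt region sequenced at 3x the flanking depth.
coverage = [10] * 500 + [30] * 200 + [10] * 500
print(boundary_candidates(significant_hits(coverage)))
```

On this synthetic contig the two candidates fall at the edges of the high-coverage region (0-based positions 500 and 700), which is the pattern a terminal repeat would be expected to produce on a circularized contig.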
Fig. 2
Map of the coverage distribution, neighboring coverage ratio (NCR) and read edge frequencies of phage isolate I13. a Illustration of the hypothetical genome configuration of I13 with terminal repeats. Filled squares indicate the direct terminal repeat of the phage genome. b Coverage distribution over the I13 sequence contig. The lower dashed line represents the average coverage of the I13 sequencing reads. The upper dashed line represents 1.8 times the average coverage. c Neighboring coverage ratio (NCR) over the I13 sequence contig with window size = 100 bp. The dashed lines indicate the NCR cut-offs of 1.8 and the reciprocal of 1.8 after base-2 logarithmic transformation [−0.848, 0.848]. d 5′ and 3′ read edge frequencies from the I13 sequencing reads. Filled black squares indicate the frequencies of 5′ read edge positions. Open triangles indicate the frequencies of 3′ read edge positions.
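Read edge frequencies such as those in panel d can be tallied directly from alignment coordinates. The sketch below is illustrative only: it assumes each alignment is given as a hypothetical `(leftmost, rightmost, is_reverse)` tuple, and that for a reverse-strand read the 5′ end is the rightmost aligned coordinate. The source does not state how strand is handled, so that convention is an assumption.

```python
from collections import Counter

def read_edge_frequencies(alignments):
    """Count how often each contig position is a 5' or 3' read edge.

    alignments: iterable of (leftmost, rightmost, is_reverse) tuples.
    Assumption: a reverse-strand read has its 5' end at `rightmost`.
    """
    five_prime, three_prime = Counter(), Counter()
    for leftmost, rightmost, is_reverse in alignments:
        if is_reverse:
            five_prime[rightmost] += 1
            three_prime[leftmost] += 1
        else:
            five_prime[leftmost] += 1
            three_prime[rightmost] += 1
    return five_prime, three_prime

# Synthetic alignments: many reads share a 5' edge at position 100.
alignments = (
    [(100, 250, False)] * 5
    + [(100, 300, False)] * 3
    + [(40, 100, True)] * 2
    + [(10, 90, False)]
)
five_prime, three_prime = read_edge_frequencies(alignments)
print(five_prime.most_common(1))
```

A position where many reads start or end, far more often than the background, is the kind of signal the caption describes as an indicator of a physical genome end.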
Fig. 3
Maximum likelihood phylogeny of large terminase amino acid sequences. The protein sequence alignment was generated with ClustalW2 [65]. The phylogeny was reconstructed using the maximum likelihood method based on the Poisson correction model. Numbers next to internal nodes indicate the bootstrap value divided by the number of trials (1000). Phage names are shown at the tips of the phylogeny. The root of the phylogeny was chosen arbitrarily for visualization purposes. Arrows: three novel Bacillus phages, SBP8a, I48 and Q8. *, +, &: nine phages with suggested types of genome terminus.
