Review
Front Microbiol. 2022 Dec 16;13:1032186.
doi: 10.3389/fmicb.2022.1032186. eCollection 2022.

Phage family classification under Caudoviricetes: A review of current tools using the latest ICTV classification framework

Yilin Zhu et al. Front Microbiol.

Abstract

Bacteriophages, viruses that infect bacteria, are the most ubiquitous and diverse entities in the biosphere. Accumulating evidence reveals their important roles in shaping the structure of various microbiomes. Thanks to (viral) metagenomic sequencing, a large number of new bacteriophages have been discovered. However, because a standard, automatic virus classification pipeline is lacking, the taxonomic characterization of new viruses lags seriously behind the sequencing efforts. In particular, in the latest version of the ICTV classification, several large phage families of the previous classification system have been removed. Therefore, a comprehensive review and comparison of taxonomic classification tools under the new standard is needed to establish the state of the art. In this work, we retrained and tested four recently published tools on newly labeled databases. We demonstrated their utility and tested them on multiple datasets, including RefSeq, short contigs, simulated metagenomic datasets, and low-similarity datasets. This study provides a comprehensive review of phage family classification in different scenarios and practical guidance for choosing appropriate taxonomic classification pipelines. To the best of our knowledge, this is the first review conducted under the new ICTV classification framework. The results show that the new family classification framework overall leads to better-conserved groups and thus makes family-level classification more feasible.

Keywords: Caudoviricetes; bacteriophage; review of tools; taxonomic classification tools; viral metagenomic data.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The classification results for Guelinviridae sequences from tools retrained after removing all Guelinviridae sequences. “independent clustered”: The sequences are in a VC cluster without any reference genome.
Figure 2
The classification results for Rountreeviridae sequences from tools retrained after removing all Rountreeviridae sequences. “independent clustered”: The sequences are in a VC cluster without any reference genome.
Figure 3
The classification results for 2,445 unclassified sequences. “independent clustered”: The sequences are in a VC cluster without any reference genome.
Figure 4
The performance of each tool on contigs from RefSeq. (A) The prediction rate of the four tools at different contig lengths. (B) The accuracy of the four tools on phage contigs with predictions. (C) The accuracy of the four tools on all input phage contigs. X-axis: contig lengths. Y-axis: metric values.
Figure 5
(A) The prediction rate of the four tools with reduced reference datasets. (B) The corresponding accuracy on sequences with predictions. X-axis: tools and training data partitions. Y-axis: metric values.
Figure 6
(A) The performance of the four tools on the simulated metagenomic dataset. The bars show the accuracy on all inputs. The top part with patterns in vConTACT 2.0 shows the percentage of contigs that are not clustered with any reference genome. (B) The performance of each tool on the two low-similarity datasets. Each bar shows the tools' accuracy on all input contigs.
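
The figure captions report three related quantities: the prediction rate, the accuracy on contigs with predictions, and the accuracy on all input contigs. The sketch below shows one plausible way to compute them, assuming a tool outputs either a family label or no prediction for each contig. The function name, the use of None for "no prediction", and the toy labels are illustrative assumptions, not taken from the paper or from any of the reviewed tools.

```python
from typing import Dict, Optional, Sequence


def classification_metrics(
    truth: Sequence[str],
    predicted: Sequence[Optional[str]],
) -> Dict[str, float]:
    """Compute the three caption metrics for one tool (hypothetical helper).

    truth:     true family label of each contig
    predicted: predicted family label, or None when the tool gives no prediction
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one contig is required")

    total = len(truth)
    n_predicted = sum(p is not None for p in predicted)
    n_correct = sum(p is not None and p == t for t, p in zip(truth, predicted))

    return {
        # fraction of input contigs that received any family label
        "prediction_rate": n_predicted / total,
        # accuracy restricted to contigs that received a label
        "accuracy_on_predicted": n_correct / n_predicted if n_predicted else 0.0,
        # accuracy over all inputs; unlabeled contigs count as wrong
        "accuracy_on_all": n_correct / total,
    }


# Toy example: four contigs, one left without a prediction.
truth = ["Guelinviridae", "Guelinviridae", "Rountreeviridae", "Rountreeviridae"]
predicted = ["Guelinviridae", None, "Rountreeviridae", "Guelinviridae"]
metrics = classification_metrics(truth, predicted)
print(metrics)
```

Under these definitions, the accuracy on all inputs equals the prediction rate multiplied by the accuracy on predicted contigs, so a tool that labels few contigs very accurately can still score low on all inputs.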

