Nat Commun. 2023 Jan 31;14(1):502.
doi: 10.1038/s41467-023-35945-y.

ViralCC retrieves complete viral genomes and virus-host pairs from metagenomic Hi-C data

Yuxuan Du et al. Nat Commun. 2023.

Abstract

The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables the reconstruction of high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow feces, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the ViralCC pipeline.
The general workflow of ViralCC to retrieve high-quality viral genomes and determine virus-host pairs. Shotgun reads are first assembled into contigs, to which Hi-C paired-end reads are aligned. Viral contigs are subsequently identified. Leveraging Hi-C linkages and the virus-host proximity structure to link viral contigs, ViralCC constructs the Hi-C interaction graph and the host proximity graph. After integrating the two graphs, ViralCC employs Leiden clustering to reconstruct draft viral genomes, and additionally detects virus-host pairs based on the recovered viral genomes and Hi-C linkages.
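The graph-building step described in this caption can be illustrated with a minimal sketch. It is not ViralCC's implementation: the function names, the rule that two viral contigs are linked in the host proximity graph when they share at least one host contig, and the integration by summing edge weights are all assumptions made for illustration. The clustering step is omitted; in practice the integrated graph would be passed to a Leiden implementation.

```python
from collections import defaultdict
from itertools import combinations

def hic_graph(pairs, viral):
    """Count Hi-C read pairs that link two distinct viral contigs."""
    graph = defaultdict(int)
    for a, b in pairs:
        if a != b and a in viral and b in viral:
            graph[frozenset((a, b))] += 1
    return dict(graph)

def host_proximity_graph(pairs, viral):
    """Link viral contigs that share host contigs (assumed rule).

    The edge weight is the number of shared host contigs.
    """
    hosts = defaultdict(set)
    for a, b in pairs:
        if a in viral and b not in viral:
            hosts[a].add(b)
        elif b in viral and a not in viral:
            hosts[b].add(a)
    graph = {}
    for u, v in combinations(sorted(hosts), 2):
        shared = len(hosts[u] & hosts[v])
        if shared:
            graph[frozenset((u, v))] = shared
    return graph

def integrate(g1, g2):
    """Merge two graphs by summing edge weights (assumed scheme)."""
    merged = dict(g1)
    for edge, weight in g2.items():
        merged[edge] = merged.get(edge, 0) + weight
    return merged

# Hypothetical toy data: v* are viral contigs, h* are host contigs.
viral = {"v1", "v2", "v3"}
pairs = [("v1", "v2"), ("v1", "v2"), ("v1", "h1"),
         ("h1", "v3"), ("v2", "h2"), ("v3", "v3")]

hic = hic_graph(pairs, viral)
prox = host_proximity_graph(pairs, viral)
integrated = integrate(hic, prox)
```

In this toy example, v1 and v3 have no direct Hi-C contact but both contact host contig h1, so the host proximity graph supplies an edge that the Hi-C interaction graph alone would miss.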
Fig. 2. ViralCC outperforms other binning methods on the mock human gut dataset.
Comparison of viral genome retrieval performance according to (a) clustering metrics and (b) completeness and contamination criteria (Moderately complete: 50% ≤ completeness < 70%, contamination ≤ 10%; Substantially complete: 70% ≤ completeness < 90%, contamination ≤ 10%; Near-complete: completeness ≥ 90%, contamination ≤ 10%). Source data are provided as a Source Data file.
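The completeness and contamination criteria in this caption translate directly into a small classifier. The sketch below is illustrative only; the function name `quality_tier` is hypothetical, and bins that fall outside all three tiers are returned as `None`.

```python
def quality_tier(completeness, contamination):
    """Assign a viral bin to a quality tier.

    Both arguments are percentages (0-100). Thresholds follow the
    figure caption; bins outside all tiers return None.
    """
    if contamination > 10:
        return None
    if completeness >= 90:
        return "near-complete"
    if completeness >= 70:
        return "substantially complete"
    if completeness >= 50:
        return "moderately complete"
    return None

# Hypothetical bins as (completeness %, contamination %).
tiers = [quality_tier(c, x) for c, x in [(95, 2), (70, 10), (55, 0), (92, 15), (40, 1)]]
```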
Fig. 3. ViralCC outperforms other binners on real metagenomic Hi-C datasets.
Comparison of draft viral bins retrieved by different binning tools according to the CheckV completeness standard on the (a) human gut, (b) cow fecal, and (c) wastewater datasets. ViralCC can retrieve more complete viral genomes compared to VAMB, CoCoNet, vRhyme, bin3C, and MetaTOR from all three real metagenomic Hi-C samples. Source data are provided as a Source Data file.
Fig. 4. Heatmaps of raw Hi-C contact matrices of the top ten vMAGs from real metagenomic Hi-C datasets.
Heatmaps of raw Hi-C contact matrices of the top ten vMAGs from the (a) human gut, (b) cow fecal, and (c) wastewater datasets with the contig index as the axis unit. The vMAGs were first ranked by their numbers of contigs and then the contigs within each vMAG were ranked by their sizes. The scale bar shows the number of raw Hi-C contacts between viral contigs.
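The two-level ranking used to order the heatmap axes can be sketched as follows. The caption does not state the sort direction, so descending order at both levels is an assumption, and the names `heatmap_order` and `reorder` are hypothetical.

```python
def heatmap_order(bins, sizes, top=10):
    """Return the contig order for the heatmap axis.

    bins maps a vMAG name to its list of contigs; sizes maps a contig
    to its length. vMAGs are ranked by contig count and contigs within
    each vMAG by size, both descending (assumed direction).
    """
    ranked = sorted(bins, key=lambda name: len(bins[name]), reverse=True)[:top]
    order = []
    for name in ranked:
        order.extend(sorted(bins[name], key=lambda c: sizes[c], reverse=True))
    return order

def reorder(matrix, index, order):
    """Permute a square contact matrix so rows and columns follow order."""
    idx = [index[c] for c in order]
    return [[matrix[i][j] for j in idx] for i in idx]

# Hypothetical toy data.
bins = {"vMAG_a": ["c1", "c2"], "vMAG_b": ["c3", "c4", "c5"]}
sizes = {"c1": 5000, "c2": 9000, "c3": 3000, "c4": 12000, "c5": 7000}
order = heatmap_order(bins, sizes)

index = {"c1": 0, "c2": 1}
contacts = [[0, 4], [4, 0]]
sub = reorder(contacts, index, ["c2", "c1"])
```

With this ordering, contigs from the same vMAG occupy contiguous rows and columns, so within-bin contacts appear as blocks along the diagonal.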
Fig. 5. Taxonomy statistics of annotated vMAGs from real metagenomic Hi-C datasets.
Taxonomy statistics of annotated vMAGs on the (a) human gut, (b) cow fecal, and (c) wastewater datasets. The numbers on the graph indicate the number of vMAGs belonging to different families. Source data are provided as a Source Data file.
Fig. 6. Taxonomic annotations of MAGs and the apparent infection spectrum of vMAGs from the domestic wastewater sample.
(a) Taxonomic annotations of MAGs recovered by HiCBin from the domestic wastewater sample. Burkholderiales, Pseudomonadales, Lachnospirales, Bacteroidales, and Oscillospirales were the predominant orders. (b) The apparent infection spectrum of vMAGs from the wastewater sample. vMAGs belonging to the family Myoviridae mainly targeted hosts from the order Burkholderiales and a large number of vMAGs from the family Siphoviridae could infect Bacteroidales bacteria. Source data are provided as a Source Data file.
