PLoS Biol. 2025 Nov 20;23(11):e3003510.
doi: 10.1371/journal.pbio.3003510. eCollection 2025 Nov.

Benchmarking with synthetic communities provides a baseline for virus-host inferences from Hi-C proximity linking

Rokaiya Nurani Shatadru et al. PLoS Biol. 2025.

Abstract

Microbiomes influence diverse ecosystems, and viruses increasingly appear to impose key constraints on them. While viromics has expanded genomic catalogs, identifying hosts for these viruses remains challenging because cultivation-based approaches are difficult to scale and in silico predictions have uncertain reliability and relatively low resolution, particularly for understudied viral taxa. To address this, Hi-C proximity ligation sequences cross-linked virus and host genomic fragments to infer virus-host linkages, and it has now been applied in at least 10 studies. However, its accuracy remains unknown. Here we assess Hi-C performance in recovering virus-host interactions using synthetic communities (SynComs) composed of four marine bacterial strains and nine phages with known interactions, and then apply optimized bioinformatic protocols to natural soil samples. In SynComs, standard Hi-C sample preparations and analyses based on normalized contact scores performed poorly (26% specificity, 100% sensitivity, with incorrect matches extending up to the class level). Z-score filtering (Z ≥ 0.5) dramatically improved specificity to 99%, though at reduced sensitivity (62%, down from 100%). Detection limits were also established: reproducibility was poor below phage abundances of 10⁵ PFU/mL. Applying the optimized bioinformatic protocols to natural soil samples, we compared virus-host linkages inferred from proximity-ligated Hi-C sequencing with predictions generated by in silico homology-based and machine learning-based bioinformatic approaches. Prior to Z-score thresholding, agreement was relatively high at the phylum to family levels (72%), but not at the genus (43%) or species (15%) levels. Z-score thresholding reduced sensitivity (only 34% of predictions were retained) and only modestly improved congruence with bioinformatic methods (48% and 18% at the genus and species levels, respectively).
Even so, this yielded 79 genus-level-congruent virus-host linkages and 293 new ones revealed by Hi-C alone, providing many new virus-host interactions to explore in already well-studied, climate-critical soils. Overall, these findings provide empirical benchmarks and methodological guidelines to improve the accuracy and reliability of Hi-C for virus-host linkage studies in complex microbial communities.
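The agreement percentages above are reported per taxonomic rank. As a minimal sketch of how such rank-wise congruence could be computed, assuming each method assigns every virus one full host lineage (phylum through species) and that "agreement" means identical names at that rank, the hypothetical function `agreement_by_rank` below returns the fraction of viruses predicted by both methods whose hosts match at each rank. The lineages are placeholders, not data from the study, and incomplete lineages are not handled.

```python
RANKS = ("phylum", "class", "order", "family", "genus", "species")


def agreement_by_rank(a, b):
    """Fraction of shared viruses whose predicted hosts match at each rank.

    a, b: hypothetical mappings of virus -> host lineage tuple ordered as RANKS.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return {rank: float("nan") for rank in RANKS}
    return {
        rank: sum(a[v][i] == b[v][i] for v in shared) / len(shared)
        for i, rank in enumerate(RANKS)
    }


# Placeholder lineages (P = phylum, C = class, ... S = species); not study data.
hic = {
    "v1": ("P1", "C1", "O1", "F1", "G1", "S1"),
    "v2": ("P1", "C1", "O1", "F1", "G2", "S2"),
}
in_silico = {
    "v1": ("P1", "C1", "O1", "F1", "G1", "S9"),
    "v2": ("P1", "C1", "O1", "F2", "G3", "S3"),
}
result = agreement_by_rank(hic, in_silico)
```

In this toy case agreement is 1.0 from phylum to order, drops to 0.5 at family and genus, and is 0.0 at species, mirroring the pattern of congruence falling off at finer ranks.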

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Synthetic communities and experimental schema used to assess Hi-C virus-host linkages.
A. Synthetic communities were built from four bacterial strains (CBA, Cellulophaga baltica; PSA, Pseudoalteromonas) and nine phages (listed fully in S1 Table), whose infections were experimentally evaluated in pairwise combinations via traditional plaque assays. Black boxes denote that the virus successfully plaques on the bacterial strain, whereas white (missing) boxes denote a negative, non-plaquing interaction. B. Schematic representation of the Hi-C experiment used to test virus-host relationships. After the synthetic community was assembled from these organisms, Hi-C libraries were prepared and sequenced. Bioinformatic analyses were then performed to determine whether the expected virus-host linkages known from pairwise isolate-based experiments (denoted with black boxes) were observed.
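Checking Hi-C linkages against the plaque-assay matrix amounts to labeling every phage-host pair. The sketch below assumes that plaquing pairs are the positives and all non-plaquing pairs are negatives, and it uses the standard definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP); the study's exact formulas are not given in this excerpt. The function names and the phage and host identifiers are hypothetical.

```python
from collections import Counter
from itertools import product


def classify_pairs(predicted, known, phages, hosts):
    """Label every phage-host pair as TP, FP, FN, or TN.

    predicted: set of (phage, host) pairs linked by Hi-C
    known: set of (phage, host) pairs that plaqued in pairwise assays
    """
    labels = {}
    for pair in product(phages, hosts):
        if pair in known:
            labels[pair] = "TP" if pair in predicted else "FN"
        else:
            labels[pair] = "FP" if pair in predicted else "TN"
    return labels


def sensitivity_specificity(labels):
    """Standard definitions: TP / (TP + FN) and TN / (TN + FP)."""
    counts = Counter(labels.values())  # missing labels count as 0
    positives = counts["TP"] + counts["FN"]
    negatives = counts["TN"] + counts["FP"]
    sensitivity = counts["TP"] / positives if positives else float("nan")
    specificity = counts["TN"] / negatives if negatives else float("nan")
    return sensitivity, specificity


# Toy example with hypothetical phage and host names.
phages = ["phage_a", "phage_b"]
hosts = ["host_1", "host_2"]
known = {("phage_a", "host_1"), ("phage_b", "host_2")}
predicted = {("phage_a", "host_1"), ("phage_a", "host_2"), ("phage_b", "host_2")}
labels = classify_pairs(predicted, known, phages, hosts)
print(sensitivity_specificity(labels))  # (1.0, 0.5)
```

Here the one spurious link (phage_a on host_2) leaves sensitivity at 1.0 but halves specificity, the same trade-off the abstract describes for unfiltered contact scores.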
Fig 2. Hi-C linkages from SynCom-1.
A. Contact scores (left) and corresponding Z-scores (right) calculated for each replicate of SynCom-1, categorized by host strain. The contact score represents the number of Hi-C linkages between a virus and a host genome, normalized for the number of restriction sites, genome length, and coverage. Z-scores were calculated from the contact scores within each sample to enable comparison across samples. Black dots indicate correct virus-host linkages, and gray dots indicate incorrect virus-host linkages. The red vertical dotted line is drawn at Z-score = 0.5. B. Virus-host linkages determined from a non-zero contact score (left) or using a filtering approach (i.e., requiring a Z-score, computed from the normalized Hi-C contact scores, above 0.5; right). Black boxes denote true positives, gray boxes denote false positives, and striped boxes denote false negatives. The data underlying this figure can be found in S6 Table.
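The caption lists the normalization factors but not the formula. As a hedged sketch only, the hypothetical `contact_score` below divides the raw link count by the product of both genomes' restriction-site counts, lengths (kb), and mean coverages; this specific form is an assumption, and the study's normalization may differ. Z-scores are then computed across all virus-host pairs in one sample (using the population standard deviation, also an assumption), and pairs are kept if they have a non-zero score and Z ≥ 0.5.

```python
from statistics import mean, pstdev


def contact_score(links, sites_v, sites_h, len_v_kb, len_h_kb, cov_v, cov_h):
    """Hypothetical normalization of a raw virus-host Hi-C link count.

    Divides by the product of both genomes' restriction-site counts,
    lengths (kb), and mean coverages; the study's exact formula may differ.
    """
    return links / (sites_v * sites_h * len_v_kb * len_h_kb * cov_v * cov_h)


def z_scores(scores):
    """Z-score every pair's contact score against all pairs in one sample."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - mu) / sigma for pair, s in scores.items()}


def filter_links(scores, z_min=0.5):
    """Keep pairs with a non-zero contact score and a Z-score >= z_min."""
    z = z_scores(scores)
    return {pair for pair, s in scores.items() if s > 0 and z[pair] >= z_min}


# Hypothetical normalized scores for one sample.
scores = {
    ("phage_a", "host_1"): 10.0,
    ("phage_a", "host_2"): 1.0,
    ("phage_b", "host_1"): 1.0,
    ("phage_b", "host_2"): 0.0,
}
print(filter_links(scores))  # {('phage_a', 'host_1')}
```

A non-zero-score rule would accept three links here, while the Z-score filter keeps only the pair that stands out within the sample; because Z-scores are relative to each sample, the same threshold can be applied across samples with different sequencing depths.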
Fig 3. Cryopreservation experiment to assess impact on SynCom-1 Hi-C linkages.
A. All Z-scores (left) and virus-host linkages (right) for each replicate of SynCom-1 cryopreserved with DMSO, categorized by host strain. Black and gray dots indicate correct and incorrect virus-host linkages, respectively, while black, gray, and striped boxes indicate true positives, false positives, and false negatives, respectively. The red vertical dotted line is drawn at Z-score = 0.5. B. Same data type as A, but for betaine-preserved samples. C. Average sensitivity (gray bar) and specificity (black bar) calculated for SynCom-1 with and without cryoprotective agents. The data underlying panels A and B can be found in S6 Table, and the data underlying panel C can be found in S7 Table.
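Panel C summarizes replicates by treatment. A minimal sketch, assuming per-replicate sensitivity and specificity have already been computed and that the bars show simple arithmetic means, is shown below; the hypothetical `average_rates` groups replicates by treatment name, and the values are illustrative, not data from S7 Table.

```python
from statistics import mean


def average_rates(replicates):
    """Average per-replicate (sensitivity, specificity) within each treatment.

    replicates: mapping of treatment name -> list of (sensitivity, specificity).
    """
    return {
        treatment: (mean(s for s, _ in reps), mean(p for _, p in reps))
        for treatment, reps in replicates.items()
    }


# Hypothetical replicate values, not data from the study.
example = {
    "no cryoprotectant": [(1.0, 0.5), (0.5, 1.0)],
    "DMSO": [(0.75, 1.0), (0.25, 1.0)],
    "betaine": [(0.5, 0.5)],
}
print(average_rates(example))
```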
Fig 4. Detection limit experiment evaluating Hi-C linkages in SynCom-2 and SynCom-3 at varied concentrations.
A. All Z-scores (left) and virus-host linkages (right) for each replicate of SynCom-2, categorized by host strains. All figure elements are the same as described in Fig 3. B. Same data type as A, but for SynCom-3. C. Average sensitivity and specificity rates calculated for SynCom-1, SynCom-2, and SynCom-3 without cryoprotective agents. The gray bar represents sensitivity, and the black bar represents specificity. The data underlying Fig 4A and 4B can be found in S6 Table and the data underlying Fig 4C can be found in S7 Table.
Fig 5. Comparison of virus-host prediction from Hi-C and in silico tools.
A. Euler plot showing the overlap of viruses with host predictions obtained from the experimental Hi-C linkage approach or from one of two in silico tools (iPHoP and VirMatcher), which use different probabilistic models to aggregate the outputs of various sequence-based features into host prediction scores. B. Comparison of virus-host predictions across all samples between Hi-C and iPHoP, shown with and without applying a Z-score filter to the Hi-C linkages. Black bars indicate congruent predictions identified by both methods, and gray bars indicate non-congruent predictions. Note: although many viruses had multiple predicted hosts from each tool, only the top-scoring prediction for each virus was considered in this comparison. The data underlying Fig 5B can be found in S12 Table.
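The comparison keeps only each virus's top-scoring host per method before checking congruence. A hedged sketch follows: it assumes each method's output can be reduced to (virus, host genus, score) tuples, compares hosts only within a method (scores from different tools are not on a common scale), and keeps the first-seen host on ties. The function names, virus IDs, genera, and scores are hypothetical; `overlap` returns the viruses predicted by every method, i.e., the central region of an Euler plot.

```python
def top_hits(predictions):
    """Keep only the highest-scoring host prediction for each virus.

    predictions: iterable of (virus, host_genus, score) tuples from one method.
    Ties keep the first host seen.
    """
    best = {}
    for virus, host, score in predictions:
        if virus not in best or score > best[virus][1]:
            best[virus] = (host, score)
    return {virus: host for virus, (host, _) in best.items()}


def compare(hic, other):
    """Count congruent and non-congruent top hits for viruses predicted by both."""
    shared = hic.keys() & other.keys()
    congruent = sum(1 for v in shared if hic[v] == other[v])
    return congruent, len(shared) - congruent


def overlap(*methods):
    """Viruses with a host prediction from every method."""
    return set.intersection(*(set(m) for m in methods))


# Hypothetical outputs; scores are only compared within a single method.
hic = top_hits([("v1", "GenusA", 0.9), ("v1", "GenusB", 0.4), ("v2", "GenusC", 0.7)])
iphop = top_hits([("v1", "GenusA", 95.0), ("v2", "GenusD", 90.0), ("v3", "GenusE", 80.0)])
virmatcher = top_hits([("v1", "GenusA", 0.8), ("v3", "GenusE", 0.6)])
print(compare(hic, iphop))  # (1, 1)
print(overlap(hic, iphop, virmatcher))  # {'v1'}
```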
