Proc Natl Acad Sci U S A. 2021 Jun 8;118(23):e2023202118.
doi: 10.1073/pnas.2023202118. Epub 2021 Jun 3.

A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases

Michael J Tisza et al. Proc Natl Acad Sci U S A.

Abstract

Despite remarkable strides in microbiome research, the viral component of the microbiome has generally presented a more challenging target than the bacteriome. This gap persists, even though many thousands of shotgun sequencing runs from human metagenomic samples exist in public databases, and all of them encompass large amounts of viral sequence data. The lack of a comprehensive database for human-associated viruses has historically stymied efforts to interrogate the impact of the virome on human health. This study probes thousands of datasets to uncover sequences from over 45,000 unique virus taxa, with historically high per-genome completeness. Large publicly available case-control studies are reanalyzed, and over 2,200 strong virus-disease associations are found.

Keywords: genomics; microbiome; virome.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
CHVD metrics. (A) Each classified contig is represented as a dot, with the x-axis position representing contig length. The width of each violin diagram represents the density of sequences at a given position and is proportional between categories. Larger (> 4 kb) Circular Rep-Encoding Single Stranded DNA [CRESS] virus OTUs consist of contigs in a previously reported taxon that combines CRESS-like replication genes with inovirus-like virion genes (107, 108). (B) Genome quality bins are derived from CheckV analysis, and body site labels are derived from sample metadata from the exemplar sequence of each virus OTU. (C) Data from virion-enriched stool samples are plotted. To measure the degree to which enrichment for viral sequences was achieved, a ViromeQC Enrichment Score (32) was calculated for each sample (x-axis). The enrichment score is essentially the inverse abundance of known bacterial single-copy marker genes. (Top) Dotted lines are moving averages of samples from the same study. Asterisks indicate Bioprojects/samples with data used in the production of the CHVD. (Bottom) Production samples are removed. Data are binned by ViromeQC score, and boxplots represent IQR values, with center lines representing the median and whiskers representing 1.5 IQRs. A modified database in which sequences were clustered at 99% identity instead of 95% identity was used for the index to better capture microdiversity and metaviromic islands (109) (e.g., intraspecific structural variations consisting of insertions/deletions of gene cassettes; Materials and Methods). (D) Plots are the same as C but for oral virion preps.
Fig. 2.
Summary of CRISPR spacer match data. (A) Plots represent matches of bacterially encoded CRISPR spacers to virus contigs. Categories are defined by bacterial genera (or higher taxon when genus is not clearly defined; Materials and Methods). Only genera with 200 or more CRISPR spacer matches to CHVD OTUs are displayed. The x-axis values represent the number of unique bacterial CRISPR spacer hits for each virus OTU. Filamentous phage = Inoviridae and other filamentous phages (e.g., certain CRESS viruses). (B) Network diagram of phage–phage interaction landscape based on CRISPR spacer matches. Each line represents a match of a particular spacer sequence to its target phage.
Fig. 3.
Most common viruses, Stool (Gut). (Left) A scatter plot of RPKM (a measure of relative read abundance for a given virus OTU, y-axis) versus prevalence (proportion of samples with >0.1 RPKM, x-axis). For display purposes, the y-axis is a linear scale from 0 to 1 (10^0) and log10 above 1. The top 30 most commonly abundant virus OTUs (based on the product of coordinates) are colored. (Right) Histogram and rug plot of RPKM values across all samples for the most commonly abundant virus OTUs. Colors of dots in the Left panel correspond to the colors in the Right panel. The x- and y-axes are log scale. RPKM values below 0.1 are binned at the left extremity of the plots for display purposes.
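The quantities plotted in Fig. 3 can be illustrated with a minimal Python sketch. It assumes the standard definition of RPKM (reads per kilobase of reference per million reads in the sample) and takes an OTU's y-coordinate to be its mean RPKM across samples, which the caption does not specify. The function names and the 0.1 default are illustrative only and are not taken from the authors' pipeline.

```python
def rpkm(mapped_reads, genome_length_bp, total_reads):
    """Reads per kilobase of reference per million reads in the sample."""
    length_kb = genome_length_bp / 1_000
    reads_millions = total_reads / 1_000_000
    return mapped_reads / (length_kb * reads_millions)


def prevalence(rpkm_values, threshold=0.1):
    """Proportion of samples in which the OTU exceeds the RPKM threshold."""
    return sum(v > threshold for v in rpkm_values) / len(rpkm_values)


def abundance_prevalence_score(rpkm_values, threshold=0.1):
    """Product of the two plot coordinates.

    Assumption: the abundance coordinate is the mean RPKM across samples.
    """
    mean_rpkm = sum(rpkm_values) / len(rpkm_values)
    return mean_rpkm * prevalence(rpkm_values, threshold)
```

Under these assumptions, ranking OTUs by `abundance_prevalence_score` and keeping the top 30 would reproduce the kind of selection the caption describes as "based on the product of coordinates".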
Fig. 4.
Association of the virome and bacteriome with chronic diseases. (A–C) Analysis of read data from PRJEB17784, a case-control study of stool samples from patients with or without Parkinson’s disease. (A) Virome-wide and bacteriome-wide associations in stool samples from Parkinson’s disease patients (n = 74) and healthy controls (n = 108) represented as Manhattan plots. Each OTU is represented as a dot along the x-axis, with its y-axis value being the negative log10 of its P value. The size of each dot corresponds to the median relative abundance of the taxon in the disease cohort. Filled dots represent OTUs found at higher abundance in the diseased state, while hollow dots represent decreased abundance in the diseased state. The dashed gray line represents the false discovery rate < 1% threshold. (B) Receiver operating characteristic plots from 100 differently seeded random forest classifiers trained on the virome (Left) or bacteriome (Right). (C) Swarm plots of Cohen’s d effect sizes (absolute value) of OTUs achieving significant P values. Black dots are positive effect size, and red dots are negative effect size. The mean of all plotted effect sizes is drawn as a blue line. Small effect size = 0.2 to 0.5; medium effect size = 0.5 to 0.8; and large effect size = > 0.8 (84). (D–F) Similar analyses of read data from PRJEB4336, a WGS survey of stool samples from obese and nonobese individuals. Plots D, E, and F are laid out in the same manner as plots A, B, and C, respectively.
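The effect sizes in panels C and F can be illustrated with a short Python sketch of Cohen's d. It assumes the common pooled-standard-deviation form of the statistic; the caption does not say which variant the authors used. The function names, the "below small" label, and the handling of values falling exactly on 0.5 or 0.8 are illustrative choices, not details from the paper.

```python
import math
import statistics


def cohens_d(case, control):
    """Cohen's d using the pooled sample standard deviation.

    Each group needs at least two values, and the pooled variance
    must be non-zero.
    """
    n1, n2 = len(case), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(case)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    return (statistics.mean(case) - statistics.mean(control)) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Bin |d| using the thresholds quoted in the caption.

    Assumption: boundary values go to the higher bin for 0.2 and 0.5,
    and "large" is strictly greater than 0.8.
    """
    magnitude = abs(d)
    if magnitude > 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"
```

A positive d here means higher abundance in the case group, and a negative d means lower abundance, matching the positive and negative effect sizes that the caption distinguishes by color.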
