Nat Commun. 2013:4:2420.
doi: 10.1038/ncomms3420.

Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences

Lesley A Ogilvie et al. Nat Commun. 2013.
Free PMC article

Abstract

Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the application of a genome signature-based approach to interrogate conventional whole-community metagenomes and access subliminal, phylogenetically targeted, phage sequences present within. We describe a portion of the biological dark matter extant in the human gut virome, and bring to light a population of potentially gut-specific Bacteroidales-like phage, poorly represented in existing virus-like particle-derived viral metagenomes. These predominantly temperate phage were shown to encode functions of direct relevance to human health in the form of antibiotic resistance genes, and provided evidence for the existence of putative 'viral-enterotypes' among this fraction of the human gut virome.

Figures

Figure 1. Overview of the PGSR approach.
Tetranucleotide usage patterns (TUPs) of all large fragments (10 kb or over) from 139 human gut metagenomes were calculated, and compared with those of phage genome sequences used as drivers. All metagenomic fragments producing tetranucleotide correlation values of 0.6 or over to any driver sequence were retained, and subjected to functional profiling to resolve phage and non-phage sequences captured. See Table 1 and Supplementary Figs S1–S3 for details of driver sequences. See Supplementary Table S1 for details of human gut metagenomes utilized. *Tetranucleotide usage patterns and correlations were calculated using TETRA 1.0 (ref. 46).
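The screening step in this caption can be illustrated with a minimal Python sketch. It is not the TETRA 1.0 procedure: it assumes raw tetranucleotide frequencies on a single strand and a plain Pearson correlation, which is a simplification of how that tool scores tetranucleotide usage. The function names (`tetra_profile`, `pearson`, `retained`) and the dictionary inputs are hypothetical; only the 10 kb length cutoff and the 0.6 correlation threshold come from the caption.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Relative frequency of each tetranucleotide on the given strand.

    Assumption: raw frequencies, not the normalized scores a dedicated
    tool may compute.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def retained(fragments, drivers, min_len=10_000, min_corr=0.6):
    """Names of fragments of at least min_len bp whose profile correlates
    at min_corr or above with any driver profile."""
    driver_profiles = [tetra_profile(s) for s in drivers.values()]
    keep = []
    for name, seq in fragments.items():
        if len(seq) < min_len:
            continue
        profile = tetra_profile(seq)
        if any(pearson(profile, d) >= min_corr for d in driver_profiles):
            keep.append(name)
    return keep
```

A fragment is kept as soon as one driver passes the threshold, mirroring the "to any driver sequence" wording; the retained set would then go on to functional profiling.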
Figure 2. Analysis of chromosomal contamination in PGSR phage sequences.
Owing to the dominance of chromosomal sequences in the metagenomic data sets analysed and the likelihood that many PGSR phage represent integrated prophage, PGSR phage were examined for the presence of terminal chromosomal regions. (a) Physical maps of 20 randomly selected PGSR phage sequences indicating ORFs with homologues in other phage sequences. Graphs associated with each phage sequence show % G+C across the sequence. ORF homologues in phage data sets were identified based on tBlastn searches (1e−3 or lower) of 711 complete or partial phage genomes, and all contigs assembled from human gut viral metagenomes. ORFs highlighted in cyan have homologues in phage genomes. ORFs highlighted in red generated no valid hits to phage sequences but encode conserved domains with phage-related functions (for example, capsid, integrase and recombination/replication). (b) Relative abundance of ORFs homologous to those encoded by PGSR phage and PGSR non-phage contigs, in phage sequences (711 phage genomes, PGSR phage sequences and assemblies of human gut viromes) and chromosomes (1,821 chromosomes and all PGSR non-phage) expressed as hits per Mb DNA (valid hits=minimum 35% identity over 30 aa or more, 1e−5 or lower). ***P≤0.001 (χ2-test). Data sets and sequences utilized are described in Supplementary Table S1 and Supplementary Data 3–6.
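The "hits per Mb DNA" normalization in part b can be sketched as follows. This is an illustrative reading of the caption, not the authors' pipeline: the `Hit` record, its field names and both functions are hypothetical, and only the thresholds (35% identity, 30 aa, 1e−5) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment hit (hypothetical record, field names are assumptions)."""
    identity_pct: float   # percent identity of the alignment
    align_len_aa: int     # alignment length in amino acids
    evalue: float         # E-value reported by the search


def is_valid(hit, min_identity=35.0, min_len=30, max_evalue=1e-5):
    """Apply the caption's validity criteria to a single hit."""
    return (
        hit.identity_pct >= min_identity
        and hit.align_len_aa >= min_len
        and hit.evalue <= max_evalue
    )


def hits_per_mb(hits, reference_bp):
    """Number of valid hits divided by reference size in megabases."""
    valid = sum(1 for h in hits if is_valid(h))
    return valid / (reference_bp / 1_000_000)
```

Dividing by the size of the searched collection is what makes phage and chromosomal data sets of very different sizes comparable.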
Figure 3. Recovery of PGSR phage sequences from metagenomic data sets.
Commonly used alignment-driven approaches to analyse metagenomes were evaluated for their ability to identify PGSR phage sequences. The same metagenomic data sets surveyed using the PGSR approach were also subjected to a range of alignment-based searches, including gene-centric searches with unambiguous phage-encoded ORFs (capsid and terminase genes). In addition, 991 non-redundant phage contigs also identified in searches of these datasets by Stern et al., using the recently developed CRISPR strategy, were compared. Pie charts depicted show the proportion of PGSR phage sequences captured by each strategy, as well as the total proportion of PGSR phage identified by all strategies in combination (percentages shown). Blastn, Megablast, Discontiguous Megablast: show the proportions of PGSR phage captured in alignments with different blast algorithms when metagenomes were queried at the nucleotide level using whole-PGSR phage driver sequences (1e−3 or lower considered significant and retained). tBlastn: shows proportion of PGSR phage sequences identified using gene-centric surveys of metagenomes with all capsid and terminase genes encoded by driver sequences (1e−3 or lower considered significant). CRISPR: proportion of PGSR phage sequences identified in the 991 phage-like contigs identified by Stern et al., in recent surveys of the same metagenomes using CRISPR spacer regions. All searches: shows the total proportion of PGSR phage identified in the combined output of all searches conducted above.
Figure 4. Inference of PGSR phage host-range.
PGSR sequences were compared with a wide range of bacterial chromosomes and phage genomes, using both tetranucleotide profiles and alignment-based methods (Blast). (a) Phylogram showing relationships between PGSR sequences, human gut-associated chromosomes (n=324) and all large contigs from assembled gut viral metagenomes (n=188, 10 kb or over), based on tetranucleotide profiles. Clusters I–IV indicate regions populated by PGSR phage and driver sequences, and associated pie charts provide the proportion of total PGSR phage sequences in each cluster, designated by black segments. NT (nucleotide): shows genus-level taxonomic assignments for PGSR phage in each cluster based on Blastn searches, and figures in parentheses show total number of PGSR phage affiliated with each genus (≥75% identity, 1e−5 or lower, alignment length of 1 kb or more). ORF: shows genus-level taxonomic assignments for PGSR phage in each cluster based on tBlastn alignments of individual PGSR phage ORFs with 1,700 complete bacterial chromosomes (≥75% identity, 1e−5 or lower). Figures in parentheses show total number of PGSR phage ORFs affiliated with each genus listed. (b) Phylogram showing relationships between PGSR phage sequences, large fragments from gut viral metagenomes, and complete phage genomes (n=647 genomes, 10 kb or over), based on tetranucleotide profiles. For phage genome sequences, the assigned phylogeny reflects that of the host species where known. Scale bars for parts a and b show distance in arbitrary units, and all phylograms represent the most probable topologies based on 200 bootstrap replicates. (c) Total proportion of PGSR sequences and viral metagenome contigs represented in part a affiliated to phylum-level taxonomic groups based on alignments against 1,821 bacterial and archaeal chromosomes. Nucleotide: shows the proportion of sequences affiliated to each phylum based on valid Blastn hits (minimum 75% identity over 1 kb or more, 1e−5 or lower). Amino acid: shows affiliation of all putative protein encoding genes from each data set based on tBlastn searches (minimum 75% identity, 1e−5 or lower). See also Supplementary Data 2. The source and further details of sequences used in the analyses presented in a–c are provided in Supplementary Table S1 and Supplementary Data 3–6.
Figure 5. PGSR phage representation in human gut viral metagenomes.
The representation of PGSR phage sequences in existing gut viral metagenomes, as well as viral and chromosomal metagenomes from other habitats, was assessed and compared with other phage sequence sets. (a) Representation of phage sequence sets in human gut viral metagenomes. Individual pyrosequencing reads were mapped to respective phage sequence sets with high stringency (a minimum of 90% identity over 90% of the read). The number of reads mapped was normalized for size of reference data sets (expressed as reads mapped/Mb reference sequence). (b) Heat map showing relative representation of PGSR phage and other phage sequence sets in viromes from gut and non-gut habitats. Reads from each virome were mapped to reference phage sequence sets as for part a, but using low stringency criteria (minimum 70% identity over 25% of the read). The percentage of reads mapped was normalized for size of reference data sets (expressed as % reads mapped/Mb reference sequence). (c) Proportion of phage with homology to sequences in standard metagenomes and virome assemblies, derived from gut and non-gut habitats. Phage sequences from each collection were used to search metagenomic data sets with Blastn, and valid hits (minimum 75% identity over 100 nt or more, 1e−5 or lower) were used to assign each sequence to one of five categories. GT (gut): phage sequences producing valid hits only in gut data sets; NG (non-gut): phage sequences producing valid hits only in non-gut data sets; GAH (gut-associated high): phage sequences producing valid hits in both gut and non-gut data sets, but with the majority derived from gut metagenomes. GAL (gut-associated low): phage sequences generating valid hits in both gut and non-gut data sets, but with the majority originating from non-gut metagenomes; UNCLASS: sequences producing no valid hits in any metagenome examined. 
Gut vir >500 bp: all contigs from human gut virome assemblies over 500 bp in length; Gut vir bact assoc.: all contigs from human gut virome assemblies affiliated with Bacteroidales driver sequences based on PGSR search criteria (as used to identify PGSR phage sequences in gut metagenomes); PGSR phage: all 85 Bacteroidales-like PGSR sequences classified as phage; marine phage: 99 phage genome sequences from marine phage; NCBI phage: 612 complete phage genomes available from the NCBI phage refseq collection. **P≤0.01 (χ2-test). Details of viromes, metagenomes and phage genomes utilized are provided in Supplementary Table S1, Supplementary Data 3–6.
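The five habitat categories defined for part c amount to a small decision rule, sketched here in Python. The function names and the per-sequence hit counts are hypothetical. Two points are assumptions the caption does not settle: "majority" is read as a simple comparison of raw valid-hit counts, and a tie between gut and non-gut counts is assigned to GAL.

```python
from collections import Counter


def classify(gut_hits, non_gut_hits):
    """Assign one phage sequence to GT, NG, GAH, GAL or UNCLASS from its
    counts of valid Blastn hits in gut and non-gut data sets."""
    if gut_hits == 0 and non_gut_hits == 0:
        return "UNCLASS"   # no valid hits in any metagenome examined
    if non_gut_hits == 0:
        return "GT"        # valid hits only in gut data sets
    if gut_hits == 0:
        return "NG"        # valid hits only in non-gut data sets
    # Hits in both habitats: decide by which side holds the majority.
    # Assumption: ties fall to GAL, since the caption does not say.
    return "GAH" if gut_hits > non_gut_hits else "GAL"


def category_proportions(hit_counts):
    """Proportion of sequences per category.

    hit_counts maps a sequence name to a (gut_hits, non_gut_hits) tuple.
    """
    tally = Counter(classify(g, n) for g, n in hit_counts.values())
    total = sum(tally.values())
    return {cat: count / total for cat, count in tally.items()}
```

Applied to one sequence collection at a time (for example PGSR phage or NCBI phage), `category_proportions` gives the kind of per-collection breakdown that part c reports.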
Figure 6. Functional profiles of PGSR sequences.
The functional profiles of PGSR phage and non-phage sequences were compared with those found in phage genomes (n=711), gut virome fragments (all contigs assembled from 12 individual gut viromes; ref. 11), and 70 chromosomes from gut-associated Bacteroidales species (see Supplementary Table S1, Supplementary Data 3–6 for source and details of sequence data). Amino-acid sequences from all predicted ORFs in each data set were used to search the COG database, the CDD, and the ACLAME database. The proportion of assignable ORFs affiliated to distinct categories in each database is displayed in horizontal bars, and associated pie charts show the total proportion of ORFs in each sequence set generating valid hits in database searches (black segments). (a) Results from searches of the COG database, showing proportions of ORFs assignable to COG classes. (b) Results for searches of the CDD, showing proportions of ORFs encoding conserved domain architectures related to phage and non-phage associated functions. (c) Results from searches of the ACLAME database, showing proportions of ORFs generating valid hits to genes encoded by distinct types of mobile genetic element represented in the database (plasmid, virus and prophage). All phage shows combined results from PGSR-phage, NCBI phage, Marine phage and Gut virome fragments. All non-phage shows combined results from PGSR non-phage and Bacteroidales chromosomes. Stars highlight the position of PGSR phage and non-phage sequences in charts.
Figure 7. Representation of PGSR phage sequences in the human gut metaproteome.
To further explore the functional profile of PGSR Bacteroidales-like phage, and their contribution to the human gut metaproteome, a shotgun metaproteome was generated from a human faecal microbiome and the resulting 177,729 mass spectra used to search custom databases of all putative proteins encoded by PGSR phage, PGSR non-phage and VLP-derived contigs from human gut viral metagenomes. (a) Shows relative hit rates in the gut metaproteome, for amino-acid sequences originating in each data set used to query mass spectra (PGSR phage, PGSR non-phage, VLP-derived gut virome). Relative hit rates were calculated by normalizing the number of proteins from each data set detected in the gut metaproteome by the total number of ORFs in parental data sets (expressed as hits per total number of predicted proteins in each data set). Symbols above bars indicate statistically significant differences in relative hit rate with the data set of corresponding symbol colour (**P=0.01 or lower; ***P=0.001 or lower; χ2-test). Putative functions of identified proteins were based on COG searches (1e−2 or lower; Supplementary Table S3). (b) Heat map shows relative abundance of sequences homologous to those detected in the gut metaproteome, within a broad cross section of bacterial and archaeal chromosomal sequences (n=1,821, PGSR non-phage), and phage sequences (711 phage genomes, PGSR phage sequences and assemblies of human gut viromes), expressed as hits per Mb DNA (valid hits=minimum 35% identity over 30 aa or more, 1e−5 or lower). See Supplementary Table S1, Supplementary Data 3–6 for sources and details of sequences used.
Figure 8. Inter-individual variation of Bacteroidales-like viral-enterotypes.
Inter-individual variation in carriage of PGSR phage and related sequences was assessed by calculating relative abundance of sequences with homology to PGSR phage in individual gut metagenomes (minimum 80% identity over 50% of subject sequence, 1e−5 or lower). (a,b) Heat maps illustrating relative abundance of PGSR phage sequences in human gut metagenomes. Columns represent individual metagenomes and rows represent PGSR phage sequences. Intensity of shading in each cell indicates relative abundance of sequences homologous to each PGSR phage sequence, in each individual metagenome (hits per Mb). Associated histograms show average relative abundance of homologues to each PGSR phage sequence across all individuals (left histogram), average relative abundance of all PGSR phage homologues per individual (top histogram), and incidence of sequences homologous to each PGSR phage sequence as a % of positive metagenomes (right histogram). Map a shows results ranked by average relative abundance across all PGSR phage and individuals. Map b shows results of heuristic hierarchical grouping of individuals based on phage relative abundance profiles into ‘viral-enterotypes’ A, B, C, D or unclassified (UC). The most broadly distributed PGSR phage (with an incidence of 40% or over), shown in the lower segment of this heat map, were not utilized for heuristic ranking. (c) The validity of putative viral-enterotypes was tested by ordination of individual relative abundance profiles using unsupervised non-metric MDS. Points represent individual gut metagenomes, and colours correspond to viral-enterotypes assigned in heat map b. (d) Shows values for the ANOSIM R statistic obtained from comparisons of groupings obtained in MDS plots (part c), which indicates increasing separation of groups as values approach 1. ***Denotes significant separation between groups (P=0.002). The sources of human gut metagenomes used in these analyses are provided in Supplementary Table S1.
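For readers unfamiliar with the ANOSIM R statistic in part d, the following Python sketch shows its standard rank-based definition: the mean rank of between-group dissimilarities minus the mean rank of within-group dissimilarities, divided by M/2, where M is the number of sample pairs. It is a generic illustration, not the authors' analysis. The function names are hypothetical, the input is assumed to be a symmetric dissimilarity matrix with one group label per sample, and the permutation test that yields a P value is omitted.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R for a symmetric dissimilarity matrix and group labels.

    Requires at least one within-group and one between-group pair.
    """
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)
```

When every between-group dissimilarity exceeds every within-group one, R reaches 1; values near 0 indicate that group labels explain little of the structure.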

References

    1. Suttle C. A. Viruses in the sea. Nature 437, 356–361 (2005).
    2. Wommack K. E. & Colwell R. R. Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64, 69–114 (2000).
    3. Reyes A., Semenkovich N. P., Whiteson K., Rohwer F. & Gordon J. I. Going viral: next generation sequencing applied to phage populations in the human gut. Nat. Rev. Microbiol. 10, 607–617 (2012).
    4. Fuhrman J. A. Marine viruses and their biogeochemical and ecological effects. Nature 399, 541–548 (1999).
    5. Brüssow H., Canchaya C. & Hardt W.-D. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol. Mol. Biol. Rev. 68, 560–602 (2004).