PLoS One. 2012;7(8):e42342.

doi: 10.1371/journal.pone.0042342. Epub 2012 Aug 10.

Conservation of gene cassettes among diverse viruses of the human gut

Samuel Minot¹, Gary D Wu, James D Lewis, Frederic D Bushman

PMID: 22900013
PMCID: PMC3416800
DOI: 10.1371/journal.pone.0042342

Samuel Minot et al. PLoS One. 2012.

Affiliation

¹ Department of Microbiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Abstract

Viruses are a crucial component of the human microbiome, but large population sizes, high sequence diversity, and high frequencies of novel genes have hindered genomic analysis by high-throughput sequencing. Here we investigate approaches to metagenomic assembly to probe genome structure in a sample of 5.6 Gb of gut viral DNA sequence from six individuals. Tests showed that a new pipeline based on de Bruijn graph assembly yielded longer contigs that were able to recruit more reads than the equivalent non-optimized, single-pass approach. To characterize gene content, the database of viral RefSeq proteins was compared to the assembled viral contigs, generating a bipartite graph with functional cassettes linking together viral contigs, which revealed a high degree of connectivity between diverse genomes involving multiple genes of the same functional class. In a second step, open reading frames were grouped by their co-occurrence on contigs in a database-independent manner, revealing conserved cassettes of co-oriented ORFs. These methods reveal that free-living bacteriophages, while usually dissimilar at the nucleotide level, often have significant similarity at the level of encoded amino acid motifs, gene order, and gene orientation. These findings thus connect contemporary metagenomic analysis with classical studies of bacteriophage genomic cassettes. Software is available at https://sourceforge.net/projects/optitdba/.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. The de Bruijn graph assembly method and the influence of genomic variation on de Bruijn graph complexity.**
A) Shotgun sequences are produced from two different genomes (shown in blue and red at the top). Those sequences are used to construct a de Bruijn graph, where nodes are formed by all possible sequences of length k-1 (in this case 4 bases), which are connected by edges of length k (5 bases). Since there are no 4mers shared between these two example genomes, the resulting de Bruijn subgraphs are separate. B) Nucleotide polymorphisms are better resolved by short kmers. We consider a mixture of four genomes, each with three polymorphic positions separated by 25 bp. The identity at each polymorphic position is represented by either blue or red to indicate different nucleotides. At all other positions the genomes are identical. The de Bruijn graph that is constructed from this mixture of genomes using a kmer of 23 is shown on the left, where three independent bubbles form around each polymorphic position. The equivalent graph at k = 27 is shown on the right, where three independent sets of bubbles overlap, forming a more complex and suboptimal graph structure. C) Short regions of similarity are better resolved by long kmers. We consider a mixture of two genomes which are entirely different except for a 25 bp region of sequence identity (shown in black). The de Bruijn graph that is constructed from this mixture at k = 23 is shown on the left, where the two resulting subgraphs intersect at the 23mer of similarity. The de Bruijn graph at k = 27 is shown on the right, where the two resulting subgraphs (corresponding to the two genomes) do not intersect, since they have no 26mer in common. The examples in B and C together illustrate how different kmers can be optimal for assembling graphs with different types of polymorphisms.
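
The effect described in panel C can be reproduced with a minimal Python sketch. This is not the authors' pipeline; it only follows the caption's convention that nodes are (k-1)-mers joined by k-mer edges. The function names, the 25 bp `SHARED` sequence, and the flanking sequences are all hypothetical. As a simplifying assumption, the flanks of the two toy genomes are drawn from disjoint two-letter alphabets so that the genomes have nothing in common outside the shared region.

```python
import random


def de_bruijn_graph(seqs, k):
    """Return (nodes, edges): nodes are (k-1)-mers, edges are k-mers
    stored as (prefix, suffix) pairs of (k-1)-mers."""
    nodes, edges = set(), set()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            nodes.update((prefix, suffix))
            edges.add((prefix, suffix))
    return nodes, edges


def shared_nodes(seq_a, seq_b, k):
    """Nodes at which the subgraphs of two genomes intersect."""
    nodes_a, _ = de_bruijn_graph([seq_a], k)
    nodes_b, _ = de_bruijn_graph([seq_b], k)
    return nodes_a & nodes_b


def random_flank(rng, alphabet, length):
    """Random flanking sequence over a restricted alphabet (assumption)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(0)
# Hypothetical 25 bp region of identity, chosen to have no internal repeats.
SHARED = "AGCTATCGCGATCGCTAGATCGATC"
# Toy genomes: identical only within SHARED.
genome_a = random_flank(rng, "AC", 60) + SHARED + random_flank(rng, "AC", 60)
genome_b = random_flank(rng, "GT", 60) + SHARED + random_flank(rng, "GT", 60)

for k in (23, 27):
    print(k, len(shared_nodes(genome_a, genome_b, k)))
```

With these toy inputs the two subgraphs share four 22-mer nodes at k = 23, all lying inside the 25 bp region, and none at k = 27, because no 26-mer fits inside a 25 bp region of identity.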

**Figure 2. Comparison of assembly methods by read alignment.**
The vertical axis indicates the number of reads from each dataset that align to contigs of different size classes (either less than 1 kb, between 1 kb and 3 kb, between 3 and 10 kb, or longer than 10 kb). The horizontal axis separates assembly method. Each dataset is indicated by color (see key on right; numbers indicate gut virome communities from different human subjects). * indicates p<0.05 by Wilcoxon signed-rank test for the indicated pair of assembly methods.

**Figure 3. Network based annotation of viral contigs.**
Orange circles represent viral contigs no shorter than 3 kb. Black circles represent proteins in the RefSeq viral database. RefSeq proteins are connected to viral contigs when an ORF encoded by that contig resembles that protein at E<10⁻⁵⁰ (blastp). Blue outlines indicate groups of RefSeq proteins and ORFs from contigs that share the function indicated by the adjacent label.
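
The graph construction described in this caption can be sketched in Python as follows. The sketch assumes the blastp hits have already been parsed into `(contig, protein, E-value)` tuples; the parsing step, the identifiers, and the toy values are hypothetical and are not taken from the study. Only the two thresholds (contigs no shorter than 3 kb, E < 10⁻⁵⁰) come from the caption.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-50      # from the caption: E < 10^-50 (blastp)
MIN_CONTIG_LENGTH = 3000    # from the caption: contigs no shorter than 3 kb


def build_bipartite_graph(hits, contig_lengths):
    """Link contigs to RefSeq proteins for hits that pass both thresholds.

    hits: iterable of (contig_id, protein_id, e_value) tuples (assumed format).
    """
    contig_to_proteins = defaultdict(set)
    protein_to_contigs = defaultdict(set)
    for contig, protein, e_value in hits:
        if contig_lengths[contig] >= MIN_CONTIG_LENGTH and e_value < E_VALUE_CUTOFF:
            contig_to_proteins[contig].add(protein)
            protein_to_contigs[protein].add(contig)
    return contig_to_proteins, protein_to_contigs


def linked_contig_groups(contig_to_proteins, protein_to_contigs):
    """Connected components of the bipartite graph, reported as contig lists."""
    seen, groups = set(), []
    for start in sorted(contig_to_proteins):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            contig = stack.pop()
            if contig in group:
                continue
            group.add(contig)
            # Walk contig -> protein -> contig to reach linked contigs.
            for protein in contig_to_proteins[contig]:
                stack.extend(protein_to_contigs[protein] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical toy data for illustration only.
contig_lengths = {"contig_1": 12000, "contig_2": 4500, "contig_3": 3200, "contig_4": 900}
hits = [
    ("contig_1", "terminase_X", 1e-80),
    ("contig_2", "terminase_X", 1e-60),
    ("contig_2", "portal_Y", 1e-55),
    ("contig_3", "integrase_Z", 1e-70),
    ("contig_3", "terminase_X", 1e-20),   # rejected: E-value too weak
    ("contig_4", "portal_Y", 1e-90),      # rejected: contig shorter than 3 kb
]
c2p, p2c = build_bipartite_graph(hits, contig_lengths)
groups = linked_contig_groups(c2p, p2c)
print(groups)
```

In this toy case `contig_1` and `contig_2` fall into one group because both match the same hypothetical terminase, while `contig_3` stays on its own.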

**Figure 4. Two examples of phage cassettes.**
Contigs are shown as horizontal black lines, ORFs on those contigs are shown by black arrows above and below those lines, and the organization of those ORFs into protein-coding families is shown with colored boxes. The subject that each contig was assembled from is shown on the left of each panel. When a protein-coding family was functionally annotated according to its similarity with the CDD, that annotation is listed in the legend. Otherwise a unique identification number is shown (e.g., Family 591). The co-orientation score describes the proportion of gene pairs that, when occurring together on multiple contigs, do so in the same relative orientation.
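
One possible reading of the co-orientation score is sketched below in Python. The caption does not give the exact formula, so the following is an assumption: a pair of families counts as co-oriented when its relative orientation (same strand or opposite strands) is identical on every contig where both occur, and only pairs seen together on at least two contigs are scored. The function name, family names, and strands are hypothetical, and each family is assumed to occur at most once per contig.

```python
from collections import defaultdict
from itertools import combinations


def co_orientation_score(contigs):
    """Fraction of repeatedly co-occurring family pairs with a stable
    relative orientation (assumed definition, see lead-in).

    contigs: dict mapping contig_id -> {family_id: strand}, strand is +1 or -1.
    Returns None when no pair co-occurs on at least two contigs.
    """
    relative = defaultdict(list)
    for families in contigs.values():
        for fam_a, fam_b in combinations(sorted(families), 2):
            # +1 if both ORFs are on the same strand, -1 if on opposite strands.
            relative[(fam_a, fam_b)].append(families[fam_a] * families[fam_b])
    repeated = [obs for obs in relative.values() if len(obs) >= 2]
    if not repeated:
        return None
    consistent = sum(1 for obs in repeated if len(set(obs)) == 1)
    return consistent / len(repeated)


# Hypothetical toy contigs for illustration only.
toy_contigs = {
    "contig_a": {"terminase": +1, "portal": +1, "family_1": -1},
    "contig_b": {"terminase": -1, "portal": -1, "family_1": -1},
    "contig_c": {"terminase": +1, "portal": +1},
}
score = co_orientation_score(toy_contigs)
print(score)
```

Here only the terminase/portal pair keeps the same relative orientation on every contig where it occurs, so one of the three scored pairs is co-oriented.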

Cited by

Transfer of Viral Communities between Human Individuals during Fecal Microbiota Transplantation.
Chehoud C, Dryga A, Hwang Y, Nagy-Szakal D, Hollister EB, Luna RA, Versalovic J, Kellermayer R, Bushman FD. mBio. 2016 Mar 29;7(2):e00322. doi: 10.1128/mBio.00322-16. PMID: 27025251.
Gut Bacteriophage: Current Understanding and Challenges.
Sutton TDS, Hill C. Front Endocrinol (Lausanne). 2019 Nov 29;10:784. doi: 10.3389/fendo.2019.00784. eCollection 2019. PMID: 31849833. Review.
Optimizing protocols for extraction of bacteriophages prior to metagenomic analyses of phage communities in the human gut.
Castro-Mejía JL, Muhammed MK, Kot W, Neve H, Franz CM, Hansen LH, Vogensen FK, Nielsen DS. Microbiome. 2015 Nov 17;3:64. doi: 10.1186/s40168-015-0131-4. PMID: 26577924.
Movers and shakers: influence of bacteriophages in shaping the mammalian gut microbiota.
Mills S, Shanahan F, Stanton C, Hill C, Coffey A, Ross RP. Gut Microbes. 2013 Jan-Feb;4(1):4-16. doi: 10.4161/gmic.22371. Epub 2012 Sep 28. PMID: 23022738. Review.
Genomic characteristics and environmental distributions of the uncultivated Far-T4 phages.
Roux S, Enault F, Ravet V, Pereira O, Sullivan MB. Front Microbiol. 2015 Mar 16;6:199. doi: 10.3389/fmicb.2015.00199. eCollection 2015. PMID: 25852662.
