PLoS One. 2014 Apr 2;9(4):e93269.

doi: 10.1371/journal.pone.0093269. eCollection 2014.

Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm

Matthew Cotten¹, Bas Oude Munnink², Marta Canuti², Martin Deijs², Simon J Watson¹, Paul Kellam³, Lia van der Hoek²

Affiliations

¹ Wellcome Trust Sanger Institute, Hinxton, United Kingdom.
² Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.
³ Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Department of Infection, University College London, London, United Kingdom.

PMID: 24695106
PMCID: PMC3973683
DOI: 10.1371/journal.pone.0093269

Matthew Cotten et al. PLoS One. 2014.

Abstract

We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with SLIM, a novel iterative Python algorithm, to identify sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated, and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults but may be involved in the pathogenesis of HIV-1 infection. It will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. An overview of the ViSeq process.**

**Figure 2. An overview of the SLIM read classification process.**

**Figure 3. Detection of norovirus by real-time PCR vs the ViSeq process.**
Total ViSeq-identified norovirus reads were compared to the norovirus viral loads determined by real-time PCR. Pearson's correlation coefficients for all samples (−0.69), for all samples with Ct values below 35 (0.63), and for all samples with ViSeq reads above 10 (−0.59) indicate a strong negative correlation between the two methods of measurement.
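
A negative coefficient is the expected direction for this comparison, because a lower Ct value corresponds to more template in the real-time PCR. The following is a minimal sketch of how a Pearson coefficient of this kind could be computed. The Ct values and read counts are hypothetical illustrations, not data from the study, and the function name `pearson` is our own.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


# Hypothetical values for illustration only (not taken from the paper).
ct_values = [18.0, 22.0, 27.0, 31.0, 36.0]
viseq_reads = [52000, 9000, 1200, 150, 4]

r = pearson(ct_values, viseq_reads)
print(f"Pearson r = {r:.2f}")
```

With these made-up values the read count falls as Ct rises, so the coefficient is negative.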

**Figure 4. Taxonomy of reads in each sample.**
100,000 random reads from each sample dataset were subjected to a nucleotide BLASTN search, and hits with e values less than 0.001 were collected and processed with MEGAN4 (see Materials and Methods). The MEGAN output was processed using a Python script to generate a heat map of total reads in each sample in each category. Values were grouped into 5 categories and depicted with the following colors: 0 reads, white; 1–50 reads, grey; 51–500 reads, dark grey; 501–5000 reads, light green; 5001–50,000 reads, green; >50,000 reads, dark green (also see color bar scale to right of figure).
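
The binning used for the Figure 4 heat map can be sketched as a small lookup. This is an illustration of the thresholds listed in the legend, not the authors' script. The function name `read_count_color` is hypothetical, and treating a count of zero as the white category is our assumption.

```python
def read_count_color(count):
    """Map a read count to the color category listed in the Figure 4 legend."""
    if count < 0:
        raise ValueError("read counts cannot be negative")
    if count == 0:
        # Assumption: the white category means no reads at all.
        return "white"
    # (inclusive upper bound, color) pairs in increasing order.
    bins = [
        (50, "grey"),
        (500, "dark grey"),
        (5000, "light green"),
        (50000, "green"),
    ]
    for upper, color in bins:
        if count <= upper:
            return color
    return "dark green"


for n in (0, 37, 500, 501, 50000, 72000):
    print(n, read_count_color(n))
```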

**Figure 5. A demonstration of SLIM function.**
100,000 random reads from sample 17 were processed with SLIM. For each cycle, the number of reads classified as virus, non-virus and mystery, the number of reads removed and the number of reads remaining are plotted. The cycle number and elapsed time are indicated below the graph; the cycle in which specific viruses were identified is marked in the upper (virus_reads) graph.
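
The legend above gives only the per-cycle bookkeeping (reads classified as virus, non-virus and mystery, reads removed, reads remaining). The sketch below is one hypothetical way such an iterative loop could be organised and is not the SLIM code. It assumes that each cycle classifies a small batch of reads and then removes every remaining read that matches a reference found in that batch. Substring matching stands in for a real homology search, and all names and sequences are invented for illustration.

```python
def find_reference(read, references):
    """Return the name of the first reference containing the read, else None."""
    for name, (_category, sequence) in references.items():
        if read in sequence:
            return name
    return None


def iterative_classify(reads, references, batch_size=2):
    """Classify reads in cycles; references maps name -> (category, sequence)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(reads)
    counts = {"virus": 0, "non-virus": 0, "mystery": 0}
    history = []  # (cycle, reads removed, reads remaining)
    cycle = 0
    while remaining:
        cycle += 1
        batch, rest = remaining[:batch_size], remaining[batch_size:]
        hits = {}
        # Step 1: classify the sampled batch against all references.
        for read in batch:
            name = find_reference(read, references)
            if name is None:
                counts["mystery"] += 1
            else:
                hits[name] = references[name]
                counts[references[name][0]] += 1
        # Step 2: sweep the unsampled reads against this cycle's hits only.
        kept = []
        for read in rest:
            name = find_reference(read, hits)
            if name is None:
                kept.append(read)
            else:
                counts[hits[name][0]] += 1
        history.append((cycle, len(remaining) - len(kept), len(kept)))
        remaining = kept
    return counts, history


# Toy data, invented for illustration.
references = {
    "toy_virus": ("virus", "ATGCCGTAAGGCTT"),
    "toy_bacterium": ("non-virus", "GGGTTTCCCAAAGG"),
}
reads = ["ATGCC", "GTAAG", "TTTCC", "CCCAA", "ACACAC", "GGCTT"]

counts, history = iterative_classify(reads, references)
print(counts)
print(history)
```

Because every cycle consumes at least its own batch, the loop always terminates. Removing matched reads early is what shrinks the pool that later cycles have to search.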

**Figure 6. Quantitation of specific virus reads in each of the 20 samples.**
All reads for each sample (see Table 1 for total number of reads per sample) were mapped to the indicated viral genomes using MUMmer. The number of reads mapped to each virus (normalized for total reads in each sample) is depicted by color (see color bar scale to right of figure).
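
Normalizing for total reads makes samples of different sequencing depth comparable. A minimal sketch of one such normalization follows. The legend does not state the scale used, so the reads-per-million scaling is an assumption, and the sample names and counts are hypothetical.

```python
def normalize_counts(mapped, totals, scale=1_000_000):
    """Convert mapped read counts to reads per `scale` total reads, per sample."""
    normalized = {}
    for sample, per_virus in mapped.items():
        total = totals[sample]
        if total <= 0:
            raise ValueError(f"sample {sample!r} has no reads")
        normalized[sample] = {
            virus: count * scale / total for virus, count in per_virus.items()
        }
    return normalized


# Hypothetical counts for illustration only.
mapped = {
    "sample_A": {"norovirus": 500, "adenovirus": 20},
    "sample_B": {"norovirus": 500, "adenovirus": 0},
}
totals = {"sample_A": 2_000_000, "sample_B": 500_000}

per_million = normalize_counts(mapped, totals)
print(per_million)
```

In this toy example both samples have 500 norovirus reads, but the normalized value is four times higher in the sample with a quarter of the sequencing depth.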

**Figure 7. Open reading frame structure and phylogenetic analysis of the adenovirus genomes identified in this study.**
The ORF pattern of the full genomes (grey, for all ORFs >100 amino acids in length), with the initial ATG in each ORF (vertical red bar) and all stop codons (vertical black bars), is indicated. For clarity the stop codon positions were not marked in the adenovirus genomes. Also shown are the maximum likelihood trees inferred using PhyML version 3.0 under the general time-reversible substitution model. Among-site heterogeneity was considered through a discrete-gamma distribution model, and the robustness of the phylogeny was assessed through bootstrap analysis of 1000 pseudo-replicates. The trees are marked with green node circles indicating the bootstrap support (small green circle at 70% support, larger green circle at 100% support; black nodes indicate support below 70%). The genomes identified in this study are marked in red.
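
The ORF pattern in these figures rests on locating start and stop codons in each reading frame. The sketch below shows one simple way to list ORFs above a length threshold. It is not the authors' script, and it makes two assumptions: an ORF runs from the first in-frame ATG to the next in-frame stop codon, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_aa=100):
    """Return (start, end, length_in_aa) for forward-strand ORFs longer than min_aa.

    start is the 0-based index of the ATG; end is the index just past the stop
    codon; the length counts codons from the ATG up to, not including, the stop.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop codon
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa > min_aa:
                    orfs.append((start, i + 3, length_aa))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


# Toy sequence: ATG + five GCT codons + TAA, offset by two bases.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
short_orfs = find_orfs(toy, min_aa=3)
print(short_orfs)
```

A full analysis would also scan the reverse complement. The default threshold matches the >100 amino acid cutoff given in the legend.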

**Figure 8. Open reading frame structure and phylogenetic analysis of the human cosavirus genome identified in this study.**
Analysis and graphical presentation were performed as described in the legend to Figure 7.

**Figure 9. Open reading frame structure and phylogenetic analysis of the hepatitis B virus genomes identified in this study.**
Analysis and graphical presentation were performed as described in the legend to Figure 7. The HBV reference genome set was from reference .

**Figure 10. Open reading frame structure and phylogenetic analysis of the human papillomavirus genome identified in this study.**
Analysis and graphical presentation were performed as described in the legend to Figure 7. The HPV reference genomes are from reference . For the phylogenetic analysis, the ORFs for E6-E7-E1-E2-L2-L1 were concatenated.

**Figure 11. Open reading frame structure and phylogenetic analysis of the norovirus genomes identified in this study.**
Analysis and graphical presentation were performed as described in the legend to Figure 7.

**Figure 12. Open reading frame structure and phylogenetic analysis of the Torque teno virus genomes identified in this study.**
Analysis and graphical presentation were performed as described in the legend to Figure 7. The TTV reference set was from reference .

Grants and funding

Wellcome Trust/United Kingdom
