Sequencing platform and library preparation choices impact viral metagenomes

Sergei A Solonenko et al. BMC Genomics. 2013 May 10;14:320.
doi: 10.1186/1471-2164-14-320.

Abstract

Background: Microbes drive the biogeochemistry that fuels the planet. Microbial viruses modulate their hosts directly through mortality and horizontal gene transfer, and indirectly by reprogramming host metabolism during infection. However, our ability to study these virus-host interactions is limited by methods that are low-throughput and heavily reliant on the subset of organisms that are in culture. One way forward is culture-independent metagenomics, but these novel methods are rarely tested rigorously, especially for studies of environmental viruses, air microbiomes, extreme-environment microbiology and other areas where sample amounts are constrained. Here we perform replicated experiments to evaluate Roche 454, Illumina HiSeq and Ion Torrent PGM sequencing and library preparation protocols on virus metagenomes generated from as little as 10 pg of DNA.

Results: Using %G+C content to compare metagenomes, we find that (i) metagenomes are highly replicable, (ii) some treatment effects are minimal, e.g., the choice of sequencing technology has 6-fold less impact than varying the amount of input DNA, and (iii) when input DNA is limited (<1 μg), changing the amount of amplification produces little variation. These trends were also observed when examining the metagenomes for gene function and assembly performance, although assembly performance aligned more closely with sequencing effort and read length than with the preparation steps tested. Among Illumina library preparation options, transposon-based libraries diverged from all others, and adaptor ligation was a critical step for optimizing sequencing yields.
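The %G+C comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`gc_percent`, `gc_distribution`, `pearson_r`) are hypothetical, reads are assumed to be plain nucleotide strings, ambiguous bases such as N are excluded from the denominator, and the 1% bin width is an arbitrary choice. Each metagenome is reduced to the fraction of reads in each %G+C bin, and two metagenomes are then compared by Pearson's r over those binned fractions.

```python
import math

def gc_percent(seq: str) -> float:
    """Percent G+C of one read; ambiguous bases (e.g. N) are ignored (assumption)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return float("nan")  # read contains no unambiguous bases
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def gc_distribution(reads, bin_width: int = 1):
    """Fraction of reads in each %G+C bin spanning 0-100 (hypothetical binning)."""
    n_bins = 100 // bin_width + 1  # last bin holds reads at exactly 100% G+C
    counts = [0] * n_bins
    for read in reads:
        gc = gc_percent(read)
        if math.isnan(gc):
            continue  # skip reads with no usable bases
        counts[int(gc // bin_width)] += 1
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]

def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length binned distributions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, `pearson_r(gc_distribution(reads_a), gc_distribution(reads_b))` would return a value close to 1 when two metagenomes have similar %G+C profiles. Normalizing counts to fractions keeps runs with very different read yields comparable; note that `pearson_r` is undefined (division by zero) if either distribution is constant.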

Conclusions: These data guide researchers in generating systematic, comparative datasets to understand complex ecosystems, and suggest that neither varied amplification nor sequencing platforms will deter such efforts.


Figures

Figure 1
Experimental design overview. Library preparation treatments were done at varying levels of replication, as indicated by the numbers (1 to 3) next to each treatment. The number of amplification cycles (see y axis) includes those necessary to generate enough DNA for library preparation, but does not include the emPCR (454, Ion Torrent) or bridge (Illumina) amplification cycles used to build large enough populations of reads for nucleotide sequencing signal detection.
Figure 2
%G + C and duplication plots for Experiment 1 metagenomes. Heatmap coloring indicates the relative pairwise correlations (Pearson’s r) in the %G + C distributions (red-to-yellow) and duplicates (blue-to-green), where red and blue indicate the lowest levels of correlation and white represents highly correlated data. The %G + C distribution correlations were UPGMA clustered with 100 bootstrap runs to indicate statistical support (only >60% support shown). Abbreviations are as follows: “Tech” is the sequencing technology, represented by 4 (454), T (Ion Torrent), I (Illumina) and S (Sanger); “Pair” is the forward or reverse paired-end sequence data; “Rep” is the arbitrarily labeled replicate, ranging from two (A and B) to three (A, B or C); “ng” is the nanograms of input DNA from which the viral metagenome was derived. The most reliable estimate of the true %G + C distribution comes from the unamplified 454 metagenomes. Relative to these, fosmid end sequences generated using Sanger sequencing were the most shifted toward high %G + C, problematic <1000 ng input DNA metagenomes were less shifted toward high %G + C, and reliable 1000 ng Illumina metagenomes were only slightly shifted toward high %G + C.
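The clustering step in this caption can be illustrated with a naive UPGMA (average-linkage) implementation. This sketch assumes correlations are converted to distances as 1 - r, which is an assumption rather than something stated here, and it omits the 100-run bootstrap used for the support values; `upgma` and `correlation_to_distance` are hypothetical helper names. It returns tree topology only, in Newick format.

```python
def correlation_to_distance(r_matrix):
    """Convert a Pearson r matrix into distances using 1 - r (assumed transform)."""
    return [[1.0 - r for r in row] for row in r_matrix]

def upgma(labels, dist):
    """Naive UPGMA on a symmetric distance matrix; returns a Newick topology string."""
    # Active clusters: id -> (newick_subtree, number_of_leaves)
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # Pairwise distances keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        new_d = {}
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Size-weighted average distance: the defining rule of UPGMA
            new_d[k] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        for k, v in new_d.items():
            d[(k, next_id)] = v  # k < next_id, so key order stays consistent
        next_id += 1
    (newick, _), = clusters.values()
    return newick + ";"
```

With three samples where A and B are nearly identical (r = 0.99) and C is less similar to both (r = 0.5 and 0.6), the sketch joins A and B first and returns `(C,(A,B));`. Bootstrap support could be layered on top by resampling %G + C bins with replacement, recomputing the correlations and trees, and counting how often each clade recurs.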
Figure 3
Protein cluster functional analysis and assembly statistics for Experiment 1 metagenomes. Metagenomic reads were mapped to POV protein clusters (see text), and hit frequencies were used to produce pairwise correlation heat maps. Details as described in Figure 2, including bootstrap analysis of statistical support for correlations across metagenomes. Assembly performance of each sample was evaluated using N50 and maximum contig size, as well as the numbers of reads and base pairs assembled. Note that inferior assembly performance was restricted to samples with reduced read yields. Lastly, the Newbler assembler yielded larger contigs and smaller total assemblies than Velvet did on the same Ion Torrent dataset.
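The N50 and maximum contig size used to summarize assembly performance can be computed from a list of contig lengths. This is a minimal sketch, assuming contig lengths have already been parsed from the assembler output (for example, Newbler or Velvet contigs); `n50` and `assembly_summary` are hypothetical names. It does not count assembled reads, which requires read-to-contig mapping information from the assembler.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

def assembly_summary(contig_lengths):
    """Basic per-sample assembly metrics from contig lengths alone."""
    return {
        "n_contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "max_contig": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }
```

For contig lengths [100, 200, 300, 400], half of the 1,000 assembled bases is 500; the two longest contigs (400 + 300 = 700 bp) cross that threshold, so the N50 is 300.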
Figure 4
%G + C and duplication plots for Experiment 2 metagenomes. Details as described in Figure 2, including bootstrap analysis of statistical support for correlations across metagenomes. Only UPGMA clustering bootstrap support >60% is shown.
Figure 5
Protein cluster functional analysis and assembly statistics for Illumina-sequenced Experiment 2 metagenomes. Note that one metagenome from Station 109 DNA yielded significantly fewer reads and thus had a lower total assembly size. Details as described in Figure 3, including bootstrap analysis of statistical support for correlations across metagenomes.

