Validation of picogram- and femtogram-input DNA libraries for microscale metagenomics

Affiliations

¹ Australian Centre for Ecogenomics/School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia.
² Climate Change Cluster, University of Technology Sydney, Sydney, New South Wales, Australia.
³ Department of Civil, Environmental and Geomatic Engineering, ETH Zurich, Zurich, Switzerland.
⁴ Australian Centre for Ecogenomics/School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia; Advanced Water Management Centre, University of Queensland, Brisbane, QLD, Australia.
⁵ Australian Centre for Ecogenomics/School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia; Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia.

PMID: 27688978
PMCID: PMC5036114
DOI: 10.7717/peerj.2486

Christian Rinke et al. PeerJ. 2016.

PeerJ. 2016 Sep 22;4:e2486.

doi: 10.7717/peerj.2486. eCollection 2016.

Abstract

High-throughput sequencing libraries are typically limited by the requirement for nanograms to micrograms of input DNA. This bottleneck impedes the microscale analysis of ecosystems and the exploration of low biomass samples. Current methods for amplifying environmental DNA to bypass this bottleneck introduce considerable bias into metagenomic profiles. Here we describe and validate a simple modification of the Illumina Nextera XT DNA library preparation kit which allows creation of shotgun libraries from sub-nanogram amounts of input DNA. Community composition was reproducible down to 100 fg of input DNA based on analysis of a mock community comprising 54 phylogenetically diverse Bacteria and Archaea. The main technical issues with the low input libraries were a greater potential for contamination, limited DNA complexity which has a direct effect on assembly and binning, and an associated higher percentage of read duplicates. We recommend a lower limit of 1 pg (∼100-1,000 microbial cells) to ensure community composition fidelity, and the inclusion of negative controls to identify reagent-specific contaminants. Applying the approach to marine surface water, pronounced differences were observed between bacterial community profiles of microliter volume samples, which we attribute to biological variation. This result is consistent with expected microscale patchiness in marine communities. We thus envision that our benchmarked, slightly modified low input DNA protocol will be beneficial for microscale and low biomass metagenomics.

Keywords: 100 fg; Illumina; Low biomass; Low input DNA library; Low volume; Marine microheterogeneity; Microscale metagenomics; Nextera XT; Picogram; Reagent contamination.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Dilution series evaluation for low input DNA libraries.**
Dilutions down to 1:50 of amplicon tagment mix (ATM) for low input DNA libraries of 1 pg DNA are shown in comparison to the 1 ng SOP. The trimmed mean insert size (length of DNA fragments in bases without adaptors, determined via read mapping) is plotted against the relative number of read duplicates. Libraries were created with 1 pg of (A) *E. coli* DNA and (B) mock community DNA. Reads were subsampled to five million read pairs. Note that *E. coli* was not sequenced with the 1 ng SOP.

**Figure 2. Yield and quality assessment of low input libraries.**
The bar graph shows the absolute number of reads for all replicates of the 1 ng SOP, the low input libraries (100 pg, 10 pg, 1 pg, 100 fg) and the negative controls (grey background). The negative controls comprise the library prep kit control (NegLib) and the DNA extraction kit + library prep kit control (NegExt); see Methods for details. Reads are colour coded based on the reference they aligned to, including the bacterial and archaeal mock community (green) and the human genome (blue). The remaining reads are shown as unmapped (orange) or mapped against the contaminant *Methylobacterium aerolatum* (red). The calculated cell number range (∼no. cells) is based on the amount of input DNA and an estimated 1–10 fg DNA per microbial cell. The sequence yield is provided as million reads (read yield). The average insert size (insert size), the average percent GC content (%GC), and the average percentage of read duplicates (% duplicates) were calculated as the mean of all replicates. The bar above the figure indicates whether the standard protocol (SOP) or our modified protocol was used for library creation. The bar below the figure provides the average expected reads per sample, based on a NextSeq500 2×150 bp High Output v. 1 run with 1/37 sequence allocation per library. Sample replicate numbers are given in parentheses.
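
The cell number range in this caption follows from simple arithmetic: divide the input DNA by the assumed DNA content per cell. The sketch below illustrates that calculation using the 1–10 fg per cell estimate stated above; the function name `estimated_cell_range` and its parameters are hypothetical and are not taken from the authors' code.

```python
def estimated_cell_range(input_dna_fg, fg_per_cell_low=1.0, fg_per_cell_high=10.0):
    """Return (min_cells, max_cells) for a DNA input given in femtograms.

    Assumes each microbial cell holds between fg_per_cell_low and
    fg_per_cell_high femtograms of DNA (1-10 fg, as stated in the caption).
    """
    if input_dna_fg < 0:
        raise ValueError("input DNA must be non-negative")
    # The heaviest assumed genome gives the fewest cells, and vice versa.
    min_cells = input_dna_fg / fg_per_cell_high
    max_cells = input_dna_fg / fg_per_cell_low
    return min_cells, max_cells


# Input amounts used for the libraries in this figure, converted to femtograms.
inputs_fg = {
    "1 ng": 1_000_000.0,
    "100 pg": 100_000.0,
    "10 pg": 10_000.0,
    "1 pg": 1_000.0,
    "100 fg": 100.0,
}

for label, amount_fg in inputs_fg.items():
    low, high = estimated_cell_range(amount_fg)
    print(f"{label}: about {low:,.0f} to {high:,.0f} cells")
```

For 1 pg of input this gives 100 to 1,000 cells, which matches the range quoted in the abstract.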

**Figure 3. Mock community profile comparisons.**
Correlation between the 1 ng SOP libraries (x-axes) and the low input DNA libraries (100, 10 and 1 pg, 100 fg; y-axes). Shown is the mean relative abundance of the 54 mock community members, based on reads aligned to the respective reference genomes. Insets show a subset of the relative abundances excluding the five most dominant organisms of the mock community. The mean standard deviation for each library is provided as error bars. The 100 fg libraries include four replicates (1, 2, 4, 5) out of five, omitting replicate 3, which was highly contaminated.

**Figure 4. Mock community assembly statistics.**
(A) Maximum contig size, (B) total assembly size, (C) number of contigs, and (D) N50 of the SOP and low input mock community libraries. Read files were subsampled to five million read pairs. Gray bars show assemblies of all reads; red bars show assemblies after read duplicates were removed. Only contigs ≥ 1 kb were included in the analysis. All values are given as mean and standard deviation.
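
As a reminder of what the N50 statistic in panel (D) measures, the following sketch computes it from a list of contig lengths using the standard definition (the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembly size). The 1 kb cutoff mirrors the caption; the function name `n50` and the example lengths are illustrative only, and the source does not state which tool the authors used for this calculation.

```python
def n50(contig_lengths, min_length=1000):
    """Return the N50 of the contigs that are at least min_length bases long.

    Returns 0 if no contig passes the length filter.
    """
    # Keep only contigs >= min_length (1 kb in the caption), longest first.
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached in practice; kept for completeness


# Hypothetical contig lengths in bases; the 500 bp contig is filtered out.
example_contigs = [5000, 3000, 2000, 1000, 500]
print(n50(example_contigs))  # 3000
```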

**Figure 5. Yield and quality assessment of marine samples.**
Reads are colour coded based on the reference they aligned to, including the known contaminant *Methylobacterium aerolatum* (red) and the human genome (blue). The remaining reads are shown as unmapped (orange). The amount of DNA extracted with the modified extraction protocol is given as total DNA in 20 µl elution buffer (DNA extract). Number of cells (∼no. cells) was calculated based on an average DNA content of 1–10 fg per cell. The amount of input DNA for library preparation was measured for the SOP and the 1 ml libraries, and was estimated for the 100 and 10 µl samples based on the 1 ml sample measurements. The bar above the figure indicates whether the standard protocol (SOP) or our modified protocol was used to create the libraries. All libraries were sequenced at an allocation of 1/37 of an Illumina NextSeq500 2×150 bp High Output v. 1 run. Sample replicate numbers are given in parentheses.

**Figure 6. Abundance profiles of the marine microbial samples.**
Bacterial taxonomy was assigned based on 16S rRNA gene sequence detection of shotgun sequencing reads (graftM; see Methods). The normalized abundance is shown after square root transformation for all OTUs above the abundance threshold, resulting in a normalized read count (NR) from 0 to 800. The taxonomic assignment is provided down to the family level if available, otherwise the best available taxonomic rank is given.

**Figure 7. Marine communities profile correlations.**
Correlation coefficients are shown for the marine 10 L SOP, the 10 L filtered dilution, and the low input DNA libraries. The panels show the 16S rRNA gene based taxonomic profile correlations, and the KO-based functional profile correlations. The Pearson correlation coefficient is colour coded from zero (white) to one (dark blue).
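
The Pearson correlation coefficient used to compare profiles can be sketched in a few lines. The example below is a plain-Python illustration of the standard formula applied to two abundance vectors that list the same taxa (or KOs) in the same order; the function name `pearson` and the example vectors are hypothetical, and the authors' actual analysis pipeline is not described in this record.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances for the same four taxa in two libraries.
profile_sop = [0.40, 0.30, 0.20, 0.10]
profile_low_input = [0.38, 0.33, 0.18, 0.11]
print(round(pearson(profile_sop, profile_low_input), 3))
```

Computing this value for every pair of libraries yields the kind of matrix that the colour scale in the figure summarises.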

**Figure 8. Profile analyses of marine sample replication.**
Replicate correlation plots of (A) 16S rRNA gene based taxonomic profiles and (B) functional KO based profiles. Samples with comparable DNA input amounts are connected via a grey box.

**Figure 9. Mean coefficient of variation for taxonomic marine community profiles.**
The mean coefficient of variation is used to compare the 10 L dilutions (SOP 1 ng, 50 and 5 pg) against the low volume (1 ml, 100 and 10 µl) samples using 16S rRNA gene based taxonomic profiles. The x-axis shows the different amounts of input DNA and volumes for the low volume samples (upper row) and the 10 L filtration (SOP and dilutions; lower row).
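
One way to compute a mean coefficient of variation across replicate profiles is sketched below: for each taxon, divide the standard deviation across replicates by the mean, then average over taxa. This is a sketch under stated assumptions, not the authors' implementation. The use of the sample standard deviation and the skipping of taxa with a zero mean are choices made here, and the name `mean_cv` is hypothetical.

```python
import statistics


def mean_cv(replicate_profiles):
    """Mean coefficient of variation across taxa.

    replicate_profiles: list of replicates, each a list of abundances with
    taxa in the same order. Assumptions (not from the source): the sample
    standard deviation is used, and taxa with a mean of zero are skipped.
    """
    if len(replicate_profiles) < 2:
        raise ValueError("at least two replicates are required")
    cvs = []
    # zip(*...) turns the per-replicate rows into per-taxon columns.
    for values in zip(*replicate_profiles):
        mean = statistics.mean(values)
        if mean == 0:
            continue
        cvs.append(statistics.stdev(values) / mean)
    if not cvs:
        raise ValueError("no taxon with a non-zero mean abundance")
    return statistics.mean(cvs)


# Hypothetical normalized read counts for two taxa in two replicates.
replicates = [
    [10, 20],
    [10, 40],
]
print(round(mean_cv(replicates), 4))
```

A higher value indicates more variation between replicates, which is the quantity compared across input amounts and sample volumes in the figure.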
