Comment
Microbiome. 2019 Apr 16;7(1):62.
doi: 10.1186/s40168-019-0678-6.

Towards precision quantification of contamination in metagenomic sequencing experiments

M S Zinter et al. Microbiome. 2019.

Abstract

Metagenomic next-generation sequencing (mNGS) experiments involving small amounts of nucleic acid input are highly susceptible to erroneous conclusions resulting from unintentional sequencing of occult contaminants, especially those derived from molecular biology reagents. Recent work suggests that, for any given microbe detected by mNGS, an inverse linear relationship between microbial sequencing reads and sample mass implicates that microbe as a contaminant. By associating sequencing read output with the mass of a spike-in control, we demonstrate that contaminant nucleic acid can be quantified in order to identify the mass contributions of each constituent. In an experiment using a high-resolution (n = 96) dilution series of HeLa RNA spanning 3 logs of RNA mass input, we identified a complex set of contaminants totaling 9.1 ± 2.0 attograms. Given the competition between contamination and the true microbiome in ultra-low biomass samples such as respiratory fluid, quantification of the contamination within a given batch of biological samples can be used to determine a minimum mass input below which sequencing results may be distorted. Rather than completely censoring contaminant taxa from downstream analyses, we propose here a statistical approach that allows separation of the true microbial components from the actual contribution due to contamination. We demonstrate this approach using a batch of n = 97 human serum samples and note that, despite E. coli contamination throughout the dataset, we are able to identify a patient sample with significantly more E. coli than expected from contamination alone. Importantly, our method assumes no prior understanding of possible contaminants, does not rely on any prior collection of environmental or reagent-only sequencing samples, and does not censor potentially clinically relevant taxa, thus making it a generalized approach to any kind of metagenomic sequencing, for any purpose, clinical or otherwise.
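The abstract does not state how a minimum mass input would be derived from the contamination estimate. One simple reading, offered here only as an assumption and not as the authors' method, is to require that contamination make up no more than a chosen fraction of the total nucleic acid (sample plus contaminant). The function name `min_input_mass` and the 1% tolerance below are hypothetical.

```python
def min_input_mass(contaminant_mass, max_contaminant_fraction):
    """Smallest sample mass m such that
    contaminant_mass / (contaminant_mass + m) <= max_contaminant_fraction.

    Assumption: contamination is a fixed mass per sample, in the same
    units as the returned value. This rule is illustrative only.
    """
    if not 0 < max_contaminant_fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    f = max_contaminant_fraction
    # Rearranging c / (c + m) <= f gives m >= c * (1 - f) / f.
    return contaminant_mass * (1 - f) / f


# Example with the total contaminant mass reported in the abstract (9.1 ag)
# and a hypothetical tolerance of 1% contamination.
threshold_ag = min_input_mass(9.1, 0.01)
```

Under this rule the threshold scales linearly with the contaminant mass, so a batch with twice the contamination would need twice the input mass to stay under the same tolerance.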

Keywords: DNA; DNA contamination; Metagenomics; Microbiota; Regression analysis; Sequence analysis.

Conflict of interest statement

Ethics approval and consent to participate

Methods and protocols for the study were approved by the Committee for the Protection of Human Subjects within the Health and Human Services Agency of the State of California (#12-090702).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Contaminant sequencing reads are inversely proportional to sample mass. For each of n = 32 HeLa input masses (present in triplicate), sequencing reads for the total ERCC set (n = 92 different transcripts) are normalized per million (rpm) and presented in green; sequencing rpm aligning to the E. coli genome are presented in blue; and sequencing rpm aligning to the S. cerevisiae genome are presented in red. The linear regressions associating sample input mass with ERCC, E. coli, and S. cerevisiae are described with the adjusted R² and p value.
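The inverse relationship described in this caption can be checked with an ordinary least-squares fit on log10-transformed values. The following is a minimal Python sketch, not the authors' code; the function name `loglog_fit` and the synthetic masses and read counts are assumptions made for illustration, and the p value is omitted to keep the example dependency-free.

```python
import math


def loglog_fit(mass, rpm):
    """Least-squares fit of log10(rpm) on log10(mass).

    Returns (slope, intercept, adjusted R^2) for a one-predictor model.
    All inputs must be positive so that the logarithm is defined.
    """
    x = [math.log10(m) for m in mass]
    y = [math.log10(r) for r in rpm]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 with one predictor: penalize by (n - 1) / (n - 2).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2


# Synthetic contaminant: a fixed contaminant mass gives rpm that fall
# tenfold for every tenfold increase in sample input mass.
masses = [1, 10, 100, 1000]        # hypothetical input masses
reads = [1000, 100, 10, 1]         # hypothetical contaminant rpm
slope, intercept, adj_r2 = loglog_fit(masses, reads)
```

A slope near -1 on the log-log scale corresponds to reads that are inversely proportional to input mass, which is the pattern the caption attributes to contaminants.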
Fig. 2
Precision quantification of microbial contamination in sequencing experiments. For each of n = 32 HeLa input masses (measured in triplicate), microbial contaminants were identified if the inverse linear relationship associating log10-transformed rpm of any given microbe with the log10-transformed sample mass demonstrated an adjusted R² ≥ 0.7. By solving the equation contaminant mass/ERCC mass = contaminant reads/ERCC reads, the estimated mass of each contaminant in each sample was calculated. The top contaminating taxa were E. coli (2.59 ± 0.67 ag), S. cerevisiae (1.02 ± 0.30 ag), S. maltophilia (0.61 ± 0.49 ag), an unspecified cloning vector (0.43 ± 0.17 ag), and A. xylosoxidans (0.40 ± 0.27 ag). The estimated mass of all contaminants (excluding human and low-quality reads) in each sample was 9.1 ± 2.0 ag.
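The equation in this caption rearranges to contaminant mass = ERCC mass × contaminant reads / ERCC reads. A minimal Python sketch of that arithmetic, together with the adjusted R² ≥ 0.7 criterion, follows. The function names and the numeric inputs are hypothetical; the ERCC spike-in mass actually used in the study is not given here, so the value below is a placeholder.

```python
def contaminant_mass(ercc_mass, ercc_reads, contaminant_reads):
    """Solve contaminant_mass / ercc_mass = contaminant_reads / ercc_reads.

    The result is in the same mass units as ercc_mass.
    """
    if ercc_reads <= 0:
        raise ValueError("ERCC reads must be positive")
    return ercc_mass * contaminant_reads / ercc_reads


def is_contaminant(slope, adj_r2, min_adj_r2=0.7):
    """Flag a taxon when its log-log regression against sample mass is
    inverse (negative slope) and the adjusted R^2 meets the cutoff."""
    return slope < 0 and adj_r2 >= min_adj_r2


# Placeholder numbers: a 50-unit spike-in that yields 10,000 reads,
# alongside a taxon that yields 200 reads in the same sample.
estimated = contaminant_mass(ercc_mass=50.0, ercc_reads=10_000,
                             contaminant_reads=200)
```

Because the calculation is a simple proportion, it can be applied per taxon and per sample, and the per-sample totals summed to obtain an overall contaminant mass.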
Fig. 3
Identification of outliers among contaminant microbes. Left: for each of n = 97 serum sample RNA input masses, sequencing reads for the total ERCC set (n = 92 different transcripts) are normalized per million (rpm) and presented in green; sequencing rpm aligning to the E. coli genome are presented in blue; and sequencing rpm aligning to the S. maltophilia genome are presented in grey. The linear regressions associating sample input mass with ERCC, E. coli, and S. maltophilia are described with the adjusted R² and p value. Right: a histogram of the studentized residual for each observation informing the linear regression between log10-transformed sequencing reads (E. coli in blue, S. maltophilia in grey) and log10-transformed sample input mass. Studentized residuals approximate a near-normal distribution between −2 and +2 such that outliers can be rapidly identified (red).
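The outlier screen described in this caption can be sketched as follows. This is an illustrative Python example rather than the authors' implementation: it assumes internally studentized residuals from a one-predictor least-squares fit, a one-sided cutoff of +2 (to find samples with more reads than contamination alone predicts), and synthetic data. The names `studentized_residuals` and `flag_outliers` are hypothetical.

```python
import math


def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression of
    y on x (here, log10 reads on log10 input mass)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each observation in a one-predictor model.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]


def flag_outliers(x, y, threshold=2.0):
    """Indices of samples whose studentized residual exceeds the threshold,
    i.e. samples with more reads than the contamination trend predicts."""
    return [i for i, r in enumerate(studentized_residuals(x, y))
            if r > threshold]


# Synthetic batch: log10 reads fall one-for-one with log10 input mass,
# except sample 5, which carries tenfold more reads than the trend.
log_mass = [float(i) for i in range(10)]
log_reads = [-m for m in log_mass]
log_reads[5] += 1.0
suspects = flag_outliers(log_mass, log_reads)
```

A two-sided screen (using the absolute value of each residual) would also catch samples with unexpectedly few reads; the one-sided form is used here because the caption's goal is to find signal above the contamination baseline.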
