2015 Mar 30:10:18.
doi: 10.1186/1944-3277-10-18. eCollection 2015.

Large-scale contamination of microbial isolate genomes by Illumina PhiX control

Supratim Mukherjee et al. Stand Genomic Sci.

Abstract

With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges, such as the search for alternative energy sources and the exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors, which can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in the literature, and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources, and such contaminants are usually eliminated during downstream quality control steps. Detection of PhiX-contaminated genomes indicates a lapse in either the application or the effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases has far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.

Keywords: Comparative genomics; Contamination; Next-generation sequencing; PhiX.
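
As a rough illustration of the kind of screening the abstract describes, the sketch below flags assembled contigs that share most of their k-mers with a control reference sequence. It is not the authors' pipeline, which the abstract does not specify. The function names, the k-mer size, and the 0.5 threshold are all assumptions chosen for illustration, and the caller is expected to supply the control reference (for example, the PhiX genome) as a plain string.

```python
# Minimal k-mer screen for control-sequence contamination.
# All names, the k-mer size and the threshold are illustrative assumptions,
# not the method used in the study.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]

def kmers(seq, k):
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(reference, k):
    """Index k-mers from both strands of the control reference."""
    return kmers(reference, k) | kmers(reverse_complement(reference), k)

def contaminated_fraction(contig, index, k):
    """Fraction of the contig's k-mers that occur in the reference index."""
    contig = contig.upper()
    total = len(contig) - k + 1
    if total <= 0:
        return 0.0  # contig shorter than k: nothing to compare
    hits = sum(1 for i in range(total) if contig[i:i + k] in index)
    return hits / total

def flag_contigs(contigs, reference, k=21, threshold=0.5):
    """Return names of contigs whose k-mer overlap meets the threshold."""
    index = build_index(reference, k)
    return [name for name, seq in contigs.items()
            if contaminated_fraction(seq, index, k) >= threshold]
```

In practice, `contigs` would be a dictionary mapping contig names to sequences parsed from an assembly, and flagged contigs would be reviewed or removed before submission. An alignment-based search against the control genome is a common alternative to exact k-mer matching and tolerates sequencing errors better.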

Figures

Figure 1. Genome size and contaminated sequence length (inset) of PhiX-contaminated taxa.
