Review
Front Genet. 2013 Dec 17:4:288.
doi: 10.3389/fgene.2013.00288.

Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

Richard M Leggett et al. Front Genet. 2013.

Abstract

The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC). Unlike other sequencing centers that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform Quality Control (QC) bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.

Keywords: NGS data analysis; QC; bioinformatics tools; contamination screening; quality assessment and improvement; quality control; run statistics; sequence analysis.

Figures

Figure 1
Data flow through the Primary Analysis Pipeline, focused on the Illumina platform.
Figure 2
Example kmer spectra. An initial peak around coverage 1 is indicative of sequencing errors. Two further peaks indicate heterozygosity.
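
To make the Figure 2 caption concrete, the following is a minimal sketch of how a kmer spectrum can be computed from a set of reads. It is not the tooling described in the review: the function name `kmer_spectrum`, the toy reads, and the choice of k = 4 are assumptions made purely for illustration, and the sketch ignores reverse complements and quality values, which a production kmer counter would normally handle.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a mapping {multiplicity: number of distinct kmers seen that many times}.

    Hypothetical helper for illustration only; kmers are counted on the
    forward strand as given, with no reverse-complement canonicalisation.
    """
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of width k along the read and count each kmer.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # The spectrum is the histogram of those counts.
    return Counter(kmer_counts.values())


# Toy input: the second read differs from the others in its last base,
# standing in for a sequencing error.
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
spectrum = kmer_spectrum(reads, k=4)

for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

In this toy input the single kmer created by the mismatching base appears only once, so it falls in the multiplicity-1 bin. On real data the same effect, at scale, produces the initial peak around coverage 1 that the caption attributes to sequencing errors, while genuine genomic kmers accumulate at higher multiplicities.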
