Review
Front Genet. 2014 May 27:5:157.
doi: 10.3389/fgene.2014.00157. eCollection 2014.

Quality control on the frontier

Konrad H Paszkiewicz et al. Front Genet.

Abstract

High-throughput sequencing poses numerous challenges to effective data quality control, and no single quality metric is appropriate in all conditions. Here we detail the open source software used at the Exeter Sequencing Service to provide generic quality control information, as well as more specific metrics for genomic and transcriptomic libraries run on Illumina platforms.

Keywords: Illumina; best practice; core-facility; quality control; sequencing.

Figures

Figure 1
Examples of Bioanalyser assay interpretation for a variety of RNAs. (A) Standard eukaryotic RNA shows a 28S rRNA band at 4.5 kb that should be twice the intensity of the 18S rRNA band at 1.9 kb (human), resulting in a RIN = 8.0–10.0. Small peaks representing 5S and 5.8S subunits, tRNAs and small RNA fragments of about 100 bp are sometimes present after the marker; these are more obvious when using phenol or TRIzol extraction methods, whereas Qiagen columns will generally remove small RNAs. When the RNA is degraded, the 28S rRNA is reduced and more fragments are detected around the 18S rRNA subunit, resulting in RIN = 6.4, which is below the quality required for high-throughput DNA sequencing. In invertebrate RNA, the 28S rRNA fragments into two bands that co-migrate with the 18S rRNA, giving an aberrant RIN score of <8.0 even though the mRNA is unaffected and suitable for sequencing. Genomic DNA can skew the 28S rRNA peak, but this is easily remedied by RNase-free DNase I digestion. (B) Ribosomal RNA removal by isolation of poly-A RNA, assessed by Bioanalyser RNA assay.
Figure 2
Library fragment size distribution. Bioanalyser fluorescence values (black) and fragment sizes inferred from realignment of paired-end reads against a reference genome or de novo assembly, adjusted by 126 bases to account for adapters (red), for libraries with average sizes of 360 bases (A), 550 bases (B), and 810 bases (C).
Figure 3
Basic read metrics extracted on a per-project basis. Basic read metrics extracted on a per-sample basis from the Illumina Demultiplex_stats.html file produced by the bcl2fastq pipeline. Additional information has also been added.
Figure 4
Overview of quality control metrics across multiple samples in a project. These plots are collated into a single HTML summary file for each project, making it easy to see any quality (A), nucleotide (B), or contaminant (C) issues at-a-glance.
Figure 5
Estimating required read sampling for contaminant checks. Ten million reads from an Illumina RNA-seq dataset were subsampled at various numbers of reads. The proportion of rRNA contaminant reads in this dataset was 1.86% when calculated over the full dataset. The absolute percentage difference from this value was calculated for 500 replicates at each sub-sample size, and the average is shown. The error bars indicate the 95% confidence interval for the absolute percentage difference.
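The subsampling experiment described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `subsample_error` is hypothetical, contaminant reads are represented only by their index range rather than by real alignments, and the 95% interval is a normal approximation around the mean, which is an assumption because the caption does not say how the interval was computed.

```python
import random
import statistics


def subsample_error(total_reads, contaminant_reads, depth, replicates=500, seed=0):
    """Mean absolute difference (in percentage points) between the contaminant
    rate seen in random subsamples and the rate over the full dataset.

    Returns (mean_difference, (ci_low, ci_high)). The interval is a normal
    approximation for the mean (an assumption, not taken from the paper).
    """
    rng = random.Random(seed)
    full_pct = 100.0 * contaminant_reads / total_reads
    diffs = []
    for _ in range(replicates):
        # Reads with index < contaminant_reads stand in for contaminant reads.
        picked = rng.sample(range(total_reads), depth)
        hits = sum(1 for i in picked if i < contaminant_reads)
        diffs.append(abs(100.0 * hits / depth - full_pct))
    mean_diff = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs) / (replicates ** 0.5)
    return mean_diff, (mean_diff - half_width, mean_diff + half_width)


if __name__ == "__main__":
    # 186,000 of 10,000,000 reads corresponds to the 1.86% quoted in the caption.
    for depth in (1_000, 10_000):
        mean_diff, ci = subsample_error(10_000_000, 186_000, depth, replicates=50)
        print(f"depth={depth}: mean |diff| = {mean_diff:.3f} points, 95% CI = {ci}")
```

Sampling indices from `range(total_reads)` avoids building a ten-million-element list in memory; larger depths and the full 500 replicates will run correspondingly longer in pure Python.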
Figure 6
Taxonomy of unmapped reads assembled into contigs. A graphical representation of the number of contigs mapping to each level of the NCBI Taxonomy. The colors represent the number of contigs mapping to each branch.
Figure 7
Evaluating ERCC spike-in control results. Examples of cDNA libraries containing ERCC Spike-In, prepared from Arabidopsis thaliana RNA and from RNA of a mouse infected with Burkholderia pseudomallei. Spike-In mix was added to total RNA before preparation of the sequencing library. (A) A. thaliana poly-A RNA was isolated, and a sequencing library prepared using ScriptSeq v2 (Epicentre). (B) A trial B. pseudomallei sequencing library was prepared from RNA extracted from the liver of a B. pseudomallei-infected mouse; prokaryotic RNA was enriched using the Microbe enrich kit (Invitrogen) and bacterial ribosomal RNA was reduced using MicrobeExpress (Invitrogen) before ScriptSeq v2 sequencing library preparation. The ERCC spike-in mix is polyadenylated, and the majority would be expected to have been removed during the library preparation, resulting in a poor correlation and lower limit of detection, thereby contributing to the protocol development. Libraries were processed and sequenced on the Illumina HiSeq2500. The data were normalized to reads per kilobase of exon model per million mapped reads (RPKM) and filtered using a sensitivity threshold set arbitrarily at 1 RPKM (shown by the horizontal dotted line at log2 RPKM = 0; Mortazavi et al., 2008).
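As a worked illustration of the normalization and threshold mentioned in this caption, the sketch below computes RPKM using the standard definition (reads × 10^9 / (transcript length in bp × total mapped reads)) and keeps spike-ins at or above 1 RPKM, reporting them as log2 RPKM. The function names, the spike-in identifiers and the counts are hypothetical, and treating the threshold as inclusive is an assumption.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def detected_spike_ins(counts, lengths, total_mapped_reads, threshold_rpkm=1.0):
    """Return {spike_in_id: log2(RPKM)} for spike-ins at or above the threshold.

    A threshold of 1 RPKM corresponds to log2 RPKM = 0, the dotted line
    described in the caption. Inclusive comparison is an assumption.
    """
    kept = {}
    for name, count in counts.items():
        value = rpkm(count, lengths[name], total_mapped_reads)
        if value >= threshold_rpkm:
            kept[name] = math.log2(value)
    return kept


if __name__ == "__main__":
    # Hypothetical identifiers and counts, for illustration only.
    counts = {"spike_A": 40, "spike_B": 1, "spike_C": 0}
    lengths = {"spike_A": 1000, "spike_B": 2000, "spike_C": 500}
    print(detected_spike_ins(counts, lengths, total_mapped_reads=1_000_000))
```

In this hypothetical example only `spike_A` (40 RPKM) passes the filter; `spike_B` falls at 0.5 RPKM and `spike_C` has no reads.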
Figure 8
Transcript coverage using RNASeqQC. An example plot from RNASeqQC showing mean coverage across the 20 most abundant transcripts for 6 samples. A clear bias can be seen at both the 3' and 5' ends, which may affect downstream analysis.

References

    1. Aird D., Ross M. G., Chen W.-S., Danielsson M., Fennell T., Russ C., et al. (2011). Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 12:R18. doi: 10.1186/gb-2011-12-2-r18
    2. Andrews S. (2010a). FASTQC Package. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
    3. Andrews S. (2010b). Fastq-screen Package. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
    4. Azofeifa D. E., Arguedas H. J., Vargas W. E. (2012). Optical properties of chitin and chitosan biopolymers with application to structural color analysis. Opt. Mater. 35, 175–183. doi: 10.1016/j.optmat.2012.07.024
    5. Cox M. P., Peterson D. A., Biggs P. J. (2010). SolexaQA: at-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11:485. doi: 10.1186/1471-2105-11-485
