Review
Brief Bioinform. 2014 Nov;15(6):879-89.
doi: 10.1093/bib/bbt069. Epub 2013 Sep 24.

Three-stage quality control strategies for DNA re-sequencing data

Yan Guo et al. Brief Bioinform. 2014 Nov.

Abstract

Advances in next-generation sequencing (NGS) technologies have greatly improved our ability to detect genomic variants for biomedical research. In particular, NGS technologies have been recently applied with great success to the discovery of mutations associated with the growth of various tumours and in rare Mendelian diseases. The advance in NGS technologies has also created significant challenges in bioinformatics. One of the major challenges is quality control of the sequencing data. In this review, we discuss the proper quality control procedures and parameters for Illumina technology-based human DNA re-sequencing at three different stages of sequencing: raw data, alignment and variant calling. Monitoring quality control metrics at each of the three stages of NGS data provides unique and independent evaluations of data quality from differing perspectives. Properly conducting quality control protocols at all three stages and correctly interpreting the quality control results are crucial to ensure a successful and meaningful study.

Keywords: FASTQ; alignment; quality control; sequencing; variant calling.

Figures

Figure 1:
The percentages of reads assigned to different categories for (A) SureSelect (v2), (B) TruSeq and (C) the SeqCap EZ methods of exome sequencing. In all cases, the largest category of reads consists of the targeted genomic regions, but a large fraction of the reads are off target. The categories shown are the reads that map to exons that were not part of the target set, intergenic regions, mtDNA, introns and finally reads that do not map to any part of the human reference sequence. (D) The total number of bases covered at >10 depth that map to exons (both targeted and non-targeted exons), introns and intergenic regions for three methods of exome sequencing. These numbers should be compared with the full human genome size of approximately 3 billion base pairs.
Figure 2:
The Ti/Tv ratio is computed as the number of transition SNPs divided by the number of transversion SNPs. Transitions interchange nucleotides of similar shape: two-ring purines (A↔G) or one-ring pyrimidines (C↔T). Transversions interchange one-ring and two-ring structures (A↔C, A↔T, G↔T, G↔C). Although there are twice as many possible transversions as transitions, which would give a Ti/Tv ratio of 0.5 if all substitutions occurred at equal rates, the observed Ti/Tv ratio differs by genomic region.
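
The definition in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SNPs are supplied as (reference base, alternate base) pairs, and the names `is_transition` and `titv_ratio` are hypothetical helpers, not part of any tool discussed in the review.

```python
from itertools import permutations

# Transitions stay within a chemical class: purine<->purine or pyrimidine<->pyrimidine.
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")


def is_transition(ref, alt):
    """Return True if the single-base substitution ref -> alt is a transition."""
    return frozenset((ref, alt)) in TRANSITION_PAIRS


def titv_ratio(snps):
    """Compute Ti/Tv from an iterable of (ref, alt) single-base pairs.

    Entries that are not single-base A/C/G/T substitutions are skipped.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a SNP in the sense used here
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv


# All 12 possible substitutions, each counted once: 4 transitions, 8 transversions.
uniform = list(permutations("ACGT", 2))
print(titv_ratio(uniform))  # 0.5, the equal-rate baseline from the caption
```

Counting every possible substitution once reproduces the 0.5 baseline mentioned in the caption; ratios computed from real call sets are expected to differ from it.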
Figure 3:
Proof of the principle that the heterozygosity to non-reference homozygosity ratio equals 2 for whole-genome sequencing data.
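
One way to arrive at the value of 2 is sketched below. It is not necessarily the argument shown in the figure, and it rests on assumptions that are not stated in this record: Hardy-Weinberg proportions at each variant site, the non-reference allele treated as the derived allele with population frequency $p$, and a neutral site frequency spectrum with density proportional to $1/p$.

```latex
\begin{align*}
P(\text{het} \mid p) &= 2p(1-p), \qquad P(\text{non-ref hom} \mid p) = p^{2} \\
E[\text{het}] &\propto \int_{0}^{1} \frac{2p(1-p)}{p}\,dp = \int_{0}^{1} 2(1-p)\,dp = 1 \\
E[\text{non-ref hom}] &\propto \int_{0}^{1} \frac{p^{2}}{p}\,dp = \int_{0}^{1} p\,dp = \frac{1}{2} \\
\frac{E[\text{het}]}{E[\text{non-ref hom}]} &= \frac{1}{1/2} = 2
\end{align*}
```

Under these assumptions the same proportionality constant appears in both integrals and cancels in the ratio.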
Figure 4:
Overall workflow of quality control in DNA sequencing data.

