Comparative Study
Sci Rep. 2011;1:55.
doi: 10.1038/srep00055. Epub 2011 Aug 5.

Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions

Weixin Wang et al. Sci Rep. 2011.

Abstract

The rapid development of next-generation sequencing (NGS) technology provides new opportunities to extend the scale and resolution of genomic research. Two major challenges in taking full advantage of NGS are efficiently mapping millions of short reads to the reference genome and making accurate SNP calls. In this article, we reviewed current software tools for mapping and SNP calling and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie performed better overall than the other alignment tools on the Illumina platform, while NovoalignCS showed the best overall performance on SOLiD. Furthermore, we showed that next-generation sequencing yields significantly lower coverage and poorer SNP-calling performance in CpG islands, promoters and 5′-UTR regions of the genome. NGS experiments targeting these regions should therefore use a greater sequencing depth than is needed for typical genomic regions.
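The recommendation to sequence these regions more deeply can be illustrated with an idealized model. The sketch below is not the authors' analysis: it assumes reads land uniformly at random (the Poisson, or Lander-Waterman, assumption), so the depth at a base with mean depth c follows Poisson(c), and it represents an under-covered region with a hypothetical `efficiency` factor that scales the effective depth. The function names and any efficiency values are illustrative assumptions.

```python
import math


def frac_at_least(mean_depth: float, k: int) -> float:
    """Expected fraction of bases with >= k reads if depth ~ Poisson(mean_depth)."""
    # P(depth < k) = sum_{i=0}^{k-1} exp(-c) * c**i / i!
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


def min_mean_depth(target: float, k: int, efficiency: float = 1.0, step: float = 0.5) -> float:
    """Smallest nominal mean depth (on a grid of `step`) at which a region whose
    effective depth is efficiency * nominal depth has at least `target` of its
    bases covered by >= k reads. `efficiency` is a hypothetical under-coverage factor."""
    if not 0.0 < target < 1.0 or efficiency <= 0.0:
        raise ValueError("need 0 < target < 1 and efficiency > 0")
    n = 1
    # Integer grid counter avoids accumulating floating-point error in the depth.
    while frac_at_least(n * step * efficiency, k) < target:
        n += 1
    return n * step


print(min_mean_depth(0.95, 10))                  # typical region
print(min_mean_depth(0.95, 10, efficiency=0.6))  # hypothetical under-covered region
```

Real coverage in GC-rich regions such as CpG islands is not Poisson, so this only shows the direction of the effect: any region that systematically receives fewer reads needs a proportionally higher nominal depth to reach the same coverage target.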

Figures

Figure 1. The relationship between sequencing fold and genomic coverage.
The length of each colour bar represents the percentage of bases in the whole genome covered at the corresponding depth, for the corresponding volume of sequenced bases.
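The per-depth percentages summarized in Figure 1 can be computed directly from per-base read depths. A minimal sketch, assuming the depths are already available as a list of integers (for example, the depth column of `samtools depth -a` output); the function names are hypothetical and this is not the authors' pipeline:

```python
from collections import Counter


def depth_distribution(depths):
    """Map each observed depth to the percentage of bases with exactly that depth."""
    if not depths:
        raise ValueError("no bases supplied")
    n = len(depths)
    return {d: 100.0 * c / n for d, c in sorted(Counter(depths).items())}


def percent_at_least(depths, k):
    """Percentage of bases covered by at least k reads."""
    if not depths:
        raise ValueError("no bases supplied")
    return 100.0 * sum(1 for d in depths if d >= k) / len(depths)


depths = [0, 1, 1, 2, 3, 3, 3, 10]
print(depth_distribution(depths))  # {0: 12.5, 1: 25.0, 2: 12.5, 3: 37.5, 10: 12.5}
print(percent_at_least(depths, 3))  # 50.0
```

Repeating this for data sets of increasing total sequenced bases gives one colour bar per sequencing volume.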
Figure 2. Coverage comparisons for different genomic regions at tenfold coverage.
P-values (all less than 2.2e-16) from bootstrap t-tests show significantly poorer coverage of CpG-island regions compared with the genomic background or gene regions. The promoter and 5′-UTR regions are also significantly under-covered. (One star: p-value < 0.05; two stars: p-value < 0.01; three stars: p-value < 0.001.)
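Figure 2 reports P-values from a "t-test through bootstrap", but this page does not describe the exact procedure. One common construction is the bootstrap t-test for a difference in means (the mean-shift approach described by Efron and Tibshirani), sketched below as an illustrative assumption rather than the authors' method. Here `x` would hold per-base depths in CpG islands and `y` those in the genomic background; the function names and defaults are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch t statistic for mean(x) - mean(y)."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        # Degenerate case (both samples constant): use the sign of the difference.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def bootstrap_t_test(x, y, n_boot=10_000, seed=0):
    """One-sided bootstrap p-value for H1: mean(x) < mean(y)."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each sample needs at least two values")
    rng = random.Random(seed)
    t_obs = welch_t(x, y)
    mx, my = statistics.mean(x), statistics.mean(y)
    pooled = statistics.mean(list(x) + list(y))
    # Shift both samples to the pooled mean so the null hypothesis (equal means) holds.
    x0 = [v - mx + pooled for v in x]
    y0 = [v - my + pooled for v in y]
    extreme = 0
    for _ in range(n_boot):
        bx = rng.choices(x0, k=len(x0))  # resample with replacement
        by = rng.choices(y0, k=len(y0))
        if welch_t(bx, by) <= t_obs:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_boot + 1)
```

With this construction the p-value cannot fall below 1/(n_boot + 1), so the number of resamples bounds the smallest reportable value.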
Figure 3. The relationship between the number of probes covered and genomic sequencing fold (583,891 SNP probes in total).
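At a given sequencing fold, the count on the vertical axis of Figure 3 amounts to checking each SNP probe position against the read depth there. A minimal sketch, assuming depths are held in a dictionary keyed by (chromosome, position) and that "covered" means at least `min_depth` reads; both the data layout and the threshold are hypothetical choices, not taken from the paper:

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def probes_covered(depth_at: Dict[Site, int], probes: Iterable[Site], min_depth: int = 1) -> int:
    """Count probe sites with at least min_depth reads; absent sites count as depth 0."""
    return sum(1 for site in probes if depth_at.get(site, 0) >= min_depth)


depth_at = {("chr1", 100): 3, ("chr1", 200): 12, ("chr2", 50): 0}
probes = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)]
print(probes_covered(depth_at, probes))                # 2
print(probes_covered(depth_at, probes, min_depth=10))  # 1
```

Tracing the full curve would mean recomputing depths at each sequencing fold (for example, by subsampling reads) and counting again.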
Figure 4. Comparison of SNP-calling quality (AUC) for three software tools at different depths.
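Figure 4 scores each caller by AUC. Assuming each candidate site has a caller quality score and a binary truth label (for example, derived from an independent genotyping source such as the SNP-array probes of Figure 3; this is an assumption, not something stated here), the AUC equals the Mann-Whitney probability that a true SNP site outscores a non-SNP site, with ties counted as one half. A minimal rank-based sketch with a hypothetical function name:

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties.
    scores: caller quality per site; labels: truthy for true SNP sites."""
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative sites")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over a block of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The rank formulation runs in O(n log n), which matters at genome scale, whereas comparing every positive with every negative site would be quadratic.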
Figure 5. AUC (area under the curve) comparison for different genomic regions.
CpG-island regions show significantly poorer performance than the genomic background (p-value = 0.000972) or gene regions (p-value = 0.0003607). Promoter (p-value = 0.00873) and 5′-UTR (p-value = 0.00946) regions show a similar pattern. Gene regions also show slightly lower performance (p-value = 0.0004641).
