2008 May 16:9:239.
doi: 10.1186/1471-2105-9-239.

Aspects of coverage in medical DNA sequencing

Michael C Wendl et al. BMC Bioinformatics.

Abstract

Background: DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations.

Results: We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy, is derived, and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8x to 10x redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26x and 21x, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21x value for normal samples is essentially a constant.

Conclusion: Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.
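
The redundancy figures quoted in the Results can be roughly sanity-checked with a short calculation. The sketch below is not the authors' derivation, which the abstract does not reproduce. It assumes the classical Poisson approximation of haploid coverage theory, and it further assumes that a diploid project at redundancy ρ samples each of the two alleles independently at ρ/2. The function names (`poisson_at_least`, `haploid_coverage`, `diploid_coverage`, `min_redundancy`) are hypothetical and chosen only for illustration.

```python
import math


def poisson_at_least(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def haploid_coverage(rho, phi=1):
    """Expected fraction of a haploid target covered by at least phi reads.

    Assumption: read depth at a position is Poisson with mean rho.
    """
    return poisson_at_least(phi, rho)


def diploid_coverage(rho, phi=1):
    """Expected fraction of positions where BOTH alleles have >= phi reads.

    Assumption (not stated in the abstract): each allele is sampled
    independently with mean depth rho / 2.
    """
    return poisson_at_least(phi, rho / 2.0) ** 2


def min_redundancy(target, phi, step=0.1):
    """Smallest redundancy on a grid of size `step` whose diploid coverage meets `target`."""
    n = 0
    while diploid_coverage(n * step, phi) < target:
        n += 1
    return n * step


if __name__ == "__main__":
    # Calibrate against haploid (BAC-style) sequencing at 10x redundancy.
    target = haploid_coverage(10.0)
    normal = min_redundancy(target, phi=1)  # both normal alleles seen at least once
    tumor = min_redundancy(target, phi=2)   # both tumor alleles seen at least twice
    print(f"target coverage: {target:.6f}")
    print(f"normal redundancy: {normal:.1f}x, tumor redundancy: {tumor:.1f}x")
```

Under these assumptions the normal and tumor redundancies land in the same neighborhood as the 21x and 26x quoted above, and their ratio is well below 2. Raising the minimum number of covering reads from one to two adds only a modest amount of depth once coverage is already near saturation.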

Figures

Figure 1
Traditional haploid coverage model [15, 16] versus diploid medical sequencing coverage results for minimum number of covering reads φ ∈ {1, 2, 3, 4, 5}. The figure also shows an additional curve that replots the diploid φ = 2 curve, except where abscissa values are scaled by one-half. This aspect is relevant to the discussion of why the redundancies for φ = 1 and φ = 2 do not differ by a factor of two. Coverage progressions for φ ∈ {1, 2} are also shown for the recent Illumina resequencing of C. elegans by Hillier et al. [28]. These points represent average coverages over all chromosome pairs, while their error bars show the observed minima and maxima. Simulation data for φ = 1 on a 20 kb fragment using 250 bp reads [20] are also shown. Points and error bars represent the averages and extrema, respectively, of 250 simulations.
Figure 2
Haploid and diploid results for expected coverage values of at least 0.9975. This is a greatly magnified view of the top quarter-percent of the ordinate range in Fig. 1. Vertical lines demarcate the typical BAC calibration neighborhood of 6 ≤ ρ ≤ 10. The scaling process is demonstrated graphically for diploid sequencing (φ = 1) based on haploid sequencing at ρ = 8.
Figure 3
Diagrammatic synopsis of the intersection probability. Paired coverage distributions, plotted at differences of one unit of redundancy, begin to coalesce as a project evolves. The intersection probability is the area of the overlap (shaded).
Figure 4
Expected coverage for aneuploid chromosome configurations for minimum number of covering reads φ ∈ {2, 3}.

References

    1. Strausberg RL, Simpson AJG, Wooster R. Sequence-Based Cancer Genomics: Progress, Lessons and Opportunities. Nature Reviews Genetics. 2003;4:409–418. doi: 10.1038/nrg1085.
    2. Ley TJ, Minx PJ, Walter MJ, Ries RE, Sun H, McLellan M, DiPersio JF, Link DC, Tomasson MH, Graubert TA, McLeod H, Khoury H, Watson M, Shannon W, Trinkaus K, Heath S, Vardiman JW, Caligiuri MA, Bloomfield CD, Milbrandt JD, Mardis ER, Wilson RK. A Pilot Study of High-Throughput, Sequence-Based Mutational Profiling of Primary Human Acute Myeloid Leukemia Cell Genomes. Proceedings of the National Academy of Sciences. 2003;100:14275–14280. doi: 10.1073/pnas.2335924100.
    3. Wilson RK, Ley TJ, Cole FS, Milbrandt JD, Clifton S, Fulton L, Fewell G, Minx P, Sun H, McLellan M, Pohl C, Mardis ER. Mutational Profiling in the Human Genome. Cold Spring Harbor Symposia on Quantitative Biology. 2003;68:23–29. doi: 10.1101/sqb.2003.68.23.
    4. Rand V, Huang J, Stockwell T, Ferriera S, Buzko O, Levy S, Busam D, Li K, Edwards JB, Eberhart C, Murphy KM, Tsiamouri A, Beeson K, Simpson AJG, Venter JC, Riggins GJ, Strausberg RL. Sequence Survey of Receptor Tyrosine Kinases Reveals Mutations in Glioblastomas. Proceedings of the National Academy of Sciences. 2005;102:14344–14349. doi: 10.1073/pnas.0507200102.
    5. Ma PC, Zhang X, Wang ZJ. High-Throughput Mutational Analysis of the Human Cancer Genome. Pharmacogenomics. 2006;7:597–612. doi: 10.2217/14622416.7.4.597.