BMC Bioinformatics. 2005 Oct 10;6:245.
doi: 10.1186/1471-2105-6-245.

Extension of Lander-Waterman theory for sequencing filtered DNA libraries


Michael C Wendl et al. BMC Bioinformatics. 2005.

Abstract

Background: The degree to which conventional DNA sequencing techniques will be successful for highly repetitive genomes is unclear. Investigators are therefore considering various filtering methods to select against high-copy sequence in DNA clone libraries. The standard model for random sequencing, Lander-Waterman theory, does not account for two important issues in such libraries, discontinuities and position-based sampling biases (the so-called "edge effect"). We report an extension of the theory for analyzing such configurations.
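For orientation, the standard Lander-Waterman expectations that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classical results only, not of the extension reported here; the function and parameter names are hypothetical, and redundancy is taken as c = NL/G for N reads of length L on a target of length G.

```python
import math

def lw_coverage(redundancy):
    """Expected fraction of the target covered at redundancy c = N*L/G
    under standard Lander-Waterman theory: 1 - exp(-c)."""
    return 1.0 - math.exp(-redundancy)

def lw_expected_contigs(n_reads, redundancy, min_overlap_fraction=0.0):
    """Expected number of contigs (islands in Lander-Waterman terminology):
    N * exp(-c * (1 - theta)), where theta is the fraction of a read that
    must overlap for two reads to be joined."""
    return n_reads * math.exp(-redundancy * (1.0 - min_overlap_fraction))

if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, 5.0):
        print(c, lw_coverage(c), lw_expected_contigs(1000, c))
```

These formulas assume one continuous target sampled uniformly, which is exactly the assumption that a filtered, discontinuous library violates.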

Results: The edge effect cannot be neglected in most cases. Specifically, rates of coverage and gap reduction are appreciably lower than those that standard theory predicts for conventional libraries. Performance decreases as read length increases relative to island size. Although this is the opposite of what happens in a conventional library, the apparent paradox is readily explained in terms of the edge effect. The model agrees well with prototype gene-tagging experiments for Zea mays and Sorghum bicolor. Moreover, the associated density function suggests well-defined probabilistic milestones for the number of reads necessary to capture a given fraction of the gene space. An exception that permits applying standard theory arises if sequence redundancy is less than about 1-fold. Here, evolution of the random quantities is independent of library gaps and edge effects. This observation effectively validates the practice of using standard theory to estimate the genic enrichment of a library based on light shotgun sequencing.
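The edge effect can be illustrated with a toy calculation. The sketch below is not the model or any theorem from the paper; it assumes, purely for illustration, that each read of fixed length must lie wholly within a single island and that its start is uniform over the allowed positions. Under that assumption, positions near the island ends are covered by fewer possible read placements. All names are hypothetical.

```python
def single_read_cover_prob(x, island, read):
    """Probability that one read covers position x, assuming the read start
    is uniform on [0, island - read] so the read stays inside the island."""
    span = island - read              # range of allowed start positions
    lo = max(0.0, x - read)           # earliest start that still reaches x
    hi = min(x, span)                 # latest allowed start at or before x
    return max(0.0, hi - lo) / span

def island_coverage(island, read, n_reads, steps=10000):
    """Expected covered fraction of one island after n_reads independent
    reads, averaged over positions with a midpoint rule."""
    dx = island / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        p = single_read_cover_prob(x, island, read)
        total += 1.0 - (1.0 - p) ** n_reads
    return total / steps

def no_edge_coverage(island, read, n_reads):
    """Baseline with no edge effect: every position is covered by a single
    read with the same probability read / island."""
    return 1.0 - (1.0 - read / island) ** n_reads

if __name__ == "__main__":
    island = 1000.0
    # Both cases have 2-fold redundancy (n_reads * read / island = 2).
    for read, n_reads in ((100.0, 20), (500.0, 4)):
        print(read, n_reads,
              island_coverage(island, read, n_reads),
              no_edge_coverage(island, read, n_reads))
```

In this toy setting, at equal redundancy the longer read leaves more of the island uncovered, and both cases fall below the no-edge baseline, which is qualitatively the behaviour the abstract describes.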

Conclusion: Coverage performance using a filtered library is significantly lower than that for an equivalent-sized conventional library, suggesting that directed methods may be more critical for the former. The proposed model should be useful for analyzing future projects.


Figures

Figure 1. Schematic of the covering process for the conventional continuous library (top) versus the filtered discontinuous library (bottom).
Figure 2. Coverage evolution for both the discontinuous island model and the LW "super-island" (LWSI) model.
Figure 3. Evolution of gap census for both the discontinuous island model and the LWSI model.
Figure 4. Evolution of average contig length for both the discontinuous island model and the LWSI model.
Figure 5. Comparison of Thm. 5 to experimental gene-tagging results in sorghum and maize.
Figure 6. Tail probabilities for tagging various fractions of the gene space in Sorghum bicolor [14].
Figure 7. Island coordinate system and nomenclature.

References

    1. Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, Peterson S, Heidelberg J, DeBoy RT, Haft DH, Dodson RJ, Durkin AS, Gwinn M, Kolonay JF, Nelson WC, Peterson JD, Umayam LA, White O, Salzberg SL, Lewis MR, Radune D, Holtzapple E, Khouri H, Wolf AM, Utterback TR, Hansen CL, McDonald LA, Feldblyum TV, Angiuoli S, Dickinson T, Hickey EK, Holt IE, Loftus BJ, Yang F, Smith HO, Venter JC, Dougherty BA, Morrison DA, Hollingshead SK, Fraser CM. Complete Genome Sequence of a Virulent Isolate of Streptococcus pneumoniae. Science. 2001;293:498–506. doi: 10.1126/science.1061217.
    2. International Human Genome Sequencing Consortium. Finishing the Euchromatic Sequence of the Human Genome. Nature. 2004;431:931–945. doi: 10.1038/nature03001.
    3. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, Bennetzen JL. Nested Retrotransposons in the Intergenic Regions of the Maize Genome. Science. 1996;274:765–768. doi: 10.1126/science.274.5288.765.
    4. Palmer LE, Rabinowicz PD, O'Shaughnessy AL, Balija VS, Nascimento LU, Dike S, de la Bastide M, Martienssen RA, McCombie WR. Maize Genome Sequencing by Methylation Filtration. Science. 2003;302:2115–2117. doi: 10.1126/science.1091265.
    5. Bennetzen JL, Chandler VL, Schnable P. National Science Foundation-Sponsored Workshop Report. Maize Genome Sequencing Project. Plant Physiology. 2001;127:1572–1578. doi: 10.1104/pp.127.4.1572.
