Comparative Study
Plant Physiol. 2004 Aug;135(4):2040-5.
doi: 10.1104/pp.104.041640. Epub 2004 Aug 6.

Types and frequencies of sequencing errors in methyl-filtered and high c0t maize genome survey sequences


Yan Fu et al. Plant Physiol. 2004 Aug.

Abstract

The Maize Genome Sequencing Consortium has deposited into GenBank more than 850,000 maize (Zea mays) genome survey sequences (GSSs) generated via two gene enrichment strategies: methylation filtration (MF) and high-C₀t (HC) fractionation. These GSSs are a valuable resource for generating genome assemblies and for discovering single nucleotide polymorphisms and nearly identical paralogs. Based on the rate of mismatches between 183 GSSs (105 MF + 78 HC) and 10 control genes, the rate of sequencing errors in these GSSs is 2.3 × 10⁻³. As expected, many of these errors were derived from insufficient vector trimming and base-calling errors. Surprisingly, however, some errors were due to cloning artifacts. These G·C to A·T transitions are restricted to HC clones; over 40% of HC clones contain at least one such artifact. Because it is not possible to distinguish the cloning artifacts from biologically relevant polymorphisms, HC sequences should be used with caution for the discovery of single nucleotide polymorphisms or paramorphisms. Applying more stringent trimming parameters reduced the average rate of sequencing errors 6-fold (to 3.6 × 10⁻⁴). This trimming removed only 11% of the bases (15,469/144,968). Because of redundancy among GSSs, the more stringent trimming reduced coverage of promoters, exons, and introns by only 0%, 1%, and 4%, respectively. Hence, at the cost of a very modest loss of gene coverage, the quality of these maize GSSs can approach Bermuda standards, even prior to assembly.
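The abstract does not state which trimming parameters were made more stringent. As a minimal sketch, assuming phred-style per-base quality scores and a Mott-style maximum-scoring-window trim (a common quality-trimming technique, not necessarily the one used in this study), the snippet shows how a stricter error cutoff shortens the retained read, and re-checks the abstract's arithmetic. The function names `phred_to_error`, `mott_trim`, and `error_rate`, the cutoffs, the toy quality values, and the 1-in-10,000 Bermuda threshold are all illustrative assumptions.

```python
# Hypothetical sketch: Mott-style quality trimming plus the abstract's arithmetic.
# Assumes phred quality scores (Q = -10 * log10(p_error)); the algorithm,
# cutoffs, and toy read are illustrative, not the study's actual parameters.


def phred_to_error(q: int) -> float:
    """Convert a phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def mott_trim(quals: list[int], cutoff: float) -> tuple[int, int]:
    """Return (start, end) of the kept region, end exclusive.

    Each base scores (cutoff - p_error), so reliable bases add to a running
    sum and unreliable ones subtract from it. The highest-scoring contiguous
    window is kept (a maximum-subarray scan). Lowering the cutoff makes
    trimming more stringent.
    """
    best_score, best_start, best_end = 0.0, 0, 0
    run_score, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_score += cutoff - phred_to_error(q)
        if run_score <= 0:
            # The window so far is not worth keeping; restart after this base.
            run_score, run_start = 0.0, i + 1
        elif run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_start, best_end


def error_rate(mismatches: int, aligned_bases: int) -> float:
    """Sequencing error rate: mismatches per aligned base."""
    return mismatches / aligned_bases


# Figures reported in the abstract.
rate_before, rate_after = 2.3e-3, 3.6e-4
fold_reduction = rate_before / rate_after  # about 6.4, reported as "6-fold"
fraction_trimmed = 15_469 / 144_968        # about 0.107, reported as "11%"
BERMUDA_MAX_ERROR = 1e-4                   # assumed: at most 1 error per 10,000 bases

# Toy read with low-quality ends around a high-quality core (made-up values).
quals = [5, 8, 12, 30, 35, 40, 40, 38, 30, 20, 10, 6]
lenient = mott_trim(quals, cutoff=0.05)   # keeps indices 3..9
strict = mott_trim(quals, cutoff=0.005)   # keeps indices 3..8
```

Even after the stricter trimming, 3.6 × 10⁻⁴ is still above the assumed 10⁻⁴ threshold, which fits the abstract's wording that the GSSs "can approach" rather than meet Bermuda standards before assembly.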


Figures

Figure 1. GSS coverage of the rf2a gene. Black boxes indicate experimentally validated exons. The two vertical dotted lines define the interval used for the coverage calculations presented in Table II. Solid green lines designate GSSs in the 5′→3′ orientation, and red dotted lines designate GSSs in the 3′→5′ orientation. Black, blue, and red dots on GSSs indicate Class I, Class II, and Class III errors, respectively. MF and HC GSSs are shown above and below the gene, respectively. The gray box indicates the approximately 4.8-kb region (positions 8,430–13,230) masked prior to the BLAST search; it contains two open reading frames (positions 8,430–12,224 and 11,869–13,230) of a repetitive copia-like retrotransposon, DON QUIXOTE.
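The caption defines the interval used for the coverage calculations but not the formula. A minimal sketch, assuming coverage means the fraction of bases in that interval overlapped by at least one GSS alignment, with 1-based inclusive coordinates and masked bases (such as the gray box) excluded from the denominator; the names `positions` and `interval_coverage` and the toy coordinates are hypothetical and do not reproduce the study's data.

```python
# Hypothetical sketch: fraction of an analysis interval covered by GSS alignments.
# Assumes 1-based, inclusive coordinates and that masked bases (like the
# gray box in Figure 1) are excluded from the denominator. Not the study's code.


def positions(spans, start, end):
    """All positions of the given spans that fall inside [start, end]."""
    return {
        p
        for s, e in spans
        for p in range(max(s, start), min(e, end) + 1)
    }


def interval_coverage(spans, window, masked=()):
    """Fraction of unmasked bases in `window` overlapped by at least one span."""
    start, end = window
    masked_pos = positions(masked, start, end)
    eligible = (end - start + 1) - len(masked_pos)
    if eligible <= 0:
        raise ValueError("window is entirely masked")
    # Overlapping alignments are counted once because positions form a set.
    covered = positions(spans, start, end) - masked_pos
    return len(covered) / eligible


# Toy example: three GSS alignments (two overlapping) and one masked block.
gss_spans = [(1, 400), (350, 700), (900, 1000)]
print(interval_coverage(gss_spans, (1, 1000)))                       # 0.801
print(interval_coverage(gss_spans, (1, 1000), masked=[(701, 800)]))  # 0.89
```

A set of positions is simple and adequate for a gene-sized window; for much longer regions, merging sorted intervals would use far less memory.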
