Plant Physiol. 2003 May;132(1):84-91.
doi: 10.1104/pp.102.019422.

Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data

Jacqueline Batley et al. Plant Physiol. 2003 May.

Abstract

We have developed a computer-based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag (EST) data. Using a redundancy-based approach, valid SNPs are distinguished from sequencing errors by their representation multiple times in an alignment of sequence reads. A second measure of validity was calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) EST sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertions/deletions identified using this approach represent true genetic variation in maize.
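The redundancy-based idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `candidate_snps` and `is_transition`, the input format (equal-length aligned reads as strings), and the choice to score a column by the read count of its second most common base are all assumptions made for illustration. Insertions/deletions are ignored here.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def candidate_snps(alignment, min_redundancy=2):
    """Return (column, alleles, redundancy) for each candidate SNP column.

    `alignment` is a list of equal-length strings (hypothetical format).
    Characters other than A, C, G, T (gaps, padding, N) are skipped.
    A column is reported when its second most common base is seen in at
    least `min_redundancy` reads; that count is the redundancy score
    (an assumed reading of "minimum number of sequences that represent
    a polymorphism").
    """
    results = []
    for col in range(len(alignment[0])):
        counts = Counter(read[col] for read in alignment if read[col] in "ACGT")
        ranked = counts.most_common()
        if len(ranked) >= 2 and ranked[1][1] >= min_redundancy:
            alleles = (ranked[0][0], ranked[1][0])
            results.append((col, alleles, ranked[1][1]))
    return results


def is_transition(base_a, base_b):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {base_a, base_b}
    return pair <= PURINES or pair <= PYRIMIDINES


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACATAC", "ACATAT", "ACGTAC"]
    for col, alleles, score in candidate_snps(reads):
        kind = "transition" if is_transition(*alleles) else "transversion"
        print(col, alleles, score, kind)
```

In this toy alignment the single T in the last column is treated as a probable sequencing error because it is seen only once, whereas the G/A difference in the third column is supported by two or more reads on each side.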

Figures

Figure 1. Abundance of maize EST contigs containing candidate SNPs in relation to contig size. The frequency of contigs that contained SNPs was calculated for all alignments of increasing numbers of sequence reads.
Figure 2. Abundance of candidate SNPs identified within contigs in relation to contig size for maize EST data. The mean SNP abundance was calculated for all alignments of increasing numbers of sequence reads.
Figure 3. A measurement of SNP redundancy score in relation to contig size for maize EST data. The mean SNP redundancy score was calculated for all candidate SNPs identified in alignments of increasing numbers of sequence reads.
Figure 4. AutoSNP summary report 246. This report depicts nine candidate SNPs, identifying their base position in the sequence alignment along with measures of confidence of SNP validity. The key relates the aligned sequences to original GenBank sequence identification and also identifies the maize line (where available) derived from the GenBank annotation. The SNP redundancy score measures the minimum number of sequences that represent a polymorphism. The cosegregation score is a measure of the number of SNPs in the alignment that share the same pattern of polymorphism between aligned sequences. The weighted cosegregation score corrects for missing data in the EST alignments that may otherwise bias the cosegregation score. In this example, all SNPs were verified as true polymorphisms.
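
The cosegregation scores described in the Figure 4 caption might be sketched as follows. The exact formulas are not given in this record, so everything here is an assumption for illustration: the names `same_pattern` and `cosegregation_scores`, the use of `None` for reads that do not cover a position, the `min_shared` threshold, and the weighting (matches divided by the number of SNPs that share enough reads to be compared).

```python
def same_pattern(col_a, col_b):
    """True if two SNP columns split their shared reads into the same groups.

    Each column is a list of per-read base calls; None marks a read that
    does not cover the position. The columns match when the alleles map
    one-to-one across the reads covering both positions.
    """
    pairs = {(a, b) for a, b in zip(col_a, col_b)
             if a is not None and b is not None}
    alleles_a = {a for a, _ in pairs}
    alleles_b = {b for _, b in pairs}
    return len(alleles_a) >= 2 and len(pairs) == len(alleles_a) == len(alleles_b)


def cosegregation_scores(snp_columns, min_shared=2):
    """Return (score, weighted_score) for each SNP column.

    score: number of other SNPs with the same pattern (assumed definition).
    weighted_score: score divided by the number of other SNPs sharing at
    least `min_shared` reads, so missing data does not lower the value.
    """
    scores = []
    for i, col_i in enumerate(snp_columns):
        matches = 0
        comparable = 0
        for j, col_j in enumerate(snp_columns):
            if i == j:
                continue
            shared = sum(a is not None and b is not None
                         for a, b in zip(col_i, col_j))
            if shared < min_shared:
                continue  # too little overlap to compare these two SNPs
            comparable += 1
            matches += same_pattern(col_i, col_j)
        weighted = matches / comparable if comparable else 0.0
        scores.append((matches, weighted))
    return scores


if __name__ == "__main__":
    snps = [
        ["G", "G", "A", "A"],   # SNP 1
        ["C", "C", "T", None],  # SNP 2, last read does not cover it
        ["A", "T", "A", "T"],   # SNP 3, pattern unrelated to the others
    ]
    print(cosegregation_scores(snps))
```

Under these assumptions, SNPs 1 and 2 support each other because they group the shared reads identically, while SNP 3 gains no support from either.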
