Mining SNPs from EST databases
Abstract
There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs.
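The two computational steps named in the abstract, flagging mismatches within an assembled EST contig and estimating allele frequencies from genotyped samples, can be illustrated with a minimal sketch. This is not the authors' pipeline, since the abstract does not describe their assembly software or filtering thresholds. The function names, the `min_allele_reads` cutoff, and the assumption that reads arrive already aligned and padded to equal length are all hypothetical choices made for illustration.

```python
from collections import Counter


def candidate_snp_sites(aligned_reads, min_allele_reads=2):
    """Flag alignment columns where two or more bases are each well supported.

    aligned_reads: equal-length strings from one contig, with '-' for gaps
    and 'N' for uncalled bases (an assumed input format).
    min_allele_reads: hypothetical cutoff; a base seen in fewer reads is
    treated as a likely sequencing error rather than an allele.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must be padded to the same alignment length")

    sites = []
    for col in range(length):
        # Count only unambiguous bases; gaps and N calls are ignored.
        counts = Counter(
            read[col].upper() for read in aligned_reads if read[col].upper() in "ACGT"
        )
        supported = {base: n for base, n in counts.items() if n >= min_allele_reads}
        if len(supported) >= 2:
            sites.append((col, supported))
    return sites


def allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Estimate the alternate-allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + 2 * n_hom_alt) / total_alleles


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACCTAC", "-CGTAN"]
    print(candidate_snp_sites(reads))
    print(allele_frequency(n_hom_ref=6, n_het=3, n_hom_alt=1))
```

Requiring each base to appear in at least two reads is a common guard against single-read sequencing errors. Candidates flagged this way would still need experimental confirmation, which is the role GBA plays in the study.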