Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP
- PMID: 16251470
- PMCID: PMC1310648
- DOI: 10.1101/gr.4297805
Abstract
In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and for designing genetic association studies. The genotypes deposited in dbSNP are unphased, and thus the haplotype information is unknown. We applied the phasing method HAP to obtain the haplotype information, block partitions, and tag SNPs for all publicly available genotype data and deposited this information into the dbSNP database. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block, computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses using multiple genotype data sets from the same genomic regions. Dense and sparse genotype data sets from the same region were combined to show that the number of common haplotypes is significantly underestimated in whole-genome data sets, while the predicted haplotypes over the common SNPs are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, HAP is over 1000 times faster than PHASE, making it suitable for application to the entire set of genotypes in dbSNP.
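To make concrete why unphased genotypes leave the haplotypes unknown, the following minimal Python sketch enumerates every pair of haplotypes consistent with a single unphased genotype. It is an illustration only and is not the HAP or PHASE algorithm. The genotype encoding (0 = homozygous for one allele, 1 = homozygous for the other, 2 = heterozygous) and the function name `compatible_haplotype_pairs` are assumptions introduced here, not taken from the article.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a string over '0', '1', '2' (assumed encoding):
    '0' and '1' are the two homozygous states, '2' is heterozygous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    # Try every assignment of alleles to the heterozygous sites of haplotype 1;
    # haplotype 2 must carry the complementary allele at each of those sites.
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, allele in zip(het_sites, bits):
            h1[site] = allele
            h2[site] = "1" if allele == "0" else "0"
        # Sort so that (a, b) and (b, a) count as the same unordered pair.
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs("0212"):
        print(pair)
```

A genotype with k heterozygous sites (k at least 1) admits 2^(k-1) distinct haplotype pairs, so the ambiguity grows quickly with the number of SNPs. Phasing methods choose among these candidates using information shared across individuals in the sample.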