Comparative Study
Genome Res. 2005 Nov;15(11):1594-600.
doi: 10.1101/gr.4297805.

Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP

Noah A Zaitlen et al. Genome Res. 2005 Nov.

Abstract

In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and for designing genetic association studies. The genotypes deposited in dbSNP are unphased, so the haplotype information is unknown. We applied the phasing method HAP to obtain the haplotypes, block partitions, and tag SNPs for all publicly available genotype data and deposited this information into dbSNP. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block, computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses that combine multiple genotype data sets from the same genomic regions. Dense and sparse genotype data sets from the same region were combined to show that the number of common haplotypes is significantly underestimated in whole-genome data sets, while the predicted haplotypes over the SNPs shared by both data sets are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, it is over 1000 times faster, making it suitable for application to the entire set of genotypes in dbSNP.
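The abstract mentions tag SNPs without describing how HAP selects them. As a rough illustration only, the sketch below uses a simple greedy heuristic, which is not necessarily HAP's method: within one haplotype block it keeps adding the SNP position that separates the most haplotypes until every distinct haplotype has a unique allele pattern over the chosen positions. The function names and the toy block are hypothetical.

```python
def distinct_count(haplotypes, positions):
    """Number of distinct haplotypes when restricted to the given SNP positions."""
    return len({tuple(h[p] for p in positions) for h in haplotypes})


def greedy_tag_snps(haplotypes):
    """Hypothetical greedy tag SNP selection for one block.

    Assumes all haplotypes are equal-length strings of alleles. Greedy choice
    is a heuristic: the result is not guaranteed to be the smallest tag set.
    """
    haplotypes = list(dict.fromkeys(haplotypes))  # deduplicate, keep order
    n_sites = len(haplotypes[0])
    target = len(haplotypes)  # every distinct haplotype must stay distinguishable
    tags = []
    while distinct_count(haplotypes, tags) < target:
        # Add the untagged position that separates the most haplotypes.
        best = max(
            (p for p in range(n_sites) if p not in tags),
            key=lambda p: distinct_count(haplotypes, tags + [p]),
        )
        tags.append(best)
    return sorted(tags)


if __name__ == "__main__":
    toy_block = ["ACGT", "ACGA", "GCGA", "GTGA"]  # hypothetical haplotypes
    print(greedy_tag_snps(toy_block))  # [0, 1, 3]
```

For this toy block no pair of positions separates all four haplotypes, so three tag positions are needed; site 2 is monomorphic and is never chosen.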

Figures

Figure 1.
A genotype for five SNPs (left) and two possible phasings of the genotype into pairs of haplotypes (right), demonstrating the inherent ambiguity of haplotype phasing. Each SNP has two possible alleles, “A” and “G”. An “A” or “G” position in the genotype represents a homozygous genotype at that SNP, and an “H” position represents a heterozygous genotype. From the observed genotype alone, it is impossible to determine which phasing is correct.
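The ambiguity in Figure 1 can be made concrete by listing every haplotype pair consistent with a genotype in the same A/G/H encoding. A genotype with k heterozygous sites has 2^(k-1) unordered phasings when k ≥ 1. The sketch below assumes that encoding; the function name and example genotype are hypothetical and are not the genotype drawn in the figure.

```python
from itertools import product


def phasings(genotype):
    """List unordered haplotype pairs consistent with an A/G/H genotype.

    "A" and "G" are homozygous sites; "H" is heterozygous (one A, one G).
    """
    het = [i for i, g in enumerate(genotype) if g == "H"]
    if not het:
        return [(genotype, genotype)]  # fully homozygous: only one phasing
    pairs = []
    # Fix haplotype 1 to carry "A" at the first heterozygous site so that
    # (h1, h2) and (h2, h1) are not counted as different phasings.
    for choice in product("AG", repeat=len(het) - 1):
        alleles = ("A",) + choice
        h1, h2 = list(genotype), list(genotype)
        for site, allele in zip(het, alleles):
            h1[site] = allele
            h2[site] = "G" if allele == "A" else "A"  # complementary allele
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


if __name__ == "__main__":
    for h1, h2 in phasings("AHGHH"):  # hypothetical genotype, 3 het sites
        print(h1, h2)
```

With three heterozygous sites the example prints four equally valid pairs; nothing in the genotype itself favors one of them, which is why phasing methods such as HAP or PHASE must rely on population-level information.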
Figure 2.
A region of chromosome 6 from position 161122860–161124861 showing the comparison of the Perlegen whole-genome data set with the SeattleSNPs data set in build 123 of dbSNP containing SNPs rs783145, rs4252128, rs4252129, rs4252130, rs4252131, rs4252132, rs4252133, rs4252134, rs4252135, rs4252136, and rs4252137. The first, second and eleventh SNPs are contained in the Perlegen data and are in bold. The Perlegen haplotypes over these SNPs that occur in the population are ACG, GCG, ACA, and GTG. When SNPs contained in the SeattleSNPs data set are added to the Perlegen SNPs, many more haplotypes emerge. For example, the first Perlegen haplotype gets split into two common haplotypes and three rare haplotypes in the SeattleSNPs data set. “I” and “D” represent insertion and deletion polymorphisms in the SeattleSNPs data set.
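The splitting effect in Figure 2 amounts to projecting dense haplotypes onto the subset of SNPs that a sparse study typed and grouping them by that projection. The sketch below uses hypothetical toy haplotypes and site indices, not the actual Perlegen or SeattleSNPs data; `split_by_sparse` is an illustrative name.

```python
from collections import Counter, defaultdict


def split_by_sparse(dense_haplotypes, sparse_sites):
    """Group dense haplotypes by their alleles at the sparse panel's sites.

    Returns {sparse haplotype: Counter of the dense haplotypes behind it}.
    """
    groups = defaultdict(Counter)
    for h in dense_haplotypes:
        key = "".join(h[i] for i in sparse_sites)  # what the sparse study sees
        groups[key][h] += 1
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample of 12 dense haplotypes over 5 SNPs; the sparse
    # panel is assumed to type only sites 0, 1 and 4.
    sample = (["ACTTG"] * 4 + ["ACTAG"] * 3 + ["ACGAG"]
              + ["GCTAG"] * 2 + ["GTTAA"] * 2)
    for sparse, dense in sorted(split_by_sparse(sample, (0, 1, 4)).items()):
        print(sparse, dict(dense))
```

Here the sparse panel sees three haplotypes, but its most frequent one, ACG, hides three distinct dense haplotypes, mirroring how a single Perlegen haplotype splits into several SeattleSNPs haplotypes.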
Figure 3.
A perfect phylogeny model consists of a tree where each vertex corresponds to a haplotype and each edge corresponds to a mutation at one of the positions of the haplotype. An edge is labeled with the position of the mutation. The tree fits the perfect phylogeny model if there are no recurrent mutations and no obligate recombination events. A set of haplotypes fits the perfect phylogeny model if it satisfies the four gamete test, that is, for every pair of marker positions, at most three of the four possible allele combinations are observed.
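The four gamete test in the Figure 3 caption is a pairwise check over marker positions. The sketch below assumes biallelic sites and equal-length haplotype strings; the function name and examples are hypothetical.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """True if no pair of biallelic sites shows all four allele combinations."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}  # observed allele pairs
        if len(gametes) > 3:  # all four combinations seen at sites i and j
            return False
    return True


if __name__ == "__main__":
    print(passes_four_gamete_test(["000", "011", "100"]))      # True
    print(passes_four_gamete_test(["00", "01", "10", "11"]))  # False
```

The second set fails because positions 0 and 1 show all of 00, 01, 10 and 11, which cannot arise on a single tree without a recurrent mutation or a recombination event.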
