Nature. 2005 Oct 27;437(7063):1299-320.

doi: 10.1038/nature04226.

A haplotype map of the human genome

International HapMap Consortium

PMID: 16255080
PMCID: PMC1880871
DOI: 10.1038/nature04226

International HapMap Consortium. Nature. 2005.

Abstract

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

Figures

**Figure 1. Number of SNPs in dbSNP over time**
The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).

**Figure 2. Distribution of inter-SNP distances**
The distributions are shown for each analysis panel for the HapMappable genome (defined in the Methods), for all common SNPs (with MAF ≥ 0.05).

**Figure 3. Allele frequency and completeness of dbSNP for the ENCODE regions**
**a–c**, The fraction of SNPs in dbSNP, or with a proxy in dbSNP, is shown as a function of minor allele frequency for each analysis panel (a, YRI; b, CEU; c, CHB+JPT). Singletons refer to heterozygotes observed in a single individual, and are broken out from other SNPs with MAF < 0.05. Because all ENCODE SNPs have been deposited in dbSNP, for this figure we define a SNP as ‘in dbSNP’ if it would be in dbSNP build 125 independent of the HapMap ENCODE resequencing project. All remaining SNPs (not in dbSNP) were discovered only by ENCODE resequencing; they are categorized by their correlation (r²) to those in dbSNP. Note that the number of SNPs in each frequency bin differs among analysis panels, because not all SNPs are polymorphic in all analysis panels.

**Figure 4. Minor allele frequency distribution of SNPs in the ENCODE data, and their contribution to heterozygosity**
This figure shows the polymorphic SNPs from the HapMap ENCODE regions according to minor allele frequency (blue), with the lowest minor allele frequency bin (<0.05) separated into singletons (SNPs heterozygous in one individual only, shown in grey) and SNPs with more than one heterozygous individual. For this analysis, MAF is averaged across the analysis panels. The sum of the contribution of each MAF bin to the overall heterozygosity of the ENCODE regions is also shown (orange).
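
As a rough illustration of how the two quantities in Figure 4 relate, the sketch below bins SNPs by MAF and sums the expected heterozygosity 2p(1 − p) contributed by each bin. The function name, the 0.05 bin width and the input frequencies are assumptions made for this example; the averaging of MAF across analysis panels and the separate singleton class are not reproduced.

```python
from collections import defaultdict

def heterozygosity_by_maf_bin(mafs, bin_width=0.05):
    """Return {bin_index: (snp_count, share_of_total_heterozygosity)}.

    Bin 0 holds MAF < 0.05, bin 1 holds 0.05 <= MAF < 0.10, and so on.
    Heterozygosity per SNP is the Hardy-Weinberg expectation 2p(1 - p).
    """
    counts = defaultdict(int)
    het = defaultdict(float)
    top_bin = int(round(0.5 / bin_width)) - 1
    for p in mafs:
        if not 0.0 < p <= 0.5:
            raise ValueError("MAF must lie in (0, 0.5]")
        # Small epsilon guards against floating-point error at bin edges.
        b = min(int(p / bin_width + 1e-9), top_bin)
        counts[b] += 1
        het[b] += 2.0 * p * (1.0 - p)
    total = sum(het.values())
    return {b: (counts[b], het[b] / total) for b in sorted(counts)}

# Hypothetical frequencies: two rare SNPs and two common ones.
result = heterozygosity_by_maf_bin([0.01, 0.02, 0.30, 0.45])
for b, (n, share) in result.items():
    print(f"bin {b}: {n} SNPs, {share:.3f} of heterozygosity")
```

With these made-up inputs the rare bin holds half of the SNPs but only a small share of the heterozygosity, which is the kind of contrast the blue and orange series in the figure are meant to show.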

**Figure 5. Allele frequency distributions for autosomal SNPs**
For each analysis panel we plotted (bars) the MAF distribution of all the Phase I SNPs with a frequency greater than zero. The solid line shows the MAF distribution for the ENCODE SNPs, and the dashed line shows the MAF distribution expected for the standard neutral population model with constant population size and random mating without ascertainment bias.
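
For orientation, the shape of the dashed line can be approximated from a textbook result: under the standard neutral model, the expected number of sites at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i. Folding this over the minor allele gives the sketch below. The function name is an assumption, and the output is per minor allele count rather than per MAF bin, so it is not a reproduction of the figure.

```python
def neutral_folded_sfs(n):
    """Expected proportion of segregating sites with minor allele count
    j = 1 .. n // 2 in a sample of n chromosomes, assuming constant
    population size, random mating and no ascertainment bias."""
    weights = []
    for j in range(1, n // 2 + 1):
        w = 1.0 / j
        if j != n - j:
            # The derived allele may instead be the major allele (count n - j).
            w += 1.0 / (n - j)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

# Example with a small sample of four chromosomes.
print(neutral_folded_sfs(4))
```

The proportions fall steadily with minor allele count, which is why an unascertained sample is expected to contain many more rare than common SNPs.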

**Figure 6. Comparison of allele frequencies in the ENCODE data for all pairs of analysis panels and between the CHB and JPT sample sets**
For each polymorphic SNP we identified the minor allele across all panels (a–d) and then calculated the frequency of this allele in each analysis panel/sample set. The colour in each bin represents the number of SNPs that display each given set of allele frequencies. The purple regions show that very few SNPs are common in one panel but rare in another. The red regions show that there are many SNPs that have similar low frequencies in each pair of analysis panels/sample sets.

**Figure 7. Genealogical relationships among haplotypes and r² values in a region without obligate recombination events**
The region of chromosome 2 (234,876,004–234,884,481 bp; NCBI build 34) within ENr131.2q37 contains 36 SNPs, with zero obligate recombination events in the CEU samples. The left part of the plot shows the seven different haplotypes observed over this region (alleles are indicated only at SNPs), with their respective counts in the data. Underneath each of these haplotypes is a binary representation of the same data, with coloured circles at SNP positions where a haplotype has the less common allele at that site. Groups of SNPs all captured by a single tag SNP (with r² ≥ 0.8) using a pairwise tagging algorithm, have the same colour. Seven tag SNPs corresponding to the seven different colours capture all the SNPs in this region. On the right these SNPs are mapped to the genealogical tree relating the seven haplotypes for the data in this region.
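
The grouping of SNPs under a tag SNP with r² ≥ 0.8 can be sketched with a simple greedy procedure. This is a common heuristic for pairwise tagging and is offered only as an illustration; it is not claimed to be the algorithm used to produce the figure. The SNP names and haplotypes are hypothetical, and alleles are coded 0/1 on phased haplotypes.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs coded 0/1 on the same phased haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pab - pa * pb
    return d * d / denom

def greedy_pairwise_tags(snps, threshold=0.8):
    """snps: dict of name -> list of 0/1 alleles, one entry per haplotype.
    Returns {tag_snp: sorted list of SNPs it captures, itself included}."""
    names = sorted(snps)
    captures = {
        s: {t for t in names
            if t == s or r_squared(snps[s], snps[t]) >= threshold}
        for s in names
    }
    remaining = set(names)
    tags = {}
    while remaining:
        # Choose the SNP that captures the most still-uncaptured SNPs;
        # ties go to the alphabetically first name.
        best = max(names, key=lambda s: len(captures[s] & remaining))
        tags[best] = sorted(captures[best] & remaining)
        remaining -= captures[best]
    return tags

# Hypothetical data: eight haplotypes, four SNPs forming two correlated pairs.
example = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],
    "snp4": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(greedy_pairwise_tags(example))
```

Each key in the result plays the role of one colour in the figure: a tag SNP together with the SNPs it captures.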

**Figure 8. Comparison of linkage disequilibrium and recombination for two ENCODE regions**
For each region (ENr131.2q37.1 and ENm014.7q31.33), D′ plots for the YRI, CEU and CHB+JPT analysis panels are shown: white, D′ < 1 and LOD < 2; blue, D′ = 1 and LOD < 2; pink, D′ < 1 and LOD ≥ 2; red, D′ = 1 and LOD ≥ 2. Below each of these plots is shown the intervals where distinct obligate recombination events must have occurred (blue and green indicate adjacent intervals). Stacked intervals represent regions where there are multiple recombination events in the sample history. The bottom plot shows estimated recombination rates, with hotspots shown as red triangles.
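
The colour key above combines two numbers per SNP pair. A minimal sketch of the first, D′, and of the colour lookup follows. D′ is computed with the usual normalisation of D by its maximum attainable magnitude given the allele frequencies; the LOD score is taken as an input because its calculation is not described in this record. Function names, the tolerance and the example haplotypes are assumptions.

```python
def d_prime(a, b):
    """|D'| between two biallelic SNPs coded 0/1 on the same phased haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    if d_max == 0:
        return 0.0  # D' is undefined when either SNP is monomorphic
    return abs(d) / d_max

def ld_colour(dp, lod, tol=1e-9):
    """Map a (D', LOD) pair to the colour scheme in the Figure 8 caption."""
    complete = dp >= 1 - tol   # treated as D' = 1
    strong = lod >= 2
    if complete:
        return "red" if strong else "blue"
    return "pink" if strong else "white"

# Hypothetical pair: only three of the four possible haplotypes occur, so D' = 1.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 1, 1, 1, 1]
print(d_prime(a, b), ld_colour(d_prime(a, b), lod=3.0))
```

In this example D′ equals 1 even though the two SNPs are not perfectly correlated, which is why D′ plots and r²-based tagging can give different impressions of the same region.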

**Figure 9. The distribution of recombination events over the ENCODE regions**
Proportion of sequence containing a given fraction of all recombination for the ten ENCODE regions (coloured lines) and combined (black line). For each line, SNP intervals are placed in decreasing order of estimated recombination rate, combined across analysis panels, and the cumulative recombination fraction is plotted against the cumulative proportion of sequence. If recombination rates were constant, each line would lie exactly along the diagonal, and so lines further to the right reveal the fraction of regions where recombination is more strongly locally concentrated.
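
The construction of each line in Figure 9 can be sketched as follows: sort the SNP intervals by decreasing recombination rate, then accumulate sequence length and recombination. The input format, the function name and the example values are assumptions, and the combination of estimates across analysis panels is not shown.

```python
def recombination_concentration_curve(intervals):
    """intervals: list of (length_bp, rate_cM_per_Mb) for consecutive SNP intervals.
    Returns a list of (cumulative proportion of sequence,
    cumulative fraction of recombination) points, starting at (0, 0)."""
    ordered = sorted(intervals, key=lambda iv: iv[1], reverse=True)
    total_len = sum(length for length, _ in ordered)
    # Recombination contributed by an interval = rate x physical length.
    total_rec = sum(length * rate for length, rate in ordered)
    curve = [(0.0, 0.0)]
    cum_len = 0.0
    cum_rec = 0.0
    for length, rate in ordered:
        cum_len += length
        cum_rec += length * rate
        curve.append((cum_len / total_len, cum_rec / total_rec))
    return curve

# Hypothetical region: a 1 kb hotspot inside 9 kb of low-rate sequence.
for seq_prop, rec_frac in recombination_concentration_curve([(9000, 0.1), (1000, 9.1)]):
    print(f"{seq_prop:.2f} of sequence -> {rec_frac:.2f} of recombination")
```

With a constant rate every point falls on the diagonal; in the hotspot example, one tenth of the sequence accounts for roughly nine tenths of the recombination.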

**Figure 10. The relationship among recombination rates, haplotype lengths and gene locations**
Recombination rates in cM Mb⁻¹ (blue). Non-redundant haplotypes with frequency of at least 5% in the combined sample (bars) and genes (black segments) are shown in an example gene-dense region of chromosome 19 (19q13). Haplotypes are coloured by the number of detectable recombination events they span, with red indicating many events and blue few.

**Figure 11**
The number of proxy SNPs (r² ≥ 0.8) as a function of MAF in the ENCODE data.

**Figure 12**
The number of proxies per SNP in the ENCODE data as a function of the threshold for correlation (r²).

**Figure 13**
Relationship in the Phase I HapMap between the threshold for declaring correlation between proxies and the proportion of all SNPs captured.

**Figure 14. Tag SNP information capture**
The proportion of common SNPs captured with r² ≥ 0.8 as a function of the average tag SNP spacing is shown for the phased ENCODE data, plotted (left to right) for tag SNPs prioritized by Tagger (multimarker and pairwise) and for tag SNPs picked at random. Results were averaged over all the ENCODE regions.

**Figure 15. Length of LD spans**
We fitted a simple model for the decay of linkage disequilibrium to windows of 1 million bases distributed throughout the genome. The results of model fitting are summarized for the CHB+JPT analysis panel, by plotting the fitted r² value for SNPs separated by 30 kb. The overall pattern of variation was very similar in the other analysis panels (see Supplementary Information).

**Figure 16. The distribution of the long range haplotype (LRH92) test statistic for natural selection**
In the YRI analysis panel, diversity around the *HBB* gene is highlighted by the red point. In the CEU analysis panel, diversity within the *LCT* gene region is similarly highlighted.

Comment in

Genomics: understanding human diversity.
Goldstein DB, Cavalleri GL. Nature. 2005 Oct 27;437(7063):1241-2. doi: 10.1038/4371241a. PMID: 16251937.
