PLoS Genet. 2006 Mar;2(3):e27.
doi: 10.1371/journal.pgen.0020027. Epub 2006 Mar 10.

An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population

Alexandre Montpetit et al. PLoS Genet. 2006 Mar.

Abstract

The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90-120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r² of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
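The tagging criterion quoted in the abstract (an r² of at least 0.8 with one of the tSNPs, for SNPs with a minor allele frequency above 5%) can be illustrated with a minimal Python sketch. It assumes phased haplotypes coded as 0/1 alleles, which is a simplification; the study's own pipeline and estimation method are not described here, and the function names `maf`, `r_squared`, and `tagging_coverage` are hypothetical.

```python
from typing import Dict, Sequence


def maf(alleles: Sequence[int]) -> float:
    """Minor allele frequency of one SNP, given 0/1 alleles per chromosome."""
    p = sum(alleles) / len(alleles)
    return min(p, 1.0 - p)


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two SNPs from phased 0/1 haplotypes.

    Assumption: both sequences list the same chromosomes in the same order.
    Returns 0.0 when either SNP is monomorphic, where r^2 is undefined.
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return 0.0
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def tagging_coverage(
    haplotypes: Dict[str, Sequence[int]],
    tag_names: Sequence[str],
    min_maf: float = 0.05,
    r2_threshold: float = 0.8,
) -> float:
    """Fraction of SNPs with MAF above min_maf whose best r^2 with any tag
    reaches r2_threshold. Tags must be keys of `haplotypes`."""
    common = [name for name, h in haplotypes.items() if maf(h) > min_maf]
    if not common:
        return 0.0
    captured = 0
    for name in common:
        best = max(
            (r_squared(haplotypes[name], haplotypes[t]) for t in tag_names),
            default=0.0,
        )
        if best >= r2_threshold:
            captured += 1
    return captured / len(common)
```

In this sketch, tags would be chosen in one sample (for example CEU) and `tagging_coverage` evaluated on haplotypes from another (for example the Estonian sample), mirroring the cross-population comparison the abstract describes. Real genotype data are usually unphased, so r² would need to be estimated rather than counted directly.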


Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Map of Estonia
Figure 2. Distribution of Allelic Frequency of the Selected SNPs
(A) Distribution in the ENCODE 1 (2p16.3) region. (B) Distribution in the ENCODE 2 (2q37.1) region. The -ALL groups refer to the entire set of markers typed in the HapMap project.
Figure 3. LD/Block Structure of the HapMap and Estonia Samples
(A) ENCODE 1 region. (B) ENCODE 2 region.
Figure 4. Four-Way Comparison of Common SNPs
The Venn diagram shows the number of shared common SNPs using (A) 5% or (B) 10% as the MAF threshold. For clarity, extra circles for areas not captured in the main diagram are shown.
Figure 5. Performance of Tags Selected from HapMap Samples
Tags were selected from one or two HapMap samples, and the performance plotted was measured in the indicated population (A) in the ENCODE 1 region and (B) in the ENCODE 2 region. Only polymorphic SNPs with at least the specified MAF were used to select either the tags or to calculate the performance. The number of tags used for each MAF studied is indicated at the bottom of each graph.
Figure 6. Maximum Distribution (r²) of SNPs from Estonia in Relation to CEU tSNPs
Tags were selected from all polymorphic SNPs of the CEU population in (A) the ENCODE 1 region (138 tSNPs) and (B) the ENCODE 2 region (171 tSNPs).
Figure 7. Effect of Sample Size on Tagging Performance
Random sets of 10, 30, 60, 100, 300, and 1,000 EGP samples were used to select tags at different MAF thresholds (shown as different colored lines). Tags were then tested in the CEU population, and the ratio of tagged versus all polymorphic SNPs (using an r² threshold of 0.8) was plotted for (A) the ENCODE 1 region and (B) the ENCODE 2 region. An average of 100 tests is shown.
Figure 8. Effect of SNP Density on Tag Selection
Tags were selected from the CEU samples using random sets of SNPs averaging the specified densities. The ALL set contains all SNPs and corresponds to a density of one SNP every 1.3 kbp. The Phase I pairwise and aggressive sets contain only SNPs with a minimum MAF of 5% in the CEU sample, and tags were selected with the pairwise and aggressive algorithm of Tagger, respectively. The tagging performance was calculated on the EGP cohort by measuring the ratio of tagged SNPs over all polymorphic SNPs with at least the specified MAF for (A) the ENCODE 1 region and (B) the ENCODE 2 region.
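
The pairwise tag selection mentioned in the Figure 8 caption can be sketched as a generic greedy heuristic: repeatedly pick the SNP that captures the most still-untagged SNPs at the r² threshold. This is an illustrative assumption, not Tagger's actual implementation, and it again assumes phased 0/1 haplotypes; the names `r_squared` and `greedy_pairwise_tags` are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise r^2 from phased 0/1 haplotypes; 0.0 if either SNP is monomorphic."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def greedy_pairwise_tags(
    haplotypes: Dict[str, Sequence[int]], r2_threshold: float = 0.8
) -> List[str]:
    """Greedy set-cover style selection: every SNP ends up either chosen as a
    tag or in pairwise LD (r^2 >= r2_threshold) with a chosen tag.

    Assumption: `haplotypes` has already been filtered to the SNPs of
    interest (for example, by a MAF cutoff)."""
    names = list(haplotypes)
    # For each candidate tag, the set of SNPs it captures (always itself).
    captures = {
        s: {
            t
            for t in names
            if s == t or r_squared(haplotypes[s], haplotypes[t]) >= r2_threshold
        }
        for s in names
    }
    untagged = set(names)
    tags: List[str] = []
    while untagged:
        # Ties are broken by input order, since max returns the first maximum.
        best = max(names, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags
```

Running such a selection on a thinned SNP set and then measuring coverage on the full set in another sample is one way to reproduce the kind of density comparison the caption describes.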
