Am J Hum Genet. 2008 Jul;83(1):112-9.
doi: 10.1016/j.ajhg.2008.06.008. Epub 2008 Jun 26.

Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms

Carl A Anderson et al. Am J Hum Genet. 2008 Jul.

Abstract

Genotype imputation is potentially a zero-cost method for bridging gaps in coverage and power between genotyping platforms. Here, we quantify these gains in power and coverage by using 1,376 population controls that are from the 1958 British Birth Cohort and were genotyped by the Wellcome Trust Case-Control Consortium with the Illumina HumanHap 550 and Affymetrix SNP Array 5.0 platforms. Approximately 50% of genotypes at single-nucleotide polymorphisms (SNPs) exclusively on the HumanHap 550 can be accurately imputed from direct genotypes on the SNP Array 5.0 or Illumina HumanHap 300. This roughly halves differences in coverage and power between the platforms. When the relative cost of currently available genome-wide SNP platforms is accounted for, and finances are limited but sample size is not, the highest-powered strategy in European populations is to genotype a larger number of individuals with the HumanHap 300 platform and carry out imputation. Platforms consisting of around 1 million SNPs offer poor cost efficiency for SNP association in European populations.

Figures

Figure 1. Minor-Allele Frequency of SNPs Directly Genotyped in 1,376 Samples from the 58C
(A) Minor-allele frequency for the 450,769 SNPs that are featured on the HumanHap 550 but not the Affymetrix SNP Array 5.0 and are also polymorphic in the 58C.
(B) Minor-allele frequency for the subset of 427,839 SNPs from (A) that are also polymorphic in the CEU HapMap data.
(C) Minor-allele frequency for the 215,998 SNPs that are featured on the HumanHap 550 but not the Illumina HumanHap 300 and are also polymorphic in the 58C.
(D) Minor-allele frequency for the subset of 203,860 SNPs from (C) that are also polymorphic in the CEU HapMap data.
Basing imputations on haplotype data from the HapMap causes variation at rare SNPs (MAF ≤ 0.02) to be lost.
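The minor-allele frequency behind these histograms can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 0/1/2 allele-count coding, and the use of `None` for missing calls are assumptions made for the example. Only the rare-SNP cutoff (MAF ≤ 0.02) comes from the caption.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP.

    `genotypes` holds one entry per sample: 0, 1, or 2 copies of an
    arbitrarily chosen allele, or None for a missing call
    (hypothetical coding chosen for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    # Each sample contributes two alleles.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two is less frequent.
    return min(freq, 1 - freq)


def is_rare(maf, threshold=0.02):
    """True for polymorphic SNPs at or below the caption's MAF cutoff."""
    return 0 < maf <= threshold
```

A SNP with `minor_allele_frequency(...) == 0` is monomorphic in the sample, which is why the panels are restricted to SNPs that are polymorphic in the relevant data set.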
Figure 2. Assessment of Imputed-Genotype Filtering Criteria
Assessment of filtering criteria for the Illumina HumanHap 550 genotypes based on Affymetrix SNP Array 5.0 (A–C) and Illumina HumanHap 300 (D–F) genotype data.
(A and D) The number of SNPs passing filter thresholds based on per-SNP measures of mean maximum posterior probability (blue) or genotype call rate (red). The number of these SNPs with an r² ≥ 0.8 between direct and imputed genotype calls is shown after the removal of SNPs not passing filtering thresholds based on per-SNP measures of mean maximum posterior probability (dark gray) and genotype call rate (light gray).
(B and E) The PLS-DA Q² value after the removal of SNPs not passing filtering thresholds based on per-SNP measures of mean maximum posterior probability (blue) and genotype call rate (red). A Q² value of 1 indicates that the current PLS model can perfectly predict whether a given genotype vector is of direct or imputed origin. A Q² of 0 indicates that the model has no power to predict the genotype's origin.
(C and F) Mean r² between direct and imputed genotypes after the removal of SNPs not passing filtering thresholds based on per-SNP measures of mean maximum posterior probability (blue) and genotype call rate (red).
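Two of the per-SNP quantities named in this caption can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it takes r² to be the squared Pearson correlation between direct and imputed genotype calls coded 0/1/2, and takes the mean maximum posterior probability to be the per-sample maximum of the three genotype posteriors averaged over samples. All function names are hypothetical.

```python
def squared_correlation(direct, imputed):
    """Squared Pearson correlation between two genotype vectors.

    Returns None when either vector has zero variance, in which case
    the correlation is undefined.
    """
    n = len(direct)
    if n == 0 or n != len(imputed):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_d = sum(direct) / n
    mean_i = sum(imputed) / n
    sxy = sum((d - mean_d) * (i - mean_i) for d, i in zip(direct, imputed))
    sxx = sum((d - mean_d) ** 2 for d in direct)
    syy = sum((i - mean_i) ** 2 for i in imputed)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def mean_max_posterior(posteriors):
    """Average, over samples, of the largest genotype posterior.

    `posteriors` is a list of (P(AA), P(AB), P(BB)) triples, one per
    sample (assumed layout for this sketch).
    """
    if not posteriors:
        raise ValueError("no posterior probabilities supplied")
    return sum(max(p) for p in posteriors) / len(posteriors)


def is_accurately_imputed(direct, imputed, threshold=0.8):
    """Apply the caption's r² >= 0.8 criterion to one SNP."""
    r2 = squared_correlation(direct, imputed)
    return r2 is not None and r2 >= threshold
```

In this reading, filtering keeps a SNP only if its `mean_max_posterior` (or call rate) clears a chosen threshold, and accuracy is then judged on the surviving SNPs with `squared_correlation`.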
Figure 3. Mean Power to Detect Association to a Disease with a Fixed Baseline Sample Size
Mean power to detect association (α = 10⁻⁵) to a disease with a population prevalence of 0.0001 and a fixed baseline sample size across different genome-wide platforms (simulated under varying risk allele frequency [RAF] and sample size).
RAF ranges are as follows: (A–C) 0.05 ≤ RAF < 0.10; (D–F) 0.10 ≤ RAF < 0.20; (G–I) 0.20 ≤ RAF ≤ 0.50.
Cases and controls are as follows: (A, D, and G) 1,000 cases, 1,000 controls; (B, E, and H) 2,000 cases, 2,000 controls; (C, F, and I) 5,000 cases, 5,000 controls.
Mean power was calculated after 10,000 simulations where sample size per simulation for each SNP set was weighted by the maximum r² between a randomly selected HapMap SNP (satisfying RAF constraints) and the SNPs on the given genotyping platform (with HapMap release 21 CEU data).
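The idea of weighting sample size by r² can be illustrated with a simple analytic approximation. The sketch below is not the simulation procedure described in the caption: it swaps in a two-sided normal-approximation test comparing allele frequencies in cases and controls, and scales both sample sizes by r² before computing power. The function name, the choice of test, and the example allele frequencies are all assumptions; only α = 10⁻⁵ and the r² weighting come from the caption.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_cases, n_controls, p_case, p_control, r2, alpha=1e-5):
    """Approximate power of a two-sided allelic z-test.

    Sample sizes are multiplied by r2 (the LD between the causal SNP
    and its best tag on the platform) to give effective sample sizes.
    Allele frequencies must lie strictly between 0 and 1.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    eff_cases = n_cases * r2
    eff_controls = n_controls * r2
    # Each individual contributes two alleles.
    se = sqrt(
        p_case * (1 - p_case) / (2 * eff_cases)
        + p_control * (1 - p_control) / (2 * eff_controls)
    )
    shift = abs(p_case - p_control) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical allele frequencies, chosen only to show the effect of r2.
well_tagged = approx_power(2000, 2000, 0.18, 0.15, r2=1.0)
poorly_tagged = approx_power(2000, 2000, 0.18, 0.15, r2=0.5)
```

Because the standardized effect grows with the square root of the effective sample size, a causal SNP tagged at r² = 0.5 behaves, under this approximation, like a directly typed SNP in a study half the size.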
Figure 4. Mean Power to Detect Association to a Disease Where Baseline Sample Size Has Been Varied across Genome-wide SNP Platforms to Reflect Relative Cost
Mean power to detect association (α = 10⁻⁵) to a disease with a population prevalence of 0.0001 where baseline sample size has been varied across genome-wide SNP platforms to reflect the genotyping cost per sample (sample-size ratios: SNP Array 5.0 = 1.22; HumanHap 300 = 1.32; HumanHap 550 = 1; SNP Array 6.0 = 0.99; HumanHap 1M = 0.57).
RAF ranges are as follows: (A–C) 0.05 ≤ RAF < 0.10; (D–F) 0.10 ≤ RAF < 0.20; (G–I) 0.20 ≤ RAF ≤ 0.50.
Cases and controls are as follows: (A, D, and G) 1,000 cases, 1,000 controls; (B, E, and H) 2,000 cases, 2,000 controls; (C, F, and I) 5,000 cases, 5,000 controls.
Mean power was calculated after 10,000 simulations where sample size per simulation for each SNP set was weighted by the maximum r² between a randomly selected HapMap SNP (satisfying RAF constraints) and the SNPs on the given genotyping platform (with HapMap release 21 CEU data).
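The cost adjustment in this figure amounts to scaling the baseline sample size by each platform's sample-size ratio. The sketch below uses the ratios quoted in the caption; the function and variable names are hypothetical, and rounding to whole individuals is an assumption made for the example.

```python
# Sample-size ratios quoted in the Figure 4 caption, relative to the
# HumanHap 550 at a fixed genotyping budget.
SAMPLE_SIZE_RATIOS = {
    "SNP Array 5.0": 1.22,
    "HumanHap 300": 1.32,
    "HumanHap 550": 1.00,
    "SNP Array 6.0": 0.99,
    "HumanHap 1M": 0.57,
}


def cost_adjusted_sample_sizes(baseline_n):
    """Number of individuals affordable per platform for the budget
    that buys `baseline_n` individuals on the HumanHap 550."""
    return {
        platform: round(baseline_n * ratio)
        for platform, ratio in SAMPLE_SIZE_RATIOS.items()
    }


sizes = cost_adjusted_sample_sizes(1000)
```

With a baseline of 1,000 cases, the same budget covers 1,320 cases on the HumanHap 300 but only 570 on the HumanHap 1M, which is the trade-off the abstract weighs against each platform's coverage.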
