Comparative Study
PLoS Genet. 2008 Jun 27;4(6):e1000109.
doi: 10.1371/journal.pgen.1000109.

Calibrating the performance of SNP arrays for whole-genome association studies

Ke Hao et al. PLoS Genet. 2008.

Abstract

To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both independently sampled from the original SNPs and individuals used in developing the arrays. Without utilization of an independent test set, previous estimates of genetic coverage and statistical power may be subject to an overfitting bias. Additionally, the SNP arrays' statistical power in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured from a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we were able to obtain estimates of genetic coverage, which are robust to overfitting, by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N = 359). 
In testing each of the more than 40,000 gene expression traits for association with each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level.
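
As a rough illustration of what a genetic coverage estimate measures, the sketch below counts the fraction of target SNPs that are well correlated with at least one SNP on an array. The abstract does not state the coverage criterion, so the r² >= 0.8 threshold, the use of squared Pearson correlation on genotype counts (0, 1, 2) as the linkage-disequilibrium measure, and the function names `pearson_r2` and `genetic_coverage` are all assumptions made for this example, not the authors' method.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype-count vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no correlation information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def genetic_coverage(target_snps, array_snps, r2_threshold=0.8):
    """Fraction of target SNPs tagged by at least one array SNP.

    The 0.8 default is an assumed convention, not a value from the abstract.
    """
    if not target_snps:
        raise ValueError("need at least one target SNP")
    covered = sum(
        1
        for target in target_snps
        if any(pearson_r2(target, tag) >= r2_threshold for tag in array_snps)
    )
    return covered / len(target_snps)


# Hypothetical toy genotypes for six individuals (illustration only).
array_snps = [[0, 1, 2, 1, 0, 2]]
target_snps = [
    [0, 1, 2, 1, 0, 2],  # identical to the array SNP
    [2, 1, 0, 1, 2, 0],  # same SNP with alleles flipped
    [0, 0, 1, 2, 2, 1],  # uncorrelated with the array SNP
]
coverage = genetic_coverage(target_snps, array_snps)
```

Evaluating coverage on target SNPs and individuals that played no part in choosing the array SNPs is what makes such an estimate robust to the overfitting the abstract describes.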

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Genetic coverage of Affx HapMap SNPs and Affx NonHapMap SNPs, calculated among HapMap CEU subjects and liver study subjects, respectively.
The effects of SNP overfitting and sample overfitting can be seen.
Figure 2. On the simulated trait values, the statistical power and NTD (number of true discoveries) were estimated for the Affymetrix 500K and Illumina tag SNP arrays.
Figure 3. Weighted estimates of statistical power, obtained as the weighted average of the power on HapMap causal SNPs (weight = 2.2/7.5) and the power on NonHapMap causal SNPs (weight = 5.3/7.5).
(A) Kruskal-Wallis tests. (B) Spearman rank correlation tests. Further, we quantified the power of “direct genotyping,” where association tests were conducted on causal SNPs. This represents an upper bound on statistical power in WGAS.
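
The weighting in the Figure 3 caption can be written out as a short calculation. This is a minimal sketch: the two weights come from the caption, while the function name `weighted_power` and the input values are hypothetical and are not results from the study.

```python
# Weights as stated in the Figure 3 caption.
W_HAPMAP = 2.2 / 7.5
W_NONHAPMAP = 5.3 / 7.5


def weighted_power(power_hapmap, power_nonhapmap):
    """Weighted average of power on HapMap and NonHapMap causal SNPs."""
    for p in (power_hapmap, power_nonhapmap):
        if not 0.0 <= p <= 1.0:
            raise ValueError("power must lie in [0, 1]")
    return W_HAPMAP * power_hapmap + W_NONHAPMAP * power_nonhapmap


# Hypothetical power values for illustration only.
example = weighted_power(0.60, 0.40)
```

Because the NonHapMap weight is the larger of the two, the combined estimate sits closer to the power observed on NonHapMap causal SNPs.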
Figure 4. Tests of association on liver gene expression traits.
(A) Number of gene expression traits that were associated with SNPs on Affymetrix and Illumina microarrays at fixed FDR levels. (B) We restricted the association tests to SNPs within 1 Mb range of the gene and present the number of cis-associating gene expression traits at a given FDR level.
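
Counting discoveries at a fixed FDR level can be sketched as follows. The record does not say which FDR procedure the authors used, so the Benjamini-Hochberg step-up procedure here is an assumption chosen for illustration, and the function name `count_discoveries_bh` and the p-values are hypothetical.

```python
def count_discoveries_bh(p_values, fdr_level):
    """Number of rejections under the Benjamini-Hochberg step-up procedure.

    Assumed procedure for illustration; not confirmed as the study's method.
    """
    m = len(p_values)
    n_reject = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if p <= fdr_level * rank / m:
            n_reject = rank
    return n_reject


# Hypothetical p-values for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_discoveries = count_discoveries_bh(p_values, fdr_level=0.05)
```

Running the same count on the p-values from two platforms at the same FDR level gives the kind of side-by-side comparison that panels (A) and (B) describe.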
Figure 5. The number of significant cis-associations among the liver gene expression traits at FDR = % at varying sample sizes.
The relative number of significant associations is an empirical estimate of the relative levels of power of the different platforms at different sample sizes.

