Genet Epidemiol. 2011 Feb;35(2):111-8.
doi: 10.1002/gepi.20556. Epub 2010 Dec 31.

Mining gold dust under the genome wide significance level: a two-stage approach to analysis of GWAS

Gang Shi et al. Genet Epidemiol. 2011 Feb.

Abstract

We propose a two-stage approach to analyzing genome-wide association data in order to identify a set of promising single-nucleotide polymorphisms (SNPs). In stage one, we select a list of top signals from single-SNP analyses by controlling the false discovery rate. In stage two, we use least absolute shrinkage and selection operator (LASSO) regression to reduce false positives. The proposed approach was evaluated using simulated quantitative traits based on genome-wide SNP data on 8,861 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study. Our first stage, targeted at controlling false negatives, yields better power than using a Bonferroni-corrected significance level. The LASSO regression reduces the number of significant SNPs in stage two: it removes false-positive SNPs and, because of linkage disequilibrium, also removes some true-positive SNPs at simulated causal loci. Interestingly, the LASSO regression preserves the power from stage one: the number of causal loci detected in stage two is almost the same as in stage one, while false positives are reduced further. Real data on systolic blood pressure in the ARIC study were analyzed using our two-stage approach, which identified two significant SNPs, one of which was reported to be genome-wide significant in a meta-analysis with a much larger sample size. In contrast, a single-SNP association scan did not yield any significant results.
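The two stages described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: stage one uses the Benjamini-Hochberg step-up rule as the FDR-controlling procedure, single-SNP p-values come from a large-sample normal approximation to the correlation test, stage two fits the LASSO by plain coordinate descent, and a SNP is kept when its coefficient is nonzero. All function names and the parameters `q` and `lam` are hypothetical.

```python
import math

def single_snp_pvalues(X, y):
    """Two-sided p-value per SNP from its correlation with the trait
    (normal approximation to the t statistic; an assumption of this sketch)."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    syy = sum(v * v for v in yc)
    pvals = []
    for j in range(p):
        xbar = sum(row[j] for row in X) / n
        xc = [row[j] - xbar for row in X]
        sxx = sum(v * v for v in xc)
        if sxx == 0.0 or syy == 0.0:
            pvals.append(1.0)  # monomorphic SNP or constant trait: no signal
            continue
        r = sum(a * b for a, b in zip(xc, yc)) / math.sqrt(sxx * syy)
        r = max(-0.999999, min(0.999999, r))  # keep 1 - r^2 away from zero
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
        pvals.append(math.erfc(abs(t) / math.sqrt(2.0)))
    return pvals

def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k])

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes the columns of X and the vector y are already centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def two_stage(X, y, q=0.05, lam=0.1):
    """Stage 1: FDR-controlled single-SNP screen. Stage 2: LASSO on survivors."""
    stage1 = benjamini_hochberg(single_snp_pvalues(X, y), q)
    if not stage1:
        return stage1, []
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in stage1]
    Xs = [[row[j] - mu for j, mu in zip(stage1, means)] for row in X]
    ybar = sum(y) / n
    beta = lasso_cd(Xs, [v - ybar for v in y], lam)
    stage2 = [j for j, b in zip(stage1, beta) if b != 0.0]
    return stage1, stage2
```

Calling `two_stage(X, y)` with `X` as a list of per-individual genotype rows (coded 0, 1, 2) and `y` as the quantitative trait returns the SNP indices surviving each stage; by construction the stage-two list is a subset of the stage-one list. In practice the penalty `lam` would need to be tuned, for example by cross-validation, which this sketch leaves out.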

Figures

Fig. 1. Empirical FDR for controlling type I error rates at 10⁻⁴, 10⁻⁵, 10⁻⁶, and 8.2×10⁻⁸ (Bonferroni corrected).
Fig. 2. Empirical type I error rate in stage 1 and stage 2.
Fig. 3. Number of “causal” loci significant in stage 1 and stage 2.
