Nat Genet. 2009 Nov;41(11):1253-7.
doi: 10.1038/ng.455. Epub 2009 Oct 4.

A new statistic and its power to infer membership in a genome-wide association study using genotype frequencies

Kevin B Jacobs et al. Nat Genet. 2009 Nov.

Abstract

Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
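The contrast between the two kinds of statistic can be illustrated with a minimal sketch. The abstract does not give the formula for the new statistic, so `t_geno` below is only an assumed log-ratio form (the log of the study-group frequency of the individual's genotype minus the log of the reference-group frequency, summed over SNPs), not the published definition. `t_allele` follows the commonly cited allele-frequency distance form attributed to Homer et al., which is likewise an assumption here. The function names, the 0/1/2 genotype coding, and the `eps` floor are all hypothetical.

```python
import math


def t_geno(genotypes, study_freqs, ref_freqs, eps=1e-6):
    """Assumed log-ratio statistic on genotype frequencies (illustrative only).

    genotypes:   per-SNP genotype codes 0, 1 or 2 for one individual
    study_freqs: per-SNP (f0, f1, f2) genotype frequencies in the study group
    ref_freqs:   per-SNP (f0, f1, f2) genotype frequencies in a reference group
    eps:         floor that avoids log(0) for unobserved genotypes
    """
    total = 0.0
    for g, fs, fr in zip(genotypes, study_freqs, ref_freqs):
        total += math.log(max(fs[g], eps)) - math.log(max(fr[g], eps))
    return total


def t_allele(genotypes, study_af, ref_af):
    """Allele-frequency distance statistic in the style of Homer et al. (assumed form).

    study_af, ref_af: per-SNP frequency of the counted allele in each group
    """
    total = 0.0
    for g, ps, pr in zip(genotypes, study_af, ref_af):
        y = g / 2.0  # the individual's own allele frequency at this SNP
        total += abs(y - pr) - abs(y - ps)
    return total
```

Under both sketches, larger positive values mean the individual's genotypes look more like the study group than the reference group, and the value is zero when the two groups have identical frequencies.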

Figures

Figure 1. Histogram of Tgeno for a GWAS with 1,000 cases and controls
The figure presents data for 1,000 cases (group 1, in red), 1,000 controls (group 2, in blue) and 1,000 subjects not in the study, based on genotypes from the Illumina HumanHap550 assay. The theoretical null density curve is shown in black.
Figure 2. Histograms of calibrated Tgeno and Homer’s Tallele with 1,000 cases and controls and varying numbers of SNPs
The figure presents theoretical null density curves (black) for a GWAS with 1,000 cases (group 1 in red), 1,000 controls (group 2 in blue) and 12,000 subjects not in the study (in gray) using genotypes for (a) 10,000, (b) 100,000, and (c) 550,000 top associated SNPs from the Illumina HumanHap550 assay. Statistics were calibrated so that the null distribution was centered at zero with unit variance.
Figure 3. Sensitivity and specificity of Tgeno applied to GWAS data
Log-scale receiver operating characteristic (ROC) curves of Tgeno with Illumina HumanHap550 data from GWAS scenarios with 1,000/1,000 and 5,000/5,000 cases and controls of European descent.
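
The calibration mentioned for Figure 2 and the sensitivity/specificity trade-off in Figure 3 can be sketched as follows. This is a minimal illustration, assuming the null distribution is estimated empirically from statistic values of subjects known not to be in the study. The captions refer to theoretical null densities, so the paper's own calibration may differ. The function names and arguments are hypothetical.

```python
import statistics


def calibrate(values, null_values):
    """Shift and scale statistics so the null sample has mean 0 and variance 1."""
    mu = statistics.fmean(null_values)
    sd = statistics.pstdev(null_values)
    return [(v - mu) / sd for v in values]


def roc_point(member_scores, null_scores, threshold):
    """Return (sensitivity, false-positive rate) for calling 'participant' above threshold."""
    sensitivity = sum(s > threshold for s in member_scores) / len(member_scores)
    false_positive_rate = sum(s > threshold for s in null_scores) / len(null_scores)
    return sensitivity, false_positive_rate
```

Sweeping `threshold` over the calibrated values and collecting the resulting pairs traces out an empirical ROC curve. Specificity is one minus the false-positive rate.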

Comment in

  • Not so lost in the genetic crowd.
    Schork NJ, Bansal V. Nat Genet. 2009 Nov;41(11):1163-4. doi: 10.1038/ng1109-1163. PMID: 19862007. No abstract available.
