Assessing and managing risk when sharing aggregate genetic variant data

David W Craig¹, Robert M Goor, Zhenyuan Wang, Justin Paschall, Jim Ostell, Michael Feolo, Stephen T Sherry, Teri A Manolio

PMID: 21921928
PMCID: PMC3349221
DOI: 10.1038/nrg3067

Review

David W Craig et al. Nat Rev Genet. 2011 Sep 16;12(10):730-6.

Affiliation

¹ Translational Genomics Research Institute (TGen), Phoenix, Arizona 85004, USA. dcraig@tgen.org

Erratum in

Nat Rev Genet. 2011 Nov;12(11):801

Abstract

Access to genetic data across studies is an important aspect of identifying new genetic associations through genome-wide association studies (GWASs). Meta-analysis across multiple GWASs with combined cohort sizes of tens of thousands of individuals often uncovers many more genome-wide associated loci than the original individual studies; this emphasizes the importance of tools and mechanisms for data sharing. However, even sharing summary-level data, such as allele frequencies, inherently carries some degree of privacy risk to study participants. Here we discuss mechanisms and resources for sharing data from GWASs, particularly focusing on approaches for assessing and quantifying the privacy risks to participants that result from the sharing of summary-level data.
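To make "summary-level data" concrete, here is a minimal sketch of how a cohort's genotype matrix collapses to the per-SNP allele frequencies that might be shared, and of the small trace each participant leaves in them. It assumes genotypes are coded as 0, 1 or 2 copies of an alternate allele, which is our convention rather than the article's, and the helpers `allele_frequencies` and `leave_one_out_shift` are hypothetical names used only for illustration. For a cohort of n people, including participant i shifts each frequency by (g_i/2 - frequency without i)/n, so never by more than 1/n. That per-SNP trace, accumulated over thousands of SNPs, is what membership tests exploit.

```python
def allele_frequencies(genotypes):
    """Per-SNP alternate-allele frequency: total allele count / (2 * number of people)."""
    n = len(genotypes)
    return [sum(col) / (2 * n) for col in zip(*genotypes)]


def leave_one_out_shift(genotypes, i):
    """Change in each shared frequency caused by including participant i."""
    with_i = allele_frequencies(genotypes)
    without_i = allele_frequencies(genotypes[:i] + genotypes[i + 1:])
    return [a - b for a, b in zip(with_i, without_i)]


# Hypothetical toy cohort: 4 participants (rows) x 3 SNPs (columns).
cohort = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]]
print(allele_frequencies(cohort))  # [0.375, 0.5, 0.5]
shift = leave_one_out_shift(cohort, 0)
# Each entry equals (g_i / 2 - frequency_without_i) / n, so |shift| <= 1 / n.
print(shift)
```

In this toy cohort, participant 0 carries two alternate alleles at the third SNP, and their presence pulls that frequency up by 1/6.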

Figures

**Figure 1. Sharing 5,000 SNPs at different prevalence or prior probabilities**
In these plots, we use simulations to show how the prior probability of being in a dataset affects the ability to determine whether a given person is in that dataset, using summary-level allele frequencies from 5,000 SNPs in datasets of 500 individuals. In (a) we show a histogram of test statistics, computed with the approach of Jacobs *et al*, for resolving membership in 100,000 simulations in which the person tested is actually in the dataset (red) and 100,000 simulations in which the person tested is not (blue). Because the numbers of simulations with and without membership are equal, the prevalence, or prior probability of being in the dataset, is 0.5. In (b) we show 100,000 simulations in which the person is not in the dataset (blue) and 100 simulations in which they are, equivalent to a prevalence or prior probability of being in the dataset of 0.001. The figure is zoomed to the right, showing how a large number of tests of individuals not in the dataset can obscure the ability to distinguish true positives from false positives. Describing risk as positive predictive value (PPV) allows one to treat the prevalence of being in a dataset as a prior, giving a more accurate assessment of the risk that a person within a dataset is correctly identified.
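The caption's argument can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the authors' simulation code. As a stand-in for the Jacobs *et al* statistic (which may differ in detail), it uses a per-SNP log-likelihood ratio under Hardy-Weinberg equilibrium, comparing the pool's allele frequencies with reference frequencies. It also assumes that the reference is the true population frequencies, that those frequencies are drawn uniformly from 0.05 to 0.95, and that a person is called a member when the statistic exceeds a hypothetical threshold of 0. It runs far fewer trials than the figure's 100,000 so that it finishes quickly. PPV is computed as TPR × prior / (TPR × prior + FPR × (1 - prior)). All function names are hypothetical.

```python
import math
import random


def allele_frequencies(genotypes):
    """Alternate-allele frequency per SNP from rows of 0/1/2 genotype counts."""
    n = len(genotypes)
    return [sum(col) / (2 * n) for col in zip(*genotypes)]


def simulate_person(pop_freqs, rng):
    """One genotype per SNP as Binomial(2, p), i.e. assuming Hardy-Weinberg equilibrium."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in pop_freqs]


def llr_statistic(person, pool_freqs, ref_freqs, eps=1e-6):
    """Log-likelihood ratio of the genotypes under pool vs reference frequencies.

    Positive values favour "this person contributed to the pool". The binomial
    coefficient of the HWE genotype probability cancels in the ratio.
    """
    total = 0.0
    for g, p, q in zip(person, pool_freqs, ref_freqs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0) at monomorphic SNPs
        q = min(max(q, eps), 1 - eps)
        total += g * math.log(p / q) + (2 - g) * math.log((1 - p) / (1 - q))
    return total


def ppv(tpr, fpr, prior):
    """P(member | flagged) = TPR*prior / (TPR*prior + FPR*(1 - prior))."""
    tp = tpr * prior
    fp = fpr * (1 - prior)
    return tp / (tp + fp) if tp + fp > 0 else float("nan")


def run(n_snps=5000, n_pool=500, n_trials=200, threshold=0.0, seed=1):
    """Score members and non-members of one simulated pool; report TPR, FPR and PPV."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # assumed population frequencies
    pool = [simulate_person(pop, rng) for _ in range(n_pool)]
    pool_freqs = allele_frequencies(pool)  # the summary data that would be shared
    # Members come from the pool itself; non-members are fresh draws from the same population.
    members = [llr_statistic(pool[i], pool_freqs, pop) for i in range(min(n_trials, n_pool))]
    outsiders = [llr_statistic(simulate_person(pop, rng), pool_freqs, pop) for _ in range(n_trials)]
    tpr = sum(s > threshold for s in members) / len(members)
    fpr = sum(s > threshold for s in outsiders) / len(outsiders)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "ppv_prior_0.5": ppv(tpr, fpr, 0.5),
        "ppv_prior_0.001": ppv(tpr, fpr, 0.001),
    }


if __name__ == "__main__":
    print(run())
```

Under these assumptions, the member and non-member statistics should separate reasonably well, so PPV is high at a prior of 0.5. The same TPR and FPR, however, give a much lower PPV at a prior of 0.001, which is the point panel (b) makes: when almost everyone tested is a non-member, even a small false-positive rate dominates the positive calls.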
