Review
Nat Genet. 2020 Jul;52(7):646-654.
doi: 10.1038/s41588-020-0651-0. Epub 2020 Jun 29.

Privacy challenges and research opportunities for genomic data sharing

Luca Bonomi et al. Nat Genet. 2020 Jul.

Abstract

The sharing of genomic data holds great promise for advancing precision medicine and for providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns: data misuse could lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, it is imperative to understand the current genome privacy landscape and to identify the challenges in developing effective privacy-protecting solutions. In this work, we provide an overview of major privacy threats identified by the research community and examine privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We also present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations of current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Taxonomy of known privacy attacks in genomic data sharing. We differentiate between two main categories of privacy attacks: identification and phenotype inference. For each type of attack, we highlight the main known techniques and report relevant published examples.
Figure 2
Membership disclosure attack by Homer et al., in which an adversary aims to determine whether the target is present in the mixture (e.g., the case group). (1) Data Acquisition: the attacker holds partial genomic data (i.e., SNPs) of a known target individual and has access to publicly available summary statistics (e.g., from a GWAS). (2) SNP Frequency Estimation: the attacker estimates the allele frequency for each SNP j in the target data (Y_{i,j}), in the mixture (M_j), and in the reference population (Pop_j). (3) Profile Comparison: a SNP-wise distance measure D(Y_{i,j}) is computed to determine how the target's profile deviates from the reference population and from the mixture. Note that D(Y_{i,j}) is positive when Y_{i,j} is closer to M_j and negative when Y_{i,j} is closer to Pop_j. Furthermore, for a sufficiently large sample, D(Y_{i,j}) follows an approximately normal distribution. (4) Hypothesis Testing: the attacker performs a one-sample t-test to determine the likelihood of the target belonging to the mixture, where E[∙] and SD[∙] denote the expectation and standard deviation, respectively, and s denotes the number of SNPs. (5) Test Outcome: a positive test indicates that the target belongs to the mixture. As a result, the attacker may learn that the target individual has the phenotype that defines a “case”.
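
Steps (3) to (5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the distance form D(Y_{i,j}) = |Y_{i,j} - Pop_j| - |Y_{i,j} - M_j| used by Homer et al., which matches the sign convention described above, and the statistic T = (E[D] - mu0) / (SD[D] / sqrt(s)) with mu0 = 0 as the assumed mean for a non-member. The function names, the 0/0.5/1 encoding of the target's allele frequency and the normal approximation used for the p-value are our own illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Sequence


def homer_distance(y: Sequence[float], m: Sequence[float],
                   pop: Sequence[float]) -> List[float]:
    """SNP-wise distance D(Y_ij) = |Y_ij - Pop_j| - |Y_ij - M_j|.

    y:   target allele frequency per SNP (assumed 0, 0.5 or 1 from the genotype)
    m:   allele frequency per SNP in the mixture (e.g. published case-group stats)
    pop: allele frequency per SNP in the reference population
    D is positive when the target is closer to the mixture than to the reference.
    """
    if not (len(y) == len(m) == len(pop)):
        raise ValueError("y, m and pop must cover the same SNPs")
    return [abs(yj - pj) - abs(yj - mj) for yj, mj, pj in zip(y, m, pop)]


def homer_t_statistic(y: Sequence[float], m: Sequence[float],
                      pop: Sequence[float], mu0: float = 0.0) -> float:
    """One-sample t statistic T = (E[D] - mu0) / (SD[D] / sqrt(s)), s = number of SNPs."""
    d = homer_distance(y, m, pop)
    s = len(d)
    if s < 2:
        raise ValueError("need at least two SNPs to estimate SD[D]")
    sd = stdev(d)
    if sd == 0:
        raise ValueError("SD[D] is zero; the t statistic is undefined")
    return (mean(d) - mu0) / (sd / math.sqrt(s))


def membership_p_value(t: float) -> float:
    """One-sided p-value for 'target is in the mixture'.

    Assumption: with many SNPs the t distribution is approximated by N(0, 1).
    """
    return 1.0 - NormalDist().cdf(t)


if __name__ == "__main__":
    # Toy frequencies for five SNPs, invented for illustration only.
    target = [1.0, 0.5, 0.0, 1.0, 0.5]
    mixture = [0.8, 0.5, 0.1, 0.7, 0.45]
    reference = [0.6, 0.4, 0.3, 0.5, 0.5]
    t = homer_t_statistic(target, mixture, reference)
    print(f"T = {t:.3f}, one-sided p = {membership_p_value(t):.4f}")
```

With these toy numbers the target sits closer to the mixture at most SNPs, so T is positive and the p-value falls below 0.5. Setting the target's frequencies equal to the reference makes every D non-positive and T negative.
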
Figure 3
Genetic Genealogy Search framework for forensic analysis. (1) Genomic data are collected from the crime scene, and a genome-wide profile of the subject is constructed. (2) A search for matching profiles is conducted in publicly available datasets. The genomic information may lead to the identification of a match representing a relative (e.g., a cousin). Genealogical information is then used to narrow down the family tree to individuals who may be suspects (e.g., those living in the vicinity of the crime scene). (3) When a suitable suspect is identified, a direct DNA test is performed to confirm the match with the DNA collected from the crime scene.
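
The relative search in step (2) can be illustrated with a simple heuristic. The sketch below is an assumption, not the pipeline of any particular service. Genotypes are (hypothetically) encoded as alternate-allele counts 0, 1 or 2 over the same SNP panel; two profiles are scanned for long runs containing no opposite homozygotes (0 versus 2), because a single opposite-homozygous SNP rules out a shared allele at that position; and database profiles are ranked by the total length of those runs. Real genealogy searches measure shared segments in centimorgans using genetic maps; here a SNP count (`min_snps`) stands in for that, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: alternate-allele count per SNP (0, 1 or 2).
Genotype = Sequence[int]


def half_identical_segments(a: Genotype, b: Genotype,
                            min_snps: int) -> List[Tuple[int, int]]:
    """Return half-open SNP index ranges [start, end) in which a and b never
    carry opposite homozygotes, i.e. could share at least one allele.
    Only runs of at least min_snps SNPs are kept."""
    segments: List[Tuple[int, int]] = []
    n = min(len(a), len(b))
    start = 0
    for i in range(n):
        if abs(a[i] - b[i]) == 2:  # 0 vs 2: opposite homozygotes end the run
            if i - start >= min_snps:
                segments.append((start, i))
            start = i + 1
    if n - start >= min_snps:  # close the final run
        segments.append((start, n))
    return segments


def rank_candidate_relatives(query: Genotype, database: Dict[str, Genotype],
                             min_snps: int) -> List[Tuple[str, int, int]]:
    """Score each database profile by the total SNPs in long half-identical runs.

    Returns (profile_id, shared_snps, n_segments), best match first.
    """
    results: List[Tuple[str, int, int]] = []
    for profile_id, profile in database.items():
        segs = half_identical_segments(query, profile, min_snps)
        shared = sum(end - start for start, end in segs)
        results.append((profile_id, shared, len(segs)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


if __name__ == "__main__":
    crime_scene = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1]
    database = {
        "user_A": [2, 1, 0, 1, 2, 0, 0, 1, 2, 1],  # many opposite homozygotes
        "user_B": [0, 1, 1, 1, 0, 2, 1, 1, 0, 1],  # never opposite-homozygous
    }
    print(rank_candidate_relatives(crime_scene, database, min_snps=3))
    # [('user_B', 10, 1), ('user_A', 0, 0)]
```

A top-ranked profile only points to a candidate relative; as in step (3), any lead would still need confirmation by direct DNA testing.
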
