ACM Comput Surv. 2015 Sep;48(1):6.
doi: 10.1145/2767007.

Privacy in the Genomic Era

Muhammad Naveed et al. ACM Comput Surv. 2015 Sep.

Abstract

Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data have the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, there has been less attention dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

Keywords: genomics privacy; biomedical research; healthcare; recreational genomics; security.

Figures

Fig. 1. Properties of DNA that, in combination, may distinguish it from other data types
Health/Behavior means that DNA contains information about an individual's health and behavior. Static (Traceable) means that DNA does not change much over time in an individual. Unique means that the DNA of any two individuals can be easily distinguished from one another. Mystique refers to the public perception of mystery about DNA. Value refers to the importance of the information content in DNA and that this importance does not decline with time (which is the case with other medical data, e.g., blood pressure, glucose level, or a blood test). In fact, this importance will likely increase with time. Kinship means that DNA contains information about an individual's blood relatives.
Fig. 2. Probes of attitudes
Fig. 3. Self-identified expertise of the survey respondents
Fig. 4. Response to the question: Do you believe that: (Multiple options can be checked). The probes are described in detail in Figure 2. “None” means the respondent does not agree with any of the probes.
Fig. 5. Response to the question: Would you publicly share your genome on the Web?
Fig. 6. Response to the question: Assuming that one’s genomic data leaks a lot of private information about his or her relatives, do you think one should have the right to share his or her genomic data?
Fig. 7. Response to the question: What can we compromise to improve privacy of genomic data? (Multiple options can be checked)
Fig. 8. Relevance of genome privacy research done by the computer science community.
Fig. 9. Genomic data handling framework: DNA is extracted from an individual’s tissue or cells. DNA is digitized either using sequencing (to obtain Whole Genome Sequence (WGS) or Whole Exome Sequence (WES)) or genotyping (to obtain variants (usually only SNPs)). Reads obtained from sequencing are aligned to form the complete genome, while genotyped variants are digitized from the microchip array directly. Read data may be stored for later analysis. The aligned genome can be either stored in raw form or compressed form (variations from a reference human genome). Medical tests and other types of computation shown in the figure can be performed either on raw aligned genome or just on variants. Possible outputs of computation are shown. Output depends on the type of computation and in some cases there is no output. The figure shows the genomic data aggregation problem caused by recreational genomics services. The figure is divided into three sections based on fundamental limitations of legal and technical measures for the protection of genomic data. Legal protection is required for the left section, legal as well as technical protection is required for the middle section, while, in theory, technical solutions would suffice for the protection of the right section. The legend shows which blocks are associated with different uses of genomic data. We use the word “patient” in this paper to mean someone whose genome is sequenced or genotyped and not necessarily a sick person.
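
The "compressed form" mentioned in the Fig. 9 caption (storing only the variations from a reference genome) can be illustrated with a toy sketch. This is not the paper's method: the function names, the short sequences, and the restriction to single-base substitutions are all assumptions made for illustration. Real pipelines use dedicated formats such as VCF and also handle insertions and deletions.

```python
# Toy illustration (hypothetical names, substitutions only): represent an
# aligned genome as its differences from a reference sequence.

def diff_against_reference(reference: str, genome: str) -> dict[int, str]:
    """Return {position: base} for each position where genome differs from reference."""
    if len(reference) != len(genome):
        raise ValueError("this sketch only handles equal-length sequences")
    return {i: g for i, (r, g) in enumerate(zip(reference, genome)) if r != g}


def reconstruct(reference: str, variants: dict[int, str]) -> str:
    """Rebuild the full sequence by applying the stored variants to the reference."""
    bases = list(reference)
    for position, base in variants.items():
        bases[position] = base
    return "".join(bases)


# Made-up sequences, far shorter than any real genome.
reference = "ACGTACGTAC"
genome = "ACGTTCGTAA"

variants = diff_against_reference(reference, genome)
print(variants)                                   # the compressed form
print(reconstruct(reference, variants) == genome)  # round trip check
```

The sketch also hints at why variant data is privacy-sensitive: the stored differences are exactly the positions that distinguish one individual from the reference.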
