J Am Med Inform Assoc. 2018 Oct 1;25(10):1402-1406.
doi: 10.1093/jamia/ocy071.

Using mobile location data in biomedical research while preserving privacy

Daniel M Goldenholz et al. J Am Med Inform Assoc.

Abstract

Location data are becoming easier to obtain and are now bundled with other metadata in a variety of biomedical research applications. At the same time, the level of sophistication required to protect patient privacy is also increasing. In this article, we provide guidance for institutional review boards (IRBs) to make informed decisions about privacy protections in protocols involving location data. We provide an overview of some of the major categories of technical algorithms and medical-legal tools at the disposal of investigators, as well as the shortcomings of each. Although there is no "one size fits all" approach to privacy protection, this article attempts to describe a set of practical considerations that can be used by investigators, journal editors, and IRBs.

Figures

Figure 1.
Path of location data. Initially, location data are produced from study participants. These data are collected by the primary investigators, modified by the requirements of the institutional review board (IRB). After analysis, the data can be published, shared with secondary collaborators, and/or sent to a data repository. If a repository is used, secondary collaborators may or may not have any connection with the primary investigators. Secondary collaborators would typically have their own independent IRB that monitors their use of the data. The secondary collaborators also have the opportunity for publication. Anywhere along these multiple pathways that the data traverse, there is a possibility for re-identification of study participants if privacy is not protected.
Figure 2.
Simulated examples of re-identification risk. A) Simple re-identification risk. In this example, a city has decreasing population density farther from the city center. Each x represents a home location of a de-identified subject. The subject with the gray circle, although de-identified, lives far enough from the city center that even crude location data would be sufficient to uniquely identify his/her address. B and C) Spatiotemporal behavioral re-identification risk. Bob and Alice are both wearing trackers. Bob rode a bicycle and stopped at his favorite café at some point. Alice walked without stops. In this theoretical example, the tracking data without names were posted in a public repository using true GPS coordinates, along with heart-monitoring data. Although the GPS data shown here had been de-identified, Patient A (plot B) appears to be moving at a relatively constant speed throughout the path (accounting for noise), whereas Patient B (plot C) appears to have clustered some location data around x = 100, y = 65, as if he had spent additional time there. In addition, the markers for Patient B are spaced farther apart (except at his one stop) than those for Patient A, suggesting that Patient B had a faster method of travel. If a malicious hacker knew that Patients A and B were either Alice or Bob, and was aware that Bob owns a bicycle, he/she may be able to identify the individuals. This example shows that Alice is most likely Patient A, and Bob is most likely Patient B. After identification, the hacker can also use the heart data to connect medical diagnoses (eg, atrial fibrillation) to identified participants. Timestamps are intentionally omitted from this figure; with them, the analysis would have been even easier.
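
The two risks in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' simulation: the function names, the grid-cell approach to panel A, the fixed-sampling-interval assumption, and the toy coordinates are all hypothetical choices made for illustration. `unique_cells` flags home locations that are alone in a coarse grid cell (the gray-circle subject in panel A), and `summarize_track` recovers a typical step length and the number of stationary steps from an untimed track (the bicycle-versus-walking inference in panels B and C).

```python
import math
from collections import Counter


def unique_cells(homes, cell_size):
    """Return home points that are the only one in their grid cell.

    Assumption: coarsening to a square grid stands in for "crude location
    data"; a point alone in its cell is still uniquely identifiable.
    """
    cells = [(math.floor(x / cell_size), math.floor(y / cell_size))
             for x, y in homes]
    counts = Counter(cells)
    return [home for home, cell in zip(homes, cells) if counts[cell] == 1]


def summarize_track(track, stop_threshold):
    """Summarize an untimed track, assuming a fixed sampling interval.

    With a fixed interval, the distance between consecutive markers is
    proportional to speed, so timestamps are not needed.
    """
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    moving = sorted(s for s in steps if s > stop_threshold)
    return {
        # Median-like step length while moving (proxy for travel speed).
        "typical_step": moving[len(moving) // 2] if moving else 0.0,
        # Steps shorter than the threshold are treated as a stop.
        "stopped_steps": len(steps) - len(moving),
    }


# Toy data (hypothetical): three homes near the center, one far away.
homes = [(1, 1), (2, 2), (3, 1), (40, 45)]
isolated = unique_cells(homes, cell_size=10)

# Toy tracks (hypothetical): a steady walker and a faster rider with one stop.
patient_a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
patient_b = [(0, 0), (4, 0), (8, 0), (8, 0), (8, 0), (12, 0)]
summary_a = summarize_track(patient_a, stop_threshold=0.5)
summary_b = summarize_track(patient_b, stop_threshold=0.5)
```

In this toy data, only the distant home is isolated at a cell size of 10, and the second track shows both longer steps and a stop, which is the pattern the caption attributes to Bob. Real GPS traces are noisy and irregularly sampled, so the threshold and the fixed-interval assumption would need to be revisited before applying anything like this to study data.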

