J Am Med Inform Assoc. 2006 Mar-Apr;13(2):160-5.
doi: 10.1197/jamia.M1920. Epub 2005 Dec 15.

A context-sensitive approach to anonymizing spatial surveillance data: impact on outbreak detection

Christopher A Cassa et al. J Am Med Inform Assoc. 2006 Mar-Apr.

Abstract

Objective: The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. We also measure how this displacement affects the detection of spatial clustering by a spatial scan statistic.

Design: Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters varying in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density.
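The abstract does not spell out the transformation itself. The Python sketch below shows one way a density-aware Gaussian displacement could work, under stated assumptions: each patient's offset is an isotropic two-dimensional Gaussian, and the per-axis standard deviation σ is chosen so that a locally uniform population of density ρ would give a Gaussian-weighted head count of `k_target` (with weight exp(-d²/2σ²), that count is 2πρσ²). The names `density_scaled_sigma`, `displace`, and `k_target` are hypothetical and are not taken from the paper.

```python
import math
import random


def density_scaled_sigma(density_per_km2: float, k_target: float) -> float:
    """Per-axis standard deviation (km) of the Gaussian offset for one patient.

    Assumption (not from the paper): with a locally uniform density rho, the
    Gaussian-weighted head count is 2 * pi * rho * sigma**2, so solving for
    sigma gives sqrt(k_target / (2 * pi * rho)). Sparser areas get larger sigma.
    """
    if density_per_km2 <= 0 or k_target <= 0:
        raise ValueError("density and k_target must be positive")
    return math.sqrt(k_target / (2 * math.pi * density_per_km2))


def displace(x_km: float, y_km: float, density_per_km2: float,
             k_target: float, rng: random.Random) -> tuple[float, float]:
    """Return a de-identified location: the original point plus a Gaussian offset."""
    sigma = density_scaled_sigma(density_per_km2, k_target)
    return x_km + rng.gauss(0.0, sigma), y_km + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical densities (people/km²): a dense urban block and a sparse suburb.
    for density in (10_000, 500):
        print(density, density_scaled_sigma(density, 20), displace(0.0, 0.0, density, 20, rng))
```

Because σ shrinks as density grows, patients in crowded neighborhoods are moved less than patients in sparse ones, which is the qualitative behavior the Design paragraph describes.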

Measurements: A total of 12,600 separate weeks of case data containing artificially created clusters were combined with control data, and the effect of anonymization on the detection of spatial clustering by a spatial scan statistic was measured.

Results: The anonymization algorithm produced the expected skew of case locations, resulting in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers spatial cluster detection sensitivity by less than 4% and lowers detection specificity by less than 1%.

Conclusion: A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standard outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance.

Figures

Figure 1.
Experiment description: five weeks of Children's Hospital Boston visit data were each individually combined with 252 different artificially generated spatial clusters. Each of the resulting 1,260 data sets was then anonymized at ten different levels, for a total of 12,600 experimental data sets.
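The experiment count in this caption is a product of three factors. The short Python sketch below enumerates the grid to make the arithmetic explicit. The week, cluster, and level indices are placeholders, not the study's own identifiers.

```python
from itertools import product

WEEKS = 5        # weeks of baseline visit data (from the caption)
CLUSTERS = 252   # artificially generated spatial clusters combined with each week
LEVELS = 10      # anonymization levels applied to each combined data set

# Each (week, cluster) pair is one combined data set.
combined = list(product(range(WEEKS), range(CLUSTERS)))
# Each combined data set is anonymized once per level.
experiments = [(week, cluster, level)
               for week, cluster in combined
               for level in range(LEVELS)]

print(len(combined), len(experiments))  # 1260 12600
```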
Figure 2.
Estimating expected k-anonymity. An estimate of the achieved k-anonymity is calculated from σ, the data set standard deviation of the distance each patient is moved during anonymization, assuming no other external knowledge of specific patient information. For each area, the local population density (people/km²) is multiplied by the size of the area (km²) and then by the probability, taken from the Gaussian probability distribution function, that the patient would have been in that area.
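The caption lists the per-area factors but not how they are combined or normalized. The Python sketch below assumes the per-area products are summed and that the "probability" is the Gaussian density rescaled to 1 at the original location, exp(-d²/2σ²). That is one plausible reading, chosen because it yields a head count in people. The `Area` class, `expected_k`, and `uniform_grid` are hypothetical helpers.

```python
import math
from dataclasses import dataclass


@dataclass
class Area:
    dx_km: float            # east offset of the area's centre from the original location
    dy_km: float            # north offset of the area's centre from the original location
    area_km2: float
    density_per_km2: float  # local population density (people/km²)


def expected_k(areas: list[Area], sigma_km: float) -> float:
    """Sum over areas of density * area * Gaussian weight.

    Assumption: the weight is exp(-d**2 / (2 * sigma**2)), the Gaussian density
    rescaled to 1 at the original location; the caption does not state the
    normalization.
    """
    total = 0.0
    for a in areas:
        d2 = a.dx_km ** 2 + a.dy_km ** 2
        weight = math.exp(-d2 / (2 * sigma_km ** 2))
        total += a.density_per_km2 * a.area_km2 * weight
    return total


def uniform_grid(density: float, half_width_km: float, cell_km: float) -> list[Area]:
    """Hypothetical square grid of equal cells with the same density everywhere."""
    n = int(round(half_width_km / cell_km))
    return [Area(i * cell_km, j * cell_km, cell_km ** 2, density)
            for i in range(-n, n + 1) for j in range(-n, n + 1)]
```

On a uniform grid, `expected_k(uniform_grid(2000, 1.2, 0.02), 0.1)` should come out close to 2π × 2000 × 0.1² ≈ 125.7. Real census areas would replace the grid with irregular polygons and varying densities.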
Figure 3.
Distribution of distance from original location. Each case was moved from an original home address to a new de-identified location. Each data series shows the percentage of patients displaced by a given distance (km) from their original location. Average distances moved: 0.0587, 0.1168, 0.1762, and 0.2354 km.
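Suppose each offset is an isotropic two-dimensional Gaussian with per-axis standard deviation σ. This is an assumption, since the caption reports only the resulting distances. Under it, the distance moved follows a Rayleigh distribution with mean σ√(π/2). The Python sketch below checks that formula by simulation and inverts it to obtain the σ implied by each reported average. The reported averages are roughly in the ratio 1:2:3:4. With density-dependent σ, the data set average is the mean of σᵢ√(π/2) over patients, so `simulated_mean_distance` accepts a per-patient list. All helper names are hypothetical.

```python
import math
import random

RAYLEIGH_FACTOR = math.sqrt(math.pi / 2)  # mean of a Rayleigh variable per unit sigma


def rayleigh_mean(sigma_km: float) -> float:
    """Expected distance moved by an isotropic 2-D Gaussian offset."""
    return sigma_km * RAYLEIGH_FACTOR


def sigma_for_mean_distance(mean_km: float) -> float:
    """Per-axis sigma that yields a given expected distance (inverse of rayleigh_mean)."""
    return mean_km / RAYLEIGH_FACTOR


def simulated_mean_distance(sigmas_km: list[float], rng: random.Random) -> float:
    """Average displacement when patient i gets its own per-axis sigma_km[i]."""
    total = 0.0
    for s in sigmas_km:
        total += math.hypot(rng.gauss(0.0, s), rng.gauss(0.0, s))
    return total / len(sigmas_km)


if __name__ == "__main__":
    # Average distances reported in the caption (km).
    for d in (0.0587, 0.1168, 0.1762, 0.2354):
        print(d, round(sigma_for_mean_distance(d), 4))
```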
Figure 4.
Average k-anonymity achieved versus average distance moved. As the average distance (km) moved in a given data set increases, the achieved k-anonymity increases quadratically.
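A back-of-the-envelope derivation shows why a quadratic relationship is plausible. It rests on three assumptions, none of which is stated in the paper:

- the population density ρ is locally uniform;
- the offset is an isotropic Gaussian with per-axis standard deviation σ;
- the weight is the Gaussian density rescaled to 1 at the original location.

Integrating the weighted head count over the plane and substituting the Rayleigh mean distance d̄ gives:

```latex
\hat{k} \approx \int_{\mathbb{R}^2} \rho \, e^{-r^2/(2\sigma^2)} \, dA
        = \int_0^{\infty} \rho \, e^{-r^2/(2\sigma^2)} \, 2\pi r \, dr
        = 2\pi\rho\sigma^2,
\qquad
\bar{d} = \sigma\sqrt{\pi/2}
\;\Longrightarrow\;
\hat{k} \approx 4\rho\,\bar{d}^{\,2}.
```

For a hypothetical density of 1,000 people/km² and d̄ = 0.25 km, this gives k̂ ≈ 4 × 1,000 × 0.0625 = 250. In real data ρ varies from area to area, so measured values will depart from this idealized formula.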
Figure 5.
Average cluster detection sensitivity/specificity versus average distance to original point (average distance increases as anonymization level increases). The average sensitivity and specificity with which artificially injected clusters of patients are detected (SaTScan Bernoulli spatial model, p-value ≤ 0.05) are plotted against the average distance that patients in a de-identified data set are moved from their original home addresses. Sensitivity and specificity are calculated from cluster and control cases that were or were not correctly identified.
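The caption defines sensitivity and specificity in terms of cases that were or were not correctly identified. The Python sketch below makes that concrete at the case level. It assumes a case counts as "detected" when it falls inside a reported cluster. That per-case rule and the function name are illustrative assumptions, not SaTScan output formats.

```python
def sensitivity_specificity(truth: list[bool], flagged: list[bool]) -> tuple[float, float]:
    """Case-level sensitivity and specificity.

    truth[i]   -- case i belongs to an artificially injected cluster
    flagged[i] -- case i falls inside a cluster SaTScan reported (e.g. p <= 0.05)
    """
    if len(truth) != len(flagged):
        raise ValueError("truth and flagged must have the same length")
    tp = sum(t and f for t, f in zip(truth, flagged))          # cluster cases detected
    fn = sum(t and not f for t, f in zip(truth, flagged))      # cluster cases missed
    tn = sum(not t and not f for t, f in zip(truth, flagged))  # control cases left alone
    fp = sum(not t and f for t, f in zip(truth, flagged))      # control cases flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Averaging these two numbers over all cluster shapes and weeks at one anonymization level would give one point on each curve.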
Figure 6.
Percentage of visits that meet specific k-anonymity thresholds. For different user-specified minimum k-anonymity thresholds, the percentage of visits in a data set whose k-anonymity falls below the threshold (and that are therefore not sufficiently de-identified) decreases quickly as the average distance moved increases. For over 99% of the visits in all test data sets, a minimum k-anonymity value of 20 could be achieved with an average distance moved of 0.25 km.
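Once each visit has an estimated k-anonymity, the quantity in this caption is a simple count. The Python sketch below reports the percentage of visits that fall below each user-specified threshold. The function names and the k values in the demo are hypothetical.

```python
def fraction_below(k_values: list[float], threshold: float) -> float:
    """Share of visits whose estimated k-anonymity is below a minimum threshold."""
    if not k_values:
        raise ValueError("k_values must not be empty")
    return sum(k < threshold for k in k_values) / len(k_values)


def threshold_report(k_values: list[float], thresholds: list[float]) -> dict[float, float]:
    """Percentage of insufficiently de-identified visits for each threshold."""
    return {t: 100.0 * fraction_below(k_values, t) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical per-visit k estimates for one anonymization level.
    ks = [5, 25, 30, 19.9, 100]
    print(threshold_report(ks, [5, 10, 20]))
```

In these terms, the caption's 99% result corresponds to `fraction_below(ks, 20) < 0.01` for every test data set at an average distance moved of 0.25 km.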
