Int J Health Geogr. 2016 Jan 7;15:1.
doi: 10.1186/s12942-015-0031-7.

Anonymisation of geographical distance matrices via Lipschitz embedding

Martin Kroll et al. Int J Health Geogr. 2016.

Abstract

Background: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas the research focus has been on the anonymisation of point locations, the disclosure risk arising from the publishing of inter-point distances and corresponding anonymisation methods have not been studied systematically.

Methods: We propose a new anonymisation method for the release of geographical distances between records of a microdata file, for example patients in a medical database. We discuss a data release scheme in which the microdata are released without coordinates, together with a distance matrix between the corresponding rows of the microdata set. In contrast to most other approaches, this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding.

Results: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters.

Conclusion: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific use files and as an additional tool for record-linkage studies.
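
The abstract does not spell out the authors' variant of Lipschitz embedding, so the following is only a minimal sketch of the textbook construction it builds on: each point is mapped to a vector whose i-th coordinate is its minimum distance to a random reference set R_i (as in the caption of Fig. 1), and distances are then taken between the embedded vectors. Several things here are assumptions for illustration and not taken from the paper: the haversine distance, sampling reference points from a rough bounding box instead of an administrative area, reading d as the number of reference sets, the Chebyshev norm on the embedded vectors, and all function names.

```python
import math
import random

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))

def sample_reference_sets(d, k, bbox, rng):
    """Draw d reference sets of k random points each from a bounding box
    (a simplification; the paper samples from an administrative area)."""
    (lat_lo, lat_hi), (lon_lo, lon_hi) = bbox
    return [[(rng.uniform(lat_lo, lat_hi), rng.uniform(lon_lo, lon_hi))
             for _ in range(k)] for _ in range(d)]

def embed(p, reference_sets, dist=haversine_km):
    """Coordinate i is the minimum distance from p to a point of set i."""
    return [min(dist(p, r) for r in ref) for ref in reference_sets]

def embedded_distance(u, v):
    """Chebyshev distance between embedded vectors (assumed norm)."""
    return max(abs(a - b) for a, b in zip(u, v))

def anonymised_distance_matrix(points, reference_sets):
    """Pairwise distances computed on the embedded vectors only."""
    vecs = [embed(p, reference_sets) for p in points]
    return [[embedded_distance(u, v) for v in vecs] for u in vecs]

# Reproduce the single coordinate from the Fig. 1 caption using its distances.
fig1 = {"r_i1": 308.9, "r_i2": 262.3, "r_i3": 162.7}
print(embed("p", [list(fig1)], dist=lambda _p, r: fig1[r]))  # [162.7]

# Hypothetical example points and a rough, illustrative bounding box.
rng = random.Random(0)
refs = sample_reference_sets(d=10, k=3, bbox=((50.0, 58.0), (-6.0, 2.0)), rng=rng)
points = [(51.5, -0.1), (53.5, -2.2), (55.9, -3.2)]
D = anonymised_distance_matrix(points, refs)
```

Because each coordinate is 1-Lipschitz with respect to the underlying metric, the Chebyshev distance between two embedded vectors never exceeds their original distance. Other norms on the embedded vectors do not share this bound.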

Figures

Fig. 1
Illustration of step 3. The size parameter k is chosen equal to 3 and the elements of the reference sets are sampled at random from the administrative area of the United Kingdom. The coordinate f_i(p) of the point p (black square) with respect to the random reference set R_i = {r_i1, r_i2, r_i3} is given by the minimum distance from p to a point of this reference set. We have d(p, r_i1) = 308.9, d(p, r_i2) = 262.3 and d(p, r_i3) = 162.7, thus f_i(p) = min{308.9, 262.3, 162.7} = 162.7. All distances are measured in kilometers.
Fig. 2
Influence of Lipschitz embedding on two different data analysis tasks. The left plot shows the rate of correct nearest neighbour classifications depending on the parameters d and k of the Lipschitz embedding. The right plot shows, for different choices of d and k, the Spearman correlation ρ between the original and approximated distances from all points in the data files to two fixed points. The black lines refer to a fixed point in the centre of England, whereas the grey lines refer to a fixed point in the north of England.
Fig. 3
Target (left) and identification file (right) in the first scenario (English hospital data). Each data file consists of 400 geocoded hospitals and the truncated trust code as quasi-identifier. Different colours refer to different trust codes
Fig. 4
Target (left) and identification file (right) in the second scenario (simulated German population data). Each data file consists of 500 geocoded German addresses and sex and age as quasi-identifiers
Fig. 5
Results of the first experiment for α=0.1 for the English hospital data
Fig. 6
Results of the first experiment for α=0.5 for the English hospital data
Fig. 7
Results of the first experiment for α=0.9 for the English hospital data
Fig. 8
Results of the second experiment for α=0.1 for the English hospital data
Fig. 9
Results of the second experiment for α=0.5 for the English hospital data
Fig. 10
Results of the first experiment for α=0.1 for the German population data
Fig. 11
Results of the first experiment for α=0.5 for the German population data
Fig. 12
Results of the first experiment for α=0.9 for the German population data
Fig. 13
Results of the second experiment for α=0.1 for the German population data
Fig. 14
Results of the second experiment for α=0.5 for the German population data
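
The caption of Fig. 2 evaluates the embedding with the Spearman correlation ρ between original and approximated distances. As a rough illustration of that measure, and not the authors' evaluation code, the sketch below computes ρ as the Pearson correlation of average ranks. The function names and the distance values are hypothetical, and both inputs are assumed to be non-constant.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical distances (km) from five records to one fixed point.
original = [12.0, 48.5, 97.3, 150.2, 310.8]
approximated = [10.1, 51.0, 80.4, 95.7, 120.3]
print(spearman_rho(original, approximated))
```

Since ρ depends only on rank order, an embedding that shrinks large distances but keeps their ordering still scores close to 1 on this measure.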
