Privacy-preserving record linkage using Bloom filters

Rainer Schnell et al. BMC Med Inform Decis Mak. 2009 Aug 25;9:41.
doi: 10.1186/1472-6947-9-41.

Abstract

Background: Combining multiple databases that hold disjoint or additional information on the same person is increasingly common in research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used to identify matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns.

Methods: A new protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers has been developed. The protocol is based on Bloom filters on q-grams of identifiers.
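
The idea of Bloom filters on q-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the filter length `m`, the number of hash functions `k`, the blank padding of bigrams, the choice of keyed hashes (HMAC-SHA1 and HMAC-SHA256 combined by double hashing), the Dice coefficient as similarity measure, and all function names are assumptions made for illustration only.

```python
import hashlib
import hmac


def qgrams(s, q=2):
    """Return the set of q-grams of s, padded with blanks (assumed convention)."""
    padded = " " * (q - 1) + s.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(s, key, m=1000, k=30, q=2):
    """Map every q-gram of s to k bit positions in a filter of length m.

    The filter is represented as the set of positions that are set to 1.
    m, k and the hash construction are illustrative assumptions.
    """
    bits = set()
    for gram in qgrams(s, q):
        data = gram.encode("utf-8")
        # Two keyed hashes, combined by double hashing: h1 + i * h2 (mod m).
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Both data holders share the secret key but exchange only the filters.
secret = b"shared-secret-key"  # hypothetical key
sim_close = dice(bloom_encode("smith", secret), bloom_encode("smyth", secret))
sim_far = dice(bloom_encode("smith", secret), bloom_encode("jones", secret))
print(sim_close, sim_far)
```

Because similar strings share most of their q-grams, their filters share most of their set bits, so a spelling variant such as "smyth" still scores well against "smith" without either party revealing the plain identifier.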

Results: Tests on simulated and actual databases yield linkage results comparable to non-encrypted identifiers and superior to results from phonetic encodings.
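
Linkage quality in such comparisons is reported as precision and recall. As a rough sketch of how these two measures could be computed for a given similarity threshold, assuming candidate pairs are already scored and the true matches are known (the function name, the data layout, and the sample values are hypothetical and not taken from the paper):

```python
def precision_recall(scored_pairs, true_matches, threshold):
    """Compute precision and recall for pairs classified as matches.

    scored_pairs: dict mapping (id_a, id_b) to a similarity score.
    true_matches: set of (id_a, id_b) pairs that really belong together.
    A pair is classified as a match if its score reaches the threshold.
    """
    predicted = {pair for pair, score in scored_pairs.items() if score >= threshold}
    true_positives = len(predicted & true_matches)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(true_matches) if true_matches else 1.0
    return precision, recall


# Made-up scores for three candidate pairs, two of which are true matches.
scores = {("a1", "b1"): 0.9, ("a2", "b2"): 0.6, ("a3", "b9"): 0.85}
truth = {("a1", "b1"), ("a2", "b2")}
print(precision_recall(scores, truth, threshold=0.8))
```

Lowering the threshold generally raises recall at the cost of precision, which is why the two are usually reported together.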

Conclusion: We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers. Since the protocol can be easily enhanced and has a low computational burden, it might be useful for many applications requiring privacy-preserving record linkage.

Figures

Figure 1: Example of the use of two Bloom filters for the privacy-preserving computation of string similarities.
Figure 2: Comparison of precision and recall for Bloom filters with unencrypted trigrams using simulated data.
Figure 3: Comparison of precision and recall for Bloom filters with exact string comparison using simulated data.
Figure 4: Comparison of precision and recall for Bloom filters with Soundex using simulated data.
Figure 5: Comparison of precision and recall for Bloom filters with unencrypted bigrams using actual data.
Figure 6: Comparison of precision and recall for Bloom filters with a phonetic encoding using actual data.
Figure 7: Rescaled cutout of Figure 6 highlighting recall levels above 0.75.
