Composite Bloom Filters for Secure Record Linkage
- PMID: 25530689
- PMCID: PMC4269299
- DOI: 10.1109/TKDE.2013.91
Abstract
The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname); however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance the competing goals of accuracy, efficiency, and security. The tokenization and hashing of field values into Bloom filters (BFs) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish this, we introduce a statistically informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.
Keywords: Bloom filter; data matching; entity resolution; privacy; record linkage; security.
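As a rough illustration of the field-level encoding the abstract refers to (tokenizing a field value and hashing the tokens into a Bloom filter), the following Python sketch may help. It is not the composite method proposed in the paper. The filter length `m`, the number of hash functions `k`, the padding character, the keyed double-hashing scheme, the Dice coefficient as the comparison measure, and the names `bigrams`, `bloom_encode`, and `dice` are all assumptions made for the example.

```python
import hashlib
import hmac


def bigrams(value: str) -> list[str]:
    """Split a field value into padded, lowercased character bigrams."""
    padded = f"_{value.lower()}_"  # "_" padding is an assumption of this sketch
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bloom_encode(value: str, m: int = 1000, k: int = 4,
                 key: bytes = b"shared-secret") -> set[int]:
    """Return the set of bit positions (0..m-1) set for this field value.

    Each bigram is mapped to k positions via double hashing with two
    keyed hashes; the key is a hypothetical secret shared by data holders.
    """
    positions: set[int] = set()
    for token in bigrams(value):
        data = token.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two Bloom filters given as position sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = bloom_encode("Smith")
    print(dice(smith, bloom_encode("Smyth")))  # similar surnames share bigrams
    print(dice(smith, bloom_encode("Jones")))  # no shared bigrams
```

Because equal bigrams always set the same bits, frequent field values yield frequent bit patterns. This is the regularity that the frequency-based cryptanalysis mentioned in the abstract exploits, and that combining bits from multiple fields is meant to obscure.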