Privacy preserving probabilistic record linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality
- PMID: 26024886
- PMCID: PMC4460842
- DOI: 10.1186/s12874-015-0038-6
Abstract
Background: Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: either a unique personal identifier, such as a social security number, is not available, or non-unique person identifiable information, such as names, is privacy protected and cannot be accessed. One way to protect privacy in probabilistic record linkage is to encrypt this sensitive information. Unfortunately, the encrypted hash codes of two names differ completely even if the plain names differ by only a single character, so standard encryption methods cannot be applied. To overcome these challenges, we developed the Privacy Preserving Probabilistic Record Linkage (P3RL) method.
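The problem with standard hashing can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function (the abstract does not name one), and the two names and both helper functions are hypothetical examples rather than part of the published method.

```python
import hashlib


def hash_name(name: str) -> str:
    """Return the SHA-256 digest of a name as a 64-character hex string."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


def matching_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


# Two spellings that differ by a single character (hypothetical example).
digest_a = hash_name("meier")
digest_b = hash_name("meyer")

print(digest_a)
print(digest_b)
print("matching hex positions:", matching_positions(digest_a, digest_b), "of 64")
```

The two digests agree only at a handful of positions, roughly what chance alone would produce, so a comparison of digests cannot tell a near match from an unrelated name.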
Methods: In this Privacy Preserving Probabilistic Record Linkage method we apply a three-party protocol, with two sites collecting individual data and an independent trusted linkage center as the third partner. Our method consists of three main steps: pre-processing, encryption and probabilistic record linkage. Data pre-processing and encryption are done at the sites by local personnel. To guarantee similar quality and format of variables and an identical encryption procedure at each site, the linkage center generates semi-automated pre-processing and encryption templates. To retrieve information (i.e. data structure) for the creation of templates without ever accessing plain person identifiable information, we introduced a novel method of data masking. Sensitive string variables are encrypted using Bloom filters, which enable the calculation of similarity coefficients. For date variables, we developed special encryption procedures to handle the most common date errors. The linkage center performs probabilistic record linkage with encrypted person identifiable information and plain non-sensitive variables.
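The idea of encrypting strings with Bloom filters so that similarity can still be measured might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the filter length, the number of hash functions, the shared key, the padded bigram split, the keyed double-hashing scheme and the use of the Dice coefficient as the similarity measure are all choices made here for the example.

```python
import hashlib
import hmac

FILTER_BITS = 1000             # assumed filter length
NUM_HASHES = 20                # assumed number of hash functions per bigram
SECRET_KEY = b"shared-secret"  # hypothetical key shared by the two sites


def bigrams(name: str) -> set[str]:
    """Split a padded, lower-cased name into its set of 2-character substrings."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name: str, key: bytes = SECRET_KEY) -> frozenset[int]:
    """Return the set of bit positions that the name's bigrams switch on."""
    bits = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        # Double hashing: derive NUM_HASHES positions from two keyed digests.
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha512).digest(), "big")
        for i in range(NUM_HASHES):
            bits.add((h1 + i * h2) % FILTER_BITS)
    return frozenset(bits)


def dice(a: frozenset[int], b: frozenset[int]) -> float:
    """Dice coefficient of two bit-position sets: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


meier = bloom_encode("meier")
meyer = bloom_encode("meyer")
schmidt = bloom_encode("schmidt")

print("meier vs meyer:  ", round(dice(meier, meyer), 2))
print("meier vs schmidt:", round(dice(meier, schmidt), 2))
```

Because names that share most of their bigrams also share most of their set bits, a one-character variant scores much higher than an unrelated name, which is the property that a plain hash of the whole name lacks.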
Results: In this paper we describe, step by step, how to link existing health-related data using encryption methods that preserve the privacy of persons in the study.
Conclusion: Privacy Preserving Probabilistic Record Linkage expands record linkage facilities in settings where a unique identifier is unavailable and/or regulations restrict access to the non-unique person identifiable information needed to link existing health-related data sets. Automated pre-processing and encryption fully protect sensitive information, ensuring participant confidentiality. This method is suitable not just for epidemiological research but also for any setting with similar challenges.