BMC Med Res Methodol. 2015 May 30:15:46.
doi: 10.1186/s12874-015-0038-6.

Privacy preserving probabilistic record linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality


Kurt Schmidlin et al. BMC Med Res Methodol. 2015.

Abstract

Background: Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: either a unique personal identifier, like a social security number, is not available, or non-unique person identifiable information, like names, is privacy protected and cannot be accessed. A solution to protect privacy in probabilistic record linkage is to encrypt this sensitive information. Unfortunately, the encrypted hash codes of two names differ completely even if the plain names differ by only a single character. Therefore, standard encryption methods cannot be applied. To overcome these challenges, we developed the Privacy Preserving Probabilistic Record Linkage (P3RL) method.
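
The problem with standard hashing described in the Background can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and uses made-up example names; the abstract does not say which hash function or normalization the authors considered, and the helper names `hash_name` and `fraction_matching_hex_digits` are hypothetical.

```python
import hashlib


def hash_name(name: str) -> str:
    """Return the SHA-256 hex digest of a lower-cased name (assumed normalization)."""
    return hashlib.sha256(name.lower().encode("utf-8")).hexdigest()


def fraction_matching_hex_digits(a: str, b: str) -> float:
    """Share of positions at which two equal-length hex digests agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two spellings that differ by a single character (illustrative names only).
h1 = hash_name("Schmidlin")
h2 = hash_name("Schmidlim")

print(h1)
print(h2)
# The digests agree only at chance level, so no similarity score between
# the plain names can be recovered from them.
print(fraction_matching_hex_digits(h1, h2))
```

Because the digests carry no information about how close the two spellings are, exact comparison of hashed names treats any typing error as a non-match, which is why an error-tolerant encoding is needed for probabilistic linkage.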

Methods: In this Privacy Preserving Probabilistic Record Linkage method we apply a three-party protocol, with two sites collecting individual data and an independent trusted linkage center as the third partner. Our method consists of three main steps: pre-processing, encryption and probabilistic record linkage. Data pre-processing and encryption are done at the sites by local personnel. To guarantee similar quality and format of variables and an identical encryption procedure at each site, the linkage center generates semi-automated pre-processing and encryption templates. To retrieve information (i.e. data structure) for the creation of templates without ever accessing plain person identifiable information, we introduced a novel method of data masking. Sensitive string variables are encrypted using Bloom filters, which enable calculation of similarity coefficients. For date variables, we developed special encryption procedures to handle the most common date errors. The linkage center performs probabilistic record linkage with encrypted person identifiable information and plain non-sensitive variables.
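
The Bloom filter step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the padding of names with blanks, the keyed HMAC double-hashing scheme, the filter length `m`, the number of hash functions `k`, the shared secret `key`, and the Dice coefficient as the similarity measure are all choices made here for the example. The abstract only states that string variables are encoded with Bloom filters so that similarity coefficients can be calculated.

```python
import hashlib
import hmac


def bigrams(name: str) -> set:
    """Split a name into overlapping 2-character substrings (blank-padded, assumed)."""
    padded = f" {name.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name: str, key: bytes, m: int = 1000, k: int = 2) -> frozenset:
    """Return the set of bit positions set to 1 in an m-bit Bloom filter.

    Each bigram is mapped to k positions by double hashing with two keyed
    HMACs (an assumed scheme; the key must be identical at both sites).
    """
    positions = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return frozenset(positions)


def dice(a: frozenset, b: frozenset) -> float:
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"shared-secret"  # hypothetical key agreed between the two sites
a = bloom_encode("Schmidlin", key)
b = bloom_encode("Schmidlim", key)  # one-character typing error
c = bloom_encode("Meier", key)      # unrelated name

print(dice(a, b))  # high: most bigrams are shared
print(dice(a, c))  # low: no bigrams are shared
```

Because names with many common bigrams set many of the same bits, the linkage center can compute a similarity score from the encoded values alone, without seeing the plain names.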

Results: In this paper we describe step by step how to link existing health-related data using encryption methods to preserve privacy of persons in the study.

Conclusion: Privacy Preserving Probabilistic Record Linkage expands record linkage facilities in settings where a unique identifier is unavailable and/or regulations restrict access to the non-unique person identifiable information needed to link existing health-related data sets. Automated pre-processing and encryption fully protect sensitive information, ensuring participant confidentiality. This method is suitable not just for epidemiological research but also for any setting with similar challenges.


Figures

Fig. 1. Basic steps of Privacy Preserving Probabilistic Record Linkage (P3RL)
Fig. 2. Flowchart of Privacy Preserving Probabilistic Record Linkage (P3RL) methods
Fig. 3. Example of Privacy Preserving Probabilistic Record Linkage (P3RL) masking and shuffling procedures
Fig. 4. Example of Privacy Preserving Probabilistic Record Linkage (P3RL) pre-processing data cleaning rules
Fig. 5. Example of Bloom filter encryption for surname (bigrams, two hash functions, Bloom filter length 28 bits)
