Review
J Am Med Inform Assoc. 2014 Mar-Apr;21(2):212-20.
doi: 10.1136/amiajnl-2013-002165. Epub 2013 Nov 7.

Privacy preserving interactive record linkage (PPIRL)

Hye-Chung Kum et al. J Am Med Inform Assoc. 2014 Mar-Apr.

Abstract

Objective: Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high-quality record linkage requires a human-machine hybrid system to safely manage uncertainty in the ever-changing streams of chaotic Big Data.

Methods: In the computer science literature, private record linkage is the most published area. It investigates how to apply a known linkage function safely when linking two tables. In practice, however, the linkage function is rarely known. Thus, there are many data linkage centers whose main role is to act as the trusted third party that determines the linkage function manually and links data for research via a master population list for a designated region. Recently, a more flexible computerized third-party linkage platform, Secure Decoupled Linkage (SDLink), has been proposed based on: (1) decoupling data via encryption; (2) obfuscation via chaffing (adding fake data) and universe manipulation; and (3) minimum information disclosure via recoding.

Results: We synthesize this literature to formalize a new framework for privacy preserving interactive record linkage (PPIRL) with tractable privacy and utility properties and then analyze the literature using this framework.

Conclusions: Human-based third-party linkage centers for privacy preserving record linkage are the accepted norm internationally. We find that a computer-based third-party platform that can precisely control the information disclosed at the micro level and allow frequent human interaction during the linkage process is an effective human-machine hybrid system that significantly improves on the linkage center model in terms of both privacy and utility.

Keywords: Electronic Health Records (EHR); decoupled data; entity resolution; medical record linkage; privacy; privacy preserving interactive record linkage (PPIRL).

Figures

Figure 1
Systematic review process workflow.
Figure 2
Secure decoupled data. Internally, the data is stored in a decoupled data system (bottom), which has the same level of privacy protection as de-identified data (top right), but is much more powerful because researchers can link multiple decoupled datasets safely. Decoupled data allows for accurate record linkage with no attribute disclosure.
Figure 3
Chaffing and universe manipulation. Triangles: cancer patients; cross-hatched circles: not cancer patients. DA: universe of all cancer patients (eg, USA); LA: list of the subset of cancer patients being reviewed for linkage, which is more tractable (eg, Austin); IanPII represents the PII of someone whom the reviewer knows (eg, Ian who lives in Austin). Since Ian is not a unique name, it is unclear whether the PII represents the same real-world Ian whom the reviewer knows personally. (1) Chaffing: literally changing the nature of the universe by adding fake data (eg, add blue circles to red triangles); (2) fabrication: changing the label/name of the universe presented to mislead the user about the nature of the list (eg, label DA as DB and/or LA as LB, so that IanPII is now presented as someone who lives in Beijing, China, who could not be the same Ian whom the reviewer knows to live in Austin); and (3) nondisclosure: hiding the identity of the universe to reduce confidence by making the list less tractable. That is, by not disclosing the label LA, the user must assume the list represents a much larger universe DA (eg, a list from the USA compared to a list from Austin). The reviewer, who knows an Ian living in Austin, loses confidence in inferring the real identity of IanPII when it is presented as an Ian living in the USA rather than as an Ian living in Austin.
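The chaffing step in this caption can be illustrated with a minimal sketch. It is not SDLink's implementation: the function name `add_chaff`, the record layout, and the use of a pre-built pool of fake records are all assumptions made for illustration. The sketch only shows the core idea that fake records are mixed into the real list so a reviewer cannot tell which rows correspond to real people.

```python
import random

def add_chaff(real_records, fake_pool, n_fake, seed=None):
    """Mix n_fake records drawn from fake_pool into real_records.

    Hypothetical helper: the records are plain dicts and the fake pool
    is assumed to be generated elsewhere. The result is shuffled so that
    position does not reveal which records are real.
    """
    rng = random.Random(seed)
    # Draw fake records without replacement (raises ValueError if the
    # pool holds fewer than n_fake records).
    chaff = rng.sample(fake_pool, n_fake)
    mixed = list(real_records) + chaff
    rng.shuffle(mixed)
    return mixed

# Illustrative data only; none of these records come from the paper.
real = [{"name": "Ian", "city": "Austin"}, {"name": "Ana", "city": "Austin"}]
fake = [{"name": f"Fake{i}", "city": "Austin"} for i in range(5)]
reviewed_list = add_chaff(real, fake, n_fake=3, seed=0)
```

With three fake rows added to two real ones, a reviewer who recognizes a name in `reviewed_list` has less reason to believe the row describes a real person.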
Figure 4
Data recoding techniques. The SDLink GUI applies data recoding techniques that display the differences between attributes that are meaningful for record linkage instead of the raw data. For example, the gender field indicates only same [−], different [D], or missing [M] in one or both fields. DOB, date of birth; SSN, social security number.
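The gender example in this caption can be sketched as a small recoding function. This is a minimal illustration under stated assumptions, not the SDLink code: the names `recode_field` and `recode_pair`, the dict-based records, and the exact-equality comparison are hypothetical, and an ASCII hyphen stands in for the caption's [−] symbol.

```python
from typing import Optional

def recode_field(a: Optional[str], b: Optional[str]) -> str:
    """Recode a pair of raw values into a comparison code.

    "-" : both values present and equal
    "D" : both values present and different
    "M" : value missing (None or empty) in one or both records
    """
    if not a or not b:
        return "M"
    return "-" if a == b else "D"

def recode_pair(rec_a: dict, rec_b: dict, fields: list) -> dict:
    """Return only the comparison codes, never the raw attribute values."""
    return {f: recode_field(rec_a.get(f), rec_b.get(f)) for f in fields}

# Illustrative records only; the field names are assumptions.
a = {"gender": "F", "dob": "1980-01-02", "ssn": None}
b = {"gender": "F", "dob": "1980-02-01", "ssn": "123"}
print(recode_pair(a, b, ["gender", "dob", "ssn"]))
# prints {'gender': '-', 'dob': 'D', 'ssn': 'M'}
```

The reviewer sees that the genders agree, the dates of birth differ, and an SSN is missing, without seeing any of the underlying values. A real interface would likely show finer-grained differences for fields such as DOB; exact equality is used here only to keep the sketch short.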
