Review
JAMIA Open. 2025 Jan 22;8(1):ooaf002.
doi: 10.1093/jamiaopen/ooaf002. eCollection 2025 Feb.

Accuracy of privacy preserving record linkage for real world data in the United States: a systemic review

Khushi Tyagi et al. JAMIA Open.

Abstract

Objectives: Examine the accuracy of privacy preserving record linkage (PPRL) matches in real world data (RWD).

Materials and methods: We conducted a systematic literature review to identify articles evaluating PPRL methods from January 1, 2013 to June 15, 2023. Eligible studies included original research reporting quantitative metrics such as precision and recall in health-related data sources. Covidence software was used to manage the review process.
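For readers less familiar with the two metrics named above, the following is a minimal sketch of how precision and recall could be computed for a linkage result when a gold-standard set of true matches is available. The function name and the record-ID pairs are hypothetical and are not taken from any of the reviewed studies.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Return (precision, recall) for a set of linked record pairs.

    precision = true positives / all pairs the linkage declared a match
    recall    = true positives / all pairs that are truly the same patient
    """
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical gold standard: four record pairs that truly belong together.
true_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}

# Hypothetical linkage output: two correct links, one false link, two missed.
predicted_pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}

precision, recall = precision_recall(predicted_pairs, true_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the one false link lowers precision and the two missed links lower recall, which is the trade-off the review tracks across PII combinations.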

Results: Five studies met our inclusion criteria. Tokenization and hash functions were used to hash and encrypt personally identifiable information (PII) including first and last names, dates of birth (DOB), and Social Security Numbers (SSNs) in a variety of RWD. All identified studies utilized deterministic matching. Combinations of tokenized or hashed PII that included "quasi-identifiers" like names and DOBs had consistently high precision (>95%) but lower recall, likely due to misspelled or inconsistently spelled names and name changes. SSN-based combinations demonstrated high precision but variable recall due to incomplete SSN data in RWD. Studies that employed algorithms in which at least one match was identified from a specified set of PII combinations provided high precision and high recall.
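The matching behavior described in the results can be illustrated with a small sketch. It assumes keyed hashing with HMAC-SHA-256 as the tokenization step, a shared secret key, and two PII combinations (name plus DOB, and SSN plus DOB). All of these are illustrative choices, as are the field names and sample records; the reviewed studies may have used different tokenization tools and combinations.

```python
import hashlib
import hmac

# Assumption: a secret key shared by the data holders (hypothetical value).
SECRET_KEY = b"hypothetical-shared-key"

# Assumption: two illustrative PII combinations, not those of any specific study.
COMBINATIONS = [
    ("first", "last", "dob"),
    ("ssn", "dob"),
]


def normalize(value):
    """Lowercase and drop punctuation and whitespace before hashing."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def tokens(record):
    """Return one keyed-hash token per PII combination that is fully populated."""
    result = set()
    for combo in COMBINATIONS:
        parts = [normalize(record.get(field) or "") for field in combo]
        if not all(parts):
            continue  # incomplete PII (e.g. missing SSN) yields no token
        # Field names are included so tokens from different combinations never collide.
        message = "|".join(combo + tuple(parts)).encode("utf-8")
        result.add(hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest())
    return result


def is_match(rec_a, rec_b):
    """Deterministic match: at least one combination yields an identical token."""
    return bool(tokens(rec_a) & tokens(rec_b))


# Hypothetical records for the same patient from two sources.
claims = {"first": "Katherine", "last": "O'Neil", "dob": "1980-02-03", "ssn": "123-45-6789"}
ehr_typo = {"first": "Kathrine", "last": "ONeil", "dob": "1980-02-03", "ssn": "123-45-6789"}
ehr_no_ssn = {"first": "Kathrine", "last": "ONeil", "dob": "1980-02-03", "ssn": None}

print(is_match(claims, ehr_typo))    # name token differs, SSN+DOB token agrees
print(is_match(claims, ehr_no_ssn))  # name token differs, no SSN token available
```

The misspelled first name breaks the name-based token, so the pair links only through the SSN-based combination; when the SSN is also missing, the true match is lost. This mirrors the recall patterns reported above for name-based and SSN-based combinations.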

Discussion: The systematic review indicates that PPRL methods generally provide highly accurate patient data linkage while maintaining privacy.

Conclusions: Researchers should carefully consider the completeness and stability of each PII element selected for PPRL and may want to employ a strategy that allows for patient records to be matched if they meet at least one of several combinations of PII.

Keywords: administrative claims; data anonymization; electronic health records; healthcare; personally identifiable information; privacy preserving record linkage.

Conflict of interest statement

S.J.W. and K.T. are employees of Pfizer, Inc. and may hold stock in the company.

Figures

Figure 1. PRISMA diagram.
