Accuracy of privacy preserving record linkage for real world data in the United States: a systematic review
- PMID: 39845287
- PMCID: PMC11752849
- DOI: 10.1093/jamiaopen/ooaf002
Abstract
Objectives: Examine the accuracy of privacy preserving record linkage (PPRL) matches in real world data (RWD).
Materials and methods: We conducted a systematic literature review to identify articles, published from January 1, 2013, to June 15, 2023, that evaluated PPRL methods. Eligible studies were original research reporting quantitative metrics, such as precision and recall, in health-related data sources. Covidence software was used to manage the review process.
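Precision and recall, the metrics eligible studies had to report, can be computed from sets of linked record pairs. The sketch below assumes that a gold-standard set of true links is available (for example, links made on clear-text PII). The function name `linkage_metrics` and the sample pairs are hypothetical.

```python
"""Precision and recall for a record-linkage run (illustrative sketch).

Each link is a pair (record_id_in_source_A, record_id_in_source_B).
The gold-standard set of true links is an assumption for this example.
"""


def linkage_metrics(predicted: set[tuple[str, str]],
                    truth: set[tuple[str, str]]) -> tuple[float, float]:
    """Return (precision, recall) for predicted links against true links."""
    true_positives = len(predicted & truth)
    # Precision: share of declared links that are real links.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real links that the method found.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
    predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}  # a4/b4 missed
    p, r = linkage_metrics(predicted, truth)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.75
```

A method that declares few but correct links scores high precision and low recall, which matches the pattern the review reports for name-based combinations.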
Results: Five studies met our inclusion criteria. Tokenization and hash functions were used to hash and encrypt personally identifiable information (PII), including first and last names, dates of birth (DOBs), and Social Security Numbers (SSNs), in a variety of RWD sources. All identified studies used deterministic matching. Combinations of tokenized or hashed PII that included "quasi-identifiers" such as names and DOBs had consistently high precision (>95%) but lower recall, likely because of misspelled or inconsistently spelled names and name changes. SSN-based combinations demonstrated high precision but variable recall because SSN data in RWD are often incomplete. Studies that used algorithms declaring a match when at least one of a specified set of PII combinations agreed achieved both high precision and high recall.
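As a rough illustration of how tokenization and deterministic matching interact, the sketch below normalizes PII fields and turns one combination (first name, last name, DOB) into a keyed hash using HMAC-SHA256 from Python's standard library. HMAC-SHA256 stands in for whatever tokenization scheme each study actually used, which is not described here; the key, the helper names `normalize` and `make_token`, and the sample records are hypothetical. Normalization reconciles case and punctuation, but a one-letter spelling difference still produces a different token, which is one way recall drops for name-based combinations.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret; real deployments manage keys through their
# own tokenization vendor or honest-broker protocol.
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Uppercase, strip accents, and keep only ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_token(*fields: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of normalized, '|'-delimited PII fields."""
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    site_a = {"first": "John", "last": "O'Brien", "dob": "1980-04-12"}
    site_b = {"first": "Jon", "last": "OBrien", "dob": "1980-04-12"}

    token_a = make_token(site_a["first"], site_a["last"], site_a["dob"])
    token_b = make_token(site_b["first"], site_b["last"], site_b["dob"])
    # Deterministic matching: link only if the tokens are identical.
    print(token_a == token_b)  # False: "John" vs "Jon" breaks the match
```

Only the tokens, not the clear-text PII, would need to leave each site; the trade-off is that deterministic comparison of hashes cannot tolerate any residual spelling variation.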
Discussion: The systematic review indicates that PPRL methods generally provide highly accurate patient data linkage while maintaining privacy.
Conclusions: Researchers should carefully consider the completeness and stability of each PII element selected for PPRL and may want to employ a strategy that allows for patient records to be matched if they meet at least one of several combinations of PII.
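The "at least one of several combinations" strategy can be sketched as a deterministic rule that links two records if any complete PII combination yields the same token at both sites. The combination list, key, and helper names below are hypothetical and are not taken from the reviewed studies. In the example, a record with a missing SSN and a changed last name is still linked through the first name, DOB, and sex combination.

```python
import hashlib
import hmac
from typing import Optional

KEY = b"example-shared-secret"  # hypothetical shared key

# Hypothetical combination set, ordered from strictest to loosest.
COMBINATIONS = [
    ("ssn",),
    ("first", "last", "dob"),
    ("first", "dob", "sex"),
]


def token(record: dict, fields: tuple) -> Optional[str]:
    """Hash one PII combination; return None if any field is missing."""
    values = [record.get(f) for f in fields]
    if any(not v for v in values):
        return None  # incomplete combinations (e.g., blank SSN) never match
    message = "|".join(str(v).strip().upper() for v in values).encode("utf-8")
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()


def tokens(record: dict) -> list:
    """One token (or None) per combination, in COMBINATIONS order."""
    return [token(record, combo) for combo in COMBINATIONS]


def is_match(rec_a: dict, rec_b: dict) -> bool:
    """Link if at least one complete combination produces equal tokens."""
    return any(
        ta is not None and ta == tb
        for ta, tb in zip(tokens(rec_a), tokens(rec_b))
    )


if __name__ == "__main__":
    a = {"ssn": "", "first": "Maria", "last": "Lopez",
         "dob": "1975-06-01", "sex": "F"}
    b = {"ssn": "123-45-6789", "first": "Maria", "last": "Garcia",
         "dob": "1975-06-01", "sex": "F"}
    print(is_match(a, b))  # True: linked via first name + DOB + sex
```

Each added combination can recover links that stricter combinations miss, but looser combinations (such as first name, DOB, and sex) also raise the chance of false matches, so precision should be rechecked whenever the set is expanded.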
Keywords: administrative claims; data anonymization; electronic health records; healthcare; personally identifiable information; privacy preserving record linkage.
© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Conflict of interest statement
S.J.W. and K.T. are employees of Pfizer, Inc. and may hold stock in the company.
Similar articles
- Privacy preserving record linkage for public health action: opportunities and challenges. J Am Med Inform Assoc. 2024 Nov 1;31(11):2605-2612. doi: 10.1093/jamia/ocae196. PMID: 39047294.
- Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med. 2021 Jan 15;19(1):33. doi: 10.1186/s12967-020-02678-1. PMID: 33451317.
- A methodological assessment of privacy preserving record linkage using survey and administrative data. Stat J IAOS. 2022 Jun 7;38(2):413-421. doi: 10.3233/sji-210891. PMID: 35910693.
- Developing Methods to Link Patient Records across Data Sets That Preserve Patient Privacy [Internet]. Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2020 Jun. PMID: 37535796. Review.
- Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study. JMIR Med Inform. 2021 Oct 15;9(10):e29871. doi: 10.2196/29871. PMID: 34652278. Review.