Record linkage without patient identifiers: Proof of concept using data from South Africa's national HIV program
- PMID: 40632720
- PMCID: PMC12240394
- DOI: 10.1371/journal.pgph.0004835
Abstract
Linkage between health databases typically requires patient identifiers such as names and personal identification numbers. We developed and validated a record linkage strategy to combine administrative health databases without identifiers for South Africa's public sector HIV program. We linked CD4 counts and HIV viral loads from South Africa's TIER.Net with the National Health Laboratory Service (NHLS) database for patients receiving care between 2015 and 2019 in Ekurhuleni District (Gauteng Province). Linkage variables were result value, specimen collection date, facility of collection, year and month of birth, and sex. We used three matching strategies: exact matching, which required identical values on all variables; caliper matching, which allowed a ±5-day window on result date; and specimen barcode matching, which used unique specimen identifiers. A sequential linkage approach applied specimen barcode matching first, followed by exact matching and then caliper matching. Exact and caliper matching were validated using barcodes (available for 34% of records in TIER.Net) as a "gold standard". Performance measures were sensitivity, positive predictive value (PPV), share of patients linked, and percent increase in data points. We attempted to link 2,017,290 laboratory test results from TIER.Net (523,558 unique patients) with 2,414,059 NHLS test results. Exact matching achieved 69.0% sensitivity and 95.1% PPV. Caliper matching achieved 75% sensitivity and 94.5% PPV. Under sequential linkage, 41.9% of matches came from specimen barcodes, 51.3% from exact matching, and 6.8% from caliper matching. Overall, 71.9% (95% CI: 71.9, 72.0) of test results were matched, with 96.8% (95% CI: 96.7, 97.1) PPV and 85.9% (95% CI: 85.7, 85.9) sensitivity. This linked 86.0% (95% CI: 85.9, 86.1) of TIER.Net patients to the NHLS (N = 1,450,087), increasing laboratory results in TIER.Net by 62.6%. Linkage of TIER.Net and NHLS without patient identifiers attained high accuracy and yield without compromising privacy. The integrated cohort provides a more complete laboratory test history and supports more accurate HIV program indicator estimates.
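The sequential approach described in the abstract (barcode, then exact, then caliper matching) and its validation measures can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record layout, the field names (`id`, `barcode`, `value`, `date`, `facility`, `birth_ym`, `sex`), and the function names are hypothetical, and the rule of accepting a match only when exactly one unused candidate exists is an assumption, since the abstract does not say how ties were resolved. For simplicity the sketch uses a single `date` field for both the exact and the caliper comparison.

```python
from collections import defaultdict
from datetime import date

# Hypothetical non-date linkage fields (names are assumptions, not from the paper).
KEY_FIELDS = ("value", "facility", "birth_ym", "sex")


def sequential_link(tier, nhls, caliper_days=5):
    """Link TIER.Net-like records to NHLS-like records in three stages.

    Returns {tier_id: (nhls_id, stage)} with stage in
    {"barcode", "exact", "caliper"}. Each NHLS record is used at most once.
    """
    links, used = {}, set()

    # Stage 1: specimen barcode, where both sides carry one.
    by_barcode = {n["barcode"]: n for n in nhls if n.get("barcode")}
    for t in tier:
        n = by_barcode.get(t.get("barcode"))
        if n is not None and n["id"] not in used:
            links[t["id"]] = (n["id"], "barcode")
            used.add(n["id"])

    # Index NHLS records on the non-date linkage variables.
    by_key = defaultdict(list)
    for n in nhls:
        by_key[tuple(n[f] for f in KEY_FIELDS)].append(n)

    # Stage 2 (exact): date gap of 0 days. Stage 3 (caliper): gap <= caliper_days.
    for stage, max_gap in (("exact", 0), ("caliper", caliper_days)):
        for t in tier:
            if t["id"] in links:
                continue
            candidates = [
                n for n in by_key[tuple(t[f] for f in KEY_FIELDS)]
                if n["id"] not in used
                and abs((n["date"] - t["date"]).days) <= max_gap
            ]
            # Assumption: accept only unambiguous (single-candidate) matches.
            if len(candidates) == 1:
                links[t["id"]] = (candidates[0]["id"], stage)
                used.add(candidates[0]["id"])
    return links


def sensitivity_ppv(predicted, gold):
    """Sensitivity and PPV of predicted links against gold-standard links.

    Both arguments are sets of (tier_id, nhls_id) pairs; in the study the
    gold standard came from barcode matches.
    """
    true_pos = len(predicted & gold)
    sensitivity = true_pos / len(gold) if gold else float("nan")
    ppv = true_pos / len(predicted) if predicted else float("nan")
    return sensitivity, ppv


def _rec(rid, barcode, value, day, facility, birth_ym, sex):
    """Build one toy record (illustrative data only)."""
    return {"id": rid, "barcode": barcode, "value": value, "date": day,
            "facility": facility, "birth_ym": birth_ym, "sex": sex}


tier = [
    _rec("t1", "B1", 350, date(2017, 3, 1), "F1", "1980-05", "F"),
    _rec("t2", None, 500, date(2017, 4, 10), "F1", "1975-01", "M"),
    _rec("t3", None, 200, date(2017, 6, 1), "F2", "1990-09", "F"),
    _rec("t4", None, 999, date(2017, 6, 1), "F2", "1990-09", "F"),
]
nhls = [
    _rec("n1", "B1", 350, date(2017, 3, 1), "F1", "1980-05", "F"),
    _rec("n2", None, 500, date(2017, 4, 10), "F1", "1975-01", "M"),
    _rec("n3", None, 200, date(2017, 6, 4), "F2", "1990-09", "F"),
    _rec("n4", None, 200, date(2017, 6, 20), "F2", "1990-09", "F"),
]
links = sequential_link(tier, nhls)
```

In this toy data, `t1` links by barcode, `t2` by exact match, and `t3` by caliper match (a 3-day gap), while `t4` has no candidate with the same result value and stays unlinked. To mirror the validation design, one would run the exact and caliper stages on barcoded records with the barcode hidden and pass the resulting pairs, together with the barcode-derived pairs, to `sensitivity_ppv`.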
Copyright: © 2025 Shumba et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.
Update of
- Record linkage without patient identifiers: proof of concept using data from South Africa's national HIV program. Res Sq [Preprint]. 2023 May 15:rs.3.rs-2893943. doi: 10.21203/rs.3.rs-2893943/v1. Update in: PLOS Glob Public Health. 2025 Jul 9;5(7):e0004835. doi: 10.1371/journal.pgph.0004835. PMID: 37292689.