Comparative Study
J Clin Epidemiol. 2021 Aug:136:136-145.
doi: 10.1016/j.jclinepi.2021.04.015. Epub 2021 Apr 28.

Probabilistic linkage without personal information successfully linked national clinical datasets


Helen A Blake et al. J Clin Epidemiol. 2021 Aug.

Abstract

Background: Probabilistic linkage can link patients from different clinical databases without the need for personal information. If accurate linkage can be achieved, it would accelerate the use of linked datasets to address important clinical and public health questions.

Objective: We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without personal information, and validated it against deterministic linkage using patient identifiers.

Study design and setting: We used electronic health records from the National Bowel Cancer Audit and Hospital Episode Statistics databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service.

Results: Probabilistic linkage linked 81.4% of National Bowel Cancer Audit records to Hospital Episode Statistics, vs. 82.8% using deterministic linkage. No systematic differences were seen between patients who were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach.

Conclusion: Probabilistic linkage was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. It allows analysts outside highly secure data environments to undertake linkage while minimizing costs and delays, protecting data security, and maintaining linkage quality.

Keywords: Electronic health records; National clinical datasets; Patient identifiers; Personal information; Probabilistic linkage; Record linkage.
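As a rough illustration of how probabilistic linkage scores a candidate record pair, the sketch below sums per-variable agreement and disagreement weights derived from m-probabilities and u-probabilities, in the general style of Fellegi-Sunter linkage. It is not the authors' implementation: the variable names, the m and u values, the base-2 logarithm, and the helper functions are all hypothetical choices made for the example. Only the idea of comparing a summed match weight against a threshold T comes from the abstract and figure captions.

```python
import math

# Hypothetical (m, u) pairs per linkage variable, for illustration only.
# m = P(variable agrees | records belong to the same patient)
# u = P(variable agrees | records belong to different patients)
EXAMPLE_PROBS = {
    "sex": (0.99, 0.50),
    "surgery_date": (0.90, 0.001),
    "age_band": (0.95, 0.02),
}


def field_weights(m, u):
    """Return (agreement weight, disagreement weight) for one variable."""
    if not (0 < m < 1 and 0 < u < 1):
        raise ValueError("m and u must lie strictly between 0 and 1")
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def match_weight(agreements, probs):
    """Sum the per-variable weights for one candidate record pair.

    agreements maps each variable name to True (values agree) or False.
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = probs[field]
        agree_w, disagree_w = field_weights(m, u)
        total += agree_w if agrees else disagree_w
    return total


def is_link(weight, threshold):
    """Classify a pair as a link when its weight reaches the threshold.

    Treating a weight equal to the threshold as a link is an assumption.
    """
    return weight >= threshold


if __name__ == "__main__":
    pair = {"sex": True, "surgery_date": True, "age_band": False}
    print(match_weight(pair, EXAMPLE_PROBS))
```

A variable with a high m and a low u (here the hypothetical `surgery_date`) contributes a large positive weight on agreement, which is why variables that are both well recorded and unlikely to agree by chance are the most useful for linkage.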


Figures

Fig. 1
Comparing data quality and chance agreement using overall m-probabilities and u-probabilities of candidate linkage variables.
Fig. 2
Distribution of match weights computed in blocking step 1 after deterministically linking on Cancer Alliance (threshold for linkage, T, of 25 chosen at the point where the two distributions intersect). Frequency plotted using logarithmic scale.
Fig. 3
Receiver Operating Characteristic curve evaluating the sensitivity and specificity of probabilistic linkage match weights compared with the assumed gold standard of deterministic linkage (with probabilistic linkage threshold T=25 marked).
Fig. 4
Crude estimates and 95% confidence intervals (CI) for 90-day mortality odds ratios (OR), 2-year mortality hazard ratios (HR), and crude mean difference in length of stay (LOS) using patients linked deterministically or patients linked probabilistically. Mortality outcomes plotted using logarithmic scale.
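The validation summarised in Fig. 3 treats deterministic linkage as the reference and asks how well a match-weight threshold reproduces it. The sketch below shows one way such sensitivity and specificity figures, and the points of an ROC curve, could be computed. The function names and the toy data are hypothetical, and classifying a pair as linked when its weight is at or above the threshold is an assumption rather than something the source states.

```python
def sensitivity_specificity(weights, is_true_link, threshold):
    """Compare threshold-based links against a reference standard.

    weights: match weight for each candidate pair.
    is_true_link: True where the reference (here, deterministic linkage)
        says the pair is a link.
    """
    tp = fn = tn = fp = 0
    for weight, truth in zip(weights, is_true_link):
        predicted = weight >= threshold  # assumed decision rule
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def roc_points(weights, is_true_link):
    """Return (1 - specificity, sensitivity) for each distinct weight,
    using that weight as the threshold, from strictest to most lenient."""
    points = []
    for threshold in sorted(set(weights), reverse=True):
        sens, spec = sensitivity_specificity(weights, is_true_link, threshold)
        points.append((1 - spec, sens))
    return points


if __name__ == "__main__":
    # Toy data for illustration only, not values from the study.
    toy_weights = [30, 27, 20, 26, 10]
    toy_truth = [True, True, True, False, False]
    print(sensitivity_specificity(toy_weights, toy_truth, threshold=25))
    print(roc_points(toy_weights, toy_truth))
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of specificity, which is the trade-off a marked threshold such as T=25 fixes at a single point.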

