[Preprint]. 2024 Jun 20:2024.06.19.24309149.
doi: 10.1101/2024.06.19.24309149.

Augmenting maternal clinical cohort data with administrative laboratory dataset linkages: a validation study


Laura Rossouw et al. medRxiv.


Abstract

Background: The use of big data and large language models in healthcare can play a key role in improving patient treatment and healthcare management, especially when applied to large-scale administrative data. A major challenge is ensuring that patient confidentiality and personal information are protected. One way to address this is to augment clinical data through linkages to administrative laboratory datasets, which avoids the use of demographic information.

Methods: We explored an alternative method for examining patient files from a large administrative dataset in South Africa (the National Health Laboratory Service, or NHLS) by linking external data to the NHLS database using the specimen barcodes associated with laboratory tests. This provides a deterministic way of performing data linkages without accessing demographic information. In this paper, we quantify the performance of this approach.
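
The barcode linkage described above can be illustrated with a minimal sketch, assuming each external (hospital) record and each laboratory record carries a specimen barcode field. The field names used here (`barcode`, `gestation_weeks`, `test`, `result`) and the helper functions are hypothetical and are not taken from the NHLS schema or from the authors' code. The point it shows is that the barcode is the only join key, so no name, date of birth or ID number is read; a barcode that matches zero or several laboratory records is left unlinked rather than guessed.

```python
from collections import defaultdict

# Minimal sketch: deterministic record linkage on a specimen barcode.
# All field names below are hypothetical placeholders.

def build_barcode_index(lab_records, key="barcode"):
    """Group laboratory records by their specimen barcode."""
    index = defaultdict(list)
    for rec in lab_records:
        index[rec[key]].append(rec)
    return index

def link_exact(external_records, lab_records, key="barcode"):
    """Link external records to lab records by exact barcode equality.

    Only the barcode is used as the join key; no demographic field is read.
    Returns (linked_pairs, unlinked_external_records).
    """
    index = build_barcode_index(lab_records, key)
    linked, unlinked = [], []
    for ext in external_records:
        candidates = index.get(ext[key], [])
        if len(candidates) == 1:
            linked.append((ext, candidates[0]))
        else:
            # No candidate (no match) or several (duplicate barcode): leave unlinked.
            unlinked.append(ext)
    return linked, unlinked

hospital = [{"barcode": "AB1234", "gestation_weeks": 28},
            {"barcode": "ZZ0000", "gestation_weeks": 34}]
lab = [{"barcode": "AB1234", "test": "HIV viral load", "result": "<50 copies/mL"}]

linked, unlinked = link_exact(hospital, lab)
print(len(linked), len(unlinked))  # 1 1
```

Because the join is an exact key lookup, the same inputs always produce the same links, which is what makes the approach deterministic rather than probabilistic.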

Results: Linking the large NHLS dataset to external hospital data using specimen barcodes achieved a 95% success rate. Of the 1,200 records in the validation sample, 87% were exact matches and 9% were matches after typographic correction. The remaining 5% were either complete mismatches or failed because of duplicates in the administrative data.
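
The outcome categories reported above (exact match, match after typographic correction, duplicate, mismatch) can be sketched as a classifier over barcodes. The abstract does not state which typographic corrections were applied, so the rule below is an assumption: strip spaces and hyphens, upper-case, then accept a single substitution or a single adjacent transposition, but only when exactly one laboratory barcode qualifies. The example barcodes are invented for illustration and are not study data.

```python
from collections import Counter

# Minimal sketch of sorting external barcodes into outcome categories.
# The correction rule (normalise, then allow one substitution or one adjacent
# transposition against a unique lab barcode) is an assumption.

def normalise(code):
    """Drop spaces and hyphens and upper-case the barcode (assumed clean-up)."""
    return "".join(ch for ch in code if ch not in " -").upper()

def one_edit_apart(a, b):
    """True if equal-length a and b differ by one substitution or one adjacent swap."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) == 1:
        return True
    if len(diffs) == 2:
        i, j = diffs
        return j == i + 1 and a[i] == b[j] and a[j] == b[i]
    return False

def classify(ext_code, lab_counts):
    """Return 'exact', 'typo_corrected', 'duplicate' or 'mismatch'.

    lab_counts is a Counter of barcodes in the administrative data.
    """
    if lab_counts[ext_code] == 1:
        return "exact"
    if lab_counts[ext_code] > 1:
        return "duplicate"
    cleaned = normalise(ext_code)
    candidates = [c for c in lab_counts
                  if normalise(c) == cleaned or one_edit_apart(cleaned, normalise(c))]
    if len(candidates) != 1:
        # No plausible correction, or an ambiguous one: do not guess.
        return "mismatch"
    return "duplicate" if lab_counts[candidates[0]] > 1 else "typo_corrected"

lab_counts = Counter(["AB1234", "CD5678", "EF9999", "EF9999"])
external = ["AB1234", "cd-5678", "AB1243", "EF9999", "XY0000"]
tally = Counter(classify(code, lab_counts) for code in external)

# In this sketch, "success" counts exact plus typo-corrected links.
success = (tally["exact"] + tally["typo_corrected"]) / len(external)
print(success)  # 0.6
```

Rejecting ambiguous corrections keeps the linkage conservative: a typo is fixed only when a single laboratory barcode is a plausible target, so a correction cannot silently attach a record to the wrong patient.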

Conclusions: The high success rate supports the reliability of using barcodes to link data without demographic identifiers. Specimen barcodes are an effective tool for deterministic linkage of health data and may provide a way to create large linked datasets without compromising patient confidentiality.

Keywords: Big Data; Data Linkage; HIV; Patient Confidentiality; Validation.


