Improving Cohort-Hospital Matching Accuracy through Standardization and Validation of Participant Identifiable Information
- PMID: 36553359
- PMCID: PMC9776599
- DOI: 10.3390/children9121916
Abstract
Linking very large, consented birth cohorts to birthing hospitals' clinical data could elucidate the lifecourse outcomes of health care and exposures during the pregnancy, birth and newborn periods. Unfortunately, cohort personally identifiable information (PII) often does not include unique identifier numbers, which makes matching difficult. To develop optimized cohort matching to birthing hospital clinical records, this pilot drew on a one-year (December 2020-December 2021) cohort for a single Australian birthing hospital participating in the whole-of-state Generation Victoria (GenV) study. For 1819 consented mother-baby pairs and 58 additional babies (whose mothers were not themselves participating), we tested the accuracy and effort of various approaches to matching. We selected demographic variables drawn from names, date of birth (DOB), sex, telephone and address (plus birth order for multiple births). After variable standardization and validation, accuracy rose from 10% to 99% using a deterministic rule-based approach in 10 steps. Using cohort-specific modifications of the Australian Statistical Linkage Key (SLK-581), it took only 3 steps to reach 97% (SLK-5881) and 98% (SLK-5881.1) accuracy. We conclude that our SLK-5881 process could safely and efficiently achieve high accuracy at the population level for future birth cohort-birth hospital matching in the absence of unique identifier numbers.
Keywords: birth cohort; data accuracy; data linkage; demographics; hospital; hospital records; information retrieval; newborn; personally identifiable information; pregnant women.
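For orientation, the following is a minimal Python sketch of the standard SLK-581 key on which the abstract's modified keys are based. It assumes the usual construction: the 2nd, 3rd and 5th letters of the family name, the 2nd and 3rd letters of the given name, DOB as DDMMYYYY, and a one-digit sex code, with "2" substituted for positions beyond the end of a name and "9"s for a missing name. The function names `slk581` and `_pick` are hypothetical, and the cohort-specific SLK-5881 and SLK-5881.1 variants are defined in the full paper and are not reproduced here.

```python
import re
from datetime import date


def _pick(name, positions):
    """Return the letters at the given 1-based positions of a name.

    Non-alphabetic characters (spaces, hyphens, apostrophes) are dropped
    first. This simplification also drops accented letters.
    """
    letters = re.sub(r"[^A-Za-z]", "", name or "").upper()
    if not letters:
        # Missing name: fill every position with "9".
        return "9" * len(positions)
    # Positions past the end of a short name are filled with "2".
    return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)


def slk581(family_name, given_name, dob, sex_code):
    """Build a 14-character SLK-581 style key (illustrative sketch only)."""
    return (
        _pick(family_name, (2, 3, 5))
        + _pick(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + str(sex_code)
    )


# Example with made-up values: standardization makes punctuation irrelevant.
key = slk581("O'Brien", "Mary-Anne", date(2021, 3, 7), 2)
```

Because the key uses only a few letters of each name, it tolerates some spelling variation, but it also depends on names being cleaned consistently on both sides of the match, which is in line with the abstract's emphasis on standardization and validation before matching.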
Conflict of interest statement
The authors declare no conflict of interest.