PLoS One. 2023 Oct 20;18(10):e0291581.
doi: 10.1371/journal.pone.0291581. eCollection 2023.

An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis

John Prindle et al. PLoS One. 2023.

Abstract

Research with administrative records is constrained by the limited information available in any single data source for answering policy-related questions. Record linkage lets researchers supplement an administrative dataset with information about the same people from other sources by identifying matched pairs of records. Several solutions are available for undertaking record linkage; each produces linkage keys for merging data sources on positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records identifies PRLF accuracy under varying match and data degradation percentages. Accuracy is influenced more by degradation (e.g., missing data fields, mismatched values) than by the percentage of simulated matches. Second, an applied data linkage compares regression model estimates across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. We discuss strengths of our process for improving match accuracy, such as the use of ensemble methods. We then identify caveats of record linkage in the context of administrative data.
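To make the idea of probability scoring with a match threshold concrete, the following is a minimal sketch using only the Python standard library. It is not the PRLF implementation: PRLF uses the Python RecordLinkage package with machine learning classifiers, whereas this sketch substitutes a simple Fellegi-Sunter-style log-odds score. The field names, the m/u probabilities, the similarity cutoff, and the sample records are all hypothetical values chosen for illustration. A family-level field (the mother's first name) is included alongside person-level fields, and a missing value is treated as contributing no evidence.

```python
from difflib import SequenceMatcher
from math import log2

# Hypothetical per-field parameters (illustrative only, not from the paper):
# m = P(field agrees | true match), u = P(field agrees | non-match).
FIELD_PARAMS = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "birth_date": (0.97, 0.01),
    "mother_first_name": (0.85, 0.05),  # family-level field
}


def agrees(a, b, min_similarity=0.85):
    """Return True/False for agreement, or None if either value is missing."""
    if not a or not b:
        return None
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= min_similarity


def match_probability(rec_a, rec_b, prior=0.5):
    """Combine per-field evidence into a probability that the pair is a match."""
    log_odds = log2(prior / (1 - prior))
    for field, (m, u) in FIELD_PARAMS.items():
        outcome = agrees(rec_a.get(field), rec_b.get(field))
        if outcome is None:
            continue  # missing data: no evidence either way
        if outcome:
            log_odds += log2(m / u)
        else:
            log_odds += log2((1 - m) / (1 - u))
    odds = 2 ** log_odds
    return odds / (1 + odds)


def link(records_a, records_b, threshold=0.9):
    """Return (index_a, index_b, probability) for pairs at or above threshold."""
    links = []
    for i, a in enumerate(records_a):
        for j, b in enumerate(records_b):
            p = match_probability(a, b)
            if p >= threshold:
                links.append((i, j, p))
    return links


# Hypothetical records: rec_b is a degraded copy of rec_a (typo, missing field).
rec_a = {"first_name": "Maria", "last_name": "Lopez",
         "birth_date": "2010-04-01", "mother_first_name": "Ana"}
rec_b = {"first_name": "Mariah", "last_name": "Lopez",
         "birth_date": "2010-04-01", "mother_first_name": ""}
rec_c = {"first_name": "John", "last_name": "Smith",
         "birth_date": "1999-12-31", "mother_first_name": "Sue"}

links = link([rec_a], [rec_b, rec_c], threshold=0.9)
```

Raising or lowering `threshold` trades missed matches against false links, which is the quantity varied from 0.40 to 0.95 in the applied comparison. Comparing every record against every other, as done here, is only practical for small inputs.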

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Comparison of PRLF with varied match thresholds to Link Plus and ChoiceMaker model estimates with 95% confidence intervals.
The estimated parameter for each predictor is shown as a solid dot, and confidence intervals are represented by error bars. Blue identifies Link Plus parameter estimates and confidence intervals, and red identifies ChoiceMaker estimates and confidence intervals. Black dots and grey confidence intervals show the estimates for the XGBoost algorithm of PRLF as the threshold for identifying a pair as a match varies (0.40–0.95) from left to right. Values above exp(b) = 1 indicate an increase in the likelihood of child welfare referral, whereas values below exp(b) = 1 indicate a decrease in the likelihood of child welfare referral.
