Analysis of a probabilistic record linkage technique without human review
- PMID: 14728174
- PMCID: PMC1479910
Abstract
We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximization (EM) algorithm to establish a single true-link threshold. We compared the unsupervised probabilistic results against the manually reviewed gold standard for two hospital registries, as well as against our previous deterministic results. At an estimated specificity of 99.95%, actual specificities were 99.43% and 99.42% for registries A and B, respectively. At an estimated sensitivity of 99.95%, actual sensitivities were 99.19% and 98.99% for registries A and B, respectively. The EM algorithm estimated linkage parameters with acceptable accuracy, and was an improvement over the deterministic algorithm. Such a methodology may be used where record linkage is required, but human intervention is not possible or practical.
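The abstract does not spell out the estimator, so the following is only a minimal sketch of how EM can estimate linkage parameters and how a single threshold might then be chosen. It assumes the standard Fellegi-Sunter setup: each candidate record pair is reduced to a binary agree/disagree vector over the compared fields, and fields agree independently given link status. The function names (`em_estimate`, `threshold_for_specificity`), the starting values, and the rule for picking the threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

EPS = 1e-6  # keeps estimated probabilities away from exactly 0 or 1


def _likelihood(gamma, probs):
    """P(agreement pattern | class), assuming fields agree independently."""
    return math.prod(q if a else 1.0 - q for a, q in zip(gamma, probs))


def em_estimate(patterns, n_iter=200, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate m (agreement rate among true links), u (agreement rate among
    non-links) and p (proportion of true links) from unlabeled 0/1 patterns."""
    counts = Counter(map(tuple, patterns))
    n = sum(counts.values())
    k = len(next(iter(counts)))
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each observed pattern is a link.
        post = {}
        for gamma in counts:
            pm = p * _likelihood(gamma, m)
            pu = (1.0 - p) * _likelihood(gamma, u)
            post[gamma] = pm / (pm + pu)
        # M-step: re-estimate parameters from the posterior-weighted counts.
        link_mass = sum(c * post[g] for g, c in counts.items())
        non_mass = n - link_mass
        for i in range(k):
            agree_link = sum(c * post[g] for g, c in counts.items() if g[i])
            agree_non = sum(c * (1.0 - post[g]) for g, c in counts.items() if g[i])
            m[i] = min(max(agree_link / link_mass, EPS), 1.0 - EPS)
            u[i] = min(max(agree_non / non_mass, EPS), 1.0 - EPS)
        p = min(max(link_mass / n, EPS), 1.0 - EPS)
    return m, u, p


def threshold_for_specificity(m, u, target=0.9995):
    """Lowest log2 match weight whose estimated specificity stays >= target.

    Assumed rule: admit patterns from highest weight downward until the
    estimated false-positive mass among non-links would exceed 1 - target.
    Exact ties in weight are not treated specially in this sketch.
    Returns (threshold, estimated specificity, estimated sensitivity).
    """
    scored = []
    for gamma in product((0, 1), repeat=len(m)):
        pm, pu = _likelihood(gamma, m), _likelihood(gamma, u)
        scored.append((math.log2(pm / pu), pm, pu))
    scored.sort(reverse=True)
    threshold, fp, tp = math.inf, 0.0, 0.0
    for weight, pm, pu in scored:
        if fp + pu > 1.0 - target:
            break
        threshold, fp, tp = weight, fp + pu, tp + pm
    return threshold, 1.0 - fp, tp
```

Under these assumptions, a pair would be declared a link when the sum of its per-field log2 weights meets or exceeds the returned threshold, with no clerical review band. The estimated specificity and sensitivity come from the fitted model alone, which is why they can differ from the values measured against a gold standard, as the abstract reports.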