AMIA Annu Symp Proc. 2003;2003:259-63.

Analysis of a probabilistic record linkage technique without human review

Shaun J Grannis et al. AMIA Annu Symp Proc. 2003.

Abstract

We previously developed a deterministic record linkage algorithm demonstrating sensitivities approaching 90% while maintaining 100% specificity. Substantially better performance has been reported using probabilistic linkage techniques; however, such methods often incorporate human review into the process. To avoid human review, we employed an estimator function using the Expectation Maximization (EM) algorithm to establish a single true-link threshold. We compared the unsupervised probabilistic results against the manually reviewed gold standard for two hospital registries, as well as against our previous deterministic results. At an estimated specificity of 99.95%, actual specificities were 99.43% and 99.42% for registries A and B, respectively. At an estimated sensitivity of 99.95%, actual sensitivities were 99.19% and 98.99% for registries A and B, respectively. The EM algorithm estimated linkage parameters with acceptable accuracy, and was an improvement over the deterministic algorithm. Such a methodology may be used where record linkage is required, but human intervention is not possible or practical.
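The abstract does not spell out the estimator function, so the following is only a minimal sketch of how EM can estimate linkage parameters without labeled data. It assumes a Fellegi-Sunter style model with binary field agreement and conditional independence between fields, which may differ from the authors' implementation. All names (`em_estimate`, `estimated_rates`, `simulate_pairs`), the starting values, and the simulated data are hypothetical and used only for illustration.

```python
import math
import random
from itertools import product

EPS = 1e-6  # keeps probabilities away from 0 and 1 so ratios and logs stay finite


def _clamp(x):
    return min(max(x, EPS), 1.0 - EPS)


def pattern_prob(gamma, probs):
    """P(agreement pattern | class), assuming fields agree independently."""
    out = 1.0
    for agree, q in zip(gamma, probs):
        out *= q if agree else 1.0 - q
    return out


def em_estimate(patterns, n_iter=50, p=0.1, m0=0.9, u0=0.1):
    """Estimate p (share of true links), m (agreement rates among true links)
    and u (agreement rates among non-links) from unlabeled 0/1 patterns."""
    k, n = len(patterns[0]), len(patterns)
    m, u = [m0] * k, [u0] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each record pair is a true link.
        g = []
        for gamma in patterns:
            pm = p * pattern_prob(gamma, m)
            pu = (1.0 - p) * pattern_prob(gamma, u)
            g.append(pm / (pm + pu))
        # M-step: re-estimate parameters from the posterior weights.
        total = sum(g)
        p = _clamp(total / n)
        m = [_clamp(sum(gj * gam[i] for gj, gam in zip(g, patterns)) / total)
             for i in range(k)]
        u = [_clamp(sum((1.0 - gj) * gam[i] for gj, gam in zip(g, patterns))
                    / (n - total)) for i in range(k)]
    return p, m, u


def score(gamma, m, u):
    """Match likelihood score: log2 likelihood ratio of the pattern."""
    return math.log2(pattern_prob(gamma, m) / pattern_prob(gamma, u))


def estimated_rates(threshold, m, u):
    """Model-based (not observed) sensitivity and specificity when pairs
    scoring at or above `threshold` are declared links."""
    sens = spec = 0.0
    for gamma in product((0, 1), repeat=len(m)):
        if score(gamma, m, u) >= threshold:
            sens += pattern_prob(gamma, m)
        else:
            spec += pattern_prob(gamma, u)
    return sens, spec


def simulate_pairs(n, p, m, u, seed=0):
    """Hypothetical test data: n agreement patterns drawn from the model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        pairs.append(tuple(int(rng.random() < q) for q in probs))
    return pairs
```

Under these assumptions, a single true-link threshold could be chosen by scanning candidate scores and keeping the lowest one whose estimated specificity reaches a target such as the 99.95% quoted in the abstract. The estimate can then be checked against a manually reviewed gold standard.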

Figures

Figure 1. Typical two-threshold scheme for probabilistic scores using human review. Record pairs between the upper and lower thresholds are manually reviewed for true- or false-link status.
Figure 2. Single probabilistic score threshold without human review. No scores are tagged for human review.
Figure 3. Registry A sensitivities and specificities as a function of match likelihood score. The EM-estimates are compared with manually reviewed (observed) values.
Figure 4. Registry B sensitivities and specificities as a function of match likelihood score. The EM-estimates are compared with manually reviewed (observed) values.
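
The two schemes in Figures 1 and 2, and the observed sensitivity and specificity plotted in Figures 3 and 4, can be sketched as below. This is an illustrative sketch only: the function names, the treatment of scores exactly at a threshold, and the example values are assumptions, not details taken from the paper.

```python
def classify_two_threshold(score, lower, upper):
    """Figure 1 scheme: scores between the thresholds go to human review."""
    if score >= upper:
        return "link"
    if score < lower:
        return "non-link"
    return "review"


def classify_single_threshold(score, threshold):
    """Figure 2 scheme: every pair is decided automatically."""
    return "link" if score >= threshold else "non-link"


def observed_rates(scores, is_true_link, threshold):
    """Observed sensitivity and specificity at one threshold, measured
    against manually reviewed (gold-standard) link status."""
    tp = fn = tn = fp = 0
    for s, truth in zip(scores, is_true_link):
        predicted = classify_single_threshold(s, threshold) == "link"
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Evaluating `observed_rates` over a range of thresholds would give observed curves of the kind the captions describe, which can then be compared with the EM-estimated values.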
