Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation
- PMID: 37384382
- PMCID: PMC10365597
- DOI: 10.2196/44331
Abstract
Background: To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index (MPI) software. Record linkage in the MPI is typically performed manually by health care providers, guided by automated matching algorithms. These matching algorithms must be configured in advance, such as by setting the weights of patient attributes, usually by someone with knowledge of both the matching algorithm and the patient population being served.
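To make "setting the weights of patient attributes" concrete, here is a minimal Python sketch of a weighted matching configuration. It assumes a simple additive score, where each agreeing attribute contributes its weight, and two thresholds that separate definite, possible, and non-matches. The names `MatchConfig`, `score`, and `classify`, and the attribute keys, are hypothetical and do not reproduce SantéMPI's actual configuration format. Production matching algorithms typically use richer comparators, such as string-similarity or phonetic comparisons.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MatchConfig:
    """Hypothetical matching configuration: attribute weights plus two score thresholds."""
    weights: Dict[str, float]    # score contributed when an attribute agrees
    possible_threshold: float    # score at or above which a pair is a "possible" match
    definite_threshold: float    # score at or above which a pair is a "definite" match


def score(a: Dict[str, str], b: Dict[str, str], config: MatchConfig) -> float:
    """Sum the weights of attributes whose values agree, ignoring case and surrounding whitespace."""
    total = 0.0
    for attribute, weight in config.weights.items():
        va, vb = a.get(attribute), b.get(attribute)
        if va and vb and va.strip().lower() == vb.strip().lower():
            total += weight
    return total


def classify(a: Dict[str, str], b: Dict[str, str], config: MatchConfig) -> str:
    """Map the pair's score onto the three outcome bands."""
    s = score(a, b, config)
    if s >= config.definite_threshold:
        return "definite"
    if s >= config.possible_threshold:
        return "possible"
    return "non-match"


# Hypothetical weights and records, for illustration only.
config = MatchConfig(
    weights={"family": 3.0, "given": 2.0, "birth_date": 4.0, "gender": 1.0},
    possible_threshold=5.0,
    definite_threshold=9.0,
)
a = {"family": "Tremblay", "given": "Marie", "birth_date": "1984-03-02", "gender": "F"}
b = {"family": "tremblay", "given": "Maria", "birth_date": "1984-03-02", "gender": "F"}
print(score(a, b, config), classify(a, b, config))  # 8.0 possible
```

The two records agree on family name (case-insensitively), date of birth, and gender but differ on given name, so they score 8.0 and land in the "possible" band that a human would then review. Choosing good weights and thresholds for a given population is exactly the tuning problem the tool automates.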
Objective: We aimed to develop and evaluate a machine learning-based software tool that automatically configures a patient matching algorithm by learning from pairs of patient records, already present in the database, that were previously linked by humans.
Methods: We built a free and open-source software tool that optimizes record linkage algorithm parameters based on historical record linkages. The tool uses Bayesian optimization to identify the set of configuration parameters that yields optimal matching performance in a given patient population, learning from record linkages previously made by humans. The tool assumes only the existence of a minimal HTTP application programming interface (API), and is therefore agnostic to the choice of MPI software, record linkage algorithm, and patient population. As a proof of concept, we integrated our tool with SantéMPI, an open-source MPI. We validated the tool on several synthetic patient populations in SantéMPI, comparing the sensitivity and specificity of the optimized configuration on held-out data with those of SantéMPI's default matching configuration.
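The optimization loop described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it tunes a single continuous parameter (for example, a match threshold) using a Gaussian-process surrogate with a fixed RBF length scale and the expected-improvement acquisition function evaluated on a grid. All function names and defaults are hypothetical. In the setting described here, `objective` would apply a candidate configuration through the MPI's HTTP API and return a score such as F1 on human-linked record pairs; that wiring is assumed and omitted.

```python
import math
import random
from typing import Callable, List, Tuple


def rbf(a: float, b: float, length: float) -> float:
    """Squared-exponential (RBF) kernel with unit amplitude."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(k: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == k (k must be symmetric positive definite)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(k[i][i] - s)
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def forward(low: List[List[float]], b: List[float]) -> List[float]:
    """Solve L y = b by forward substitution."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][m] * y[m] for m in range(i))) / low[i][i])
    return y


def backward(low: List[List[float]], y: List[float]) -> List[float]:
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(xs: List[float], ys: List[float], length: float, jitter: float = 1e-6):
    """Fit a zero-mean GP to standardized ys; return a predictor x -> (mean, std) in original units."""
    mean = sum(ys) / len(ys)
    scale = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys)) or 1.0
    z = [(y - mean) / scale for y in ys]
    k = [[rbf(a, b, length) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = backward(low, forward(low, z))  # alpha = K^{-1} z

    def predict(x: float) -> Tuple[float, float]:
        k_star = [rbf(x, a, length) for a in xs]
        mu = sum(ks * al for ks, al in zip(k_star, alpha))
        v = forward(low, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mean + scale * mu, scale * math.sqrt(var)

    return predict


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    """EI for maximization: E[max(f(x) - best - xi, 0)] under a normal posterior."""
    gap = mu - best - xi
    z = gap / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gap * cdf + sigma * pdf


def bayes_opt(objective: Callable[[float], float], lo: float, hi: float,
              n_init: int = 3, n_iter: int = 15, seed: int = 0) -> Tuple[float, float]:
    """Maximize objective on [lo, hi]; return (best_x, best_y)."""
    rng = random.Random(seed)
    length = (hi - lo) / 5.0  # fixed length scale (no hyperparameter fitting)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]
    ys = [objective(x) for x in xs]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]  # candidate points for the acquisition
    for _ in range(n_iter):
        predict = fit_gp(xs, ys, length)
        best = max(ys)
        candidates = [c for c in grid if c not in xs]
        x_next = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]
```

For example, `bayes_opt(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)` returns an `x` close to 2. A real matching configuration has several weights and thresholds, so a practical tool would search a multidimensional space, typically with a library implementation rather than this hand-rolled solver. Each objective evaluation (re-scoring the labelled pairs) is expensive, which is why a sample-efficient method such as Bayesian optimization suits this problem better than exhaustive grid search.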
Results: In all data sets, the machine learning-optimized configurations correctly detect over 90% of true record linkages as definite matches, with 100% specificity and positive predictive value, whereas the baseline configuration detects none. In the largest data set examined, the baseline matching configuration detects possible record linkages with a sensitivity of 90.2% (95% CI 88.4%-92.0%) and a specificity of 100%. By comparison, the machine learning-optimized matching configuration attains a sensitivity of 100%, with a decreased specificity of 95.9% (95% CI 95.9%-96.0%). We report significant gains in sensitivity in all data sets examined, at the cost of only marginally decreased specificity. The configuration optimization tool, data, and data set generator have been made freely available.
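The metrics reported above can be computed from a confusion matrix of proposed versus human-confirmed linkages. The sketch below is an assumption about the computation: the abstract does not state how the 95% CIs were derived, so it uses the Wilson score interval, which, unlike the simple normal-approximation interval, stays informative when a proportion is exactly 0% or 100%. The function names and the counts in the example are hypothetical, not taken from the study.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval (z=1.96 gives ~95%)."""
    if total == 0:
        raise ValueError("proportion undefined when total is 0")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return p, max(0.0, center - half), min(1.0, center + half)


def linkage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and PPV, each as (estimate, lower, upper)."""
    return {
        "sensitivity": wilson_ci(tp, tp + fn),  # true linkages that were detected
        "specificity": wilson_ci(tn, tn + fp),  # non-linked pairs correctly left unlinked
        "ppv": wilson_ci(tp, tp + fp),          # proposed linkages that are true
    }


# Hypothetical counts for illustration only.
metrics = linkage_metrics(tp=90, fp=40, tn=960, fn=10)
for name, (est, lo, hi) in metrics.items():
    print(f"{name}: {est:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Raising sensitivity by lowering a match threshold converts some true negatives into false positives, which is the sensitivity versus specificity trade-off described in the results.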
Conclusions: Our machine learning software tool can be used to significantly improve the performance of existing record linkage algorithms, without knowledge of the algorithm being used or specific details of the patient population being served.
Keywords: Bayesian optimization; FEBRL; computerized; data linkage; electronic health records; health care system; machine learning; master index; master patient index; matching algorithm; medical record linkage; medical record systems; open-source software; pilot; quality of care; record link.
©Walter Nelson, Nityan Khanna, Mohamed Ibrahim, Justin Fyfe, Maxwell Geiger, Keith Edwards, Jeremy Petch. Originally published in JMIR Formative Research (https://formative.jmir.org), 29.06.2023.
Conflict of interest statement
Conflicts of Interest: None declared.