Assessing the harmonization of structured electronic health record data to reference terminologies and data completeness through data provenance
- PMID: 40247903
- PMCID: PMC12000768
- DOI: 10.1002/lrh2.10468
Abstract
Introduction: (1) Assess the harmonization of structured electronic health record data (laboratory results and medications) to reference terminologies and characterize the severity of issues. (2) Identify issues of data completeness by comparing complementary data domains, stratifying by time, care setting, and provenance.
Methods: Queries were distributed to 3 Data Partners (DPs). Using the harmonization queries, we examined the top 200 laboratory results and medications by volume, identifying outliers and computing summary statistics. The completeness queries examined 4 conditions of interest and their related clinical concepts. Counts were generated for each condition, stratified by year, encounter type, and provenance. We analyzed trends over time within and across DPs.
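As an illustration only, the harmonization check described above (how many distinct codes are attached to a given laboratory or medication name, with outliers flagged against the median) might be sketched as follows. The record layout, the placeholder names and codes, and the `outlier_factor` threshold are all assumptions made for this example; they are not the queries or thresholds used in the study.

```python
from collections import defaultdict
from statistics import median


def codes_per_name(records):
    """Group the distinct codes seen for each lab/medication name.

    `records` is an iterable of (name, code) pairs; a code of None
    stands in for a record whose code is "null".
    """
    by_name = defaultdict(set)
    for name, code in records:
        by_name[name].add(code)
    return by_name


def summarize(by_name, outlier_factor=3):
    """Return the median codes-per-name and the names far above it.

    `outlier_factor` is a hypothetical cutoff chosen for illustration.
    """
    counts = {name: len(codes) for name, codes in by_name.items()}
    med = median(counts.values())
    outliers = sorted(n for n, c in counts.items() if c > outlier_factor * med)
    return med, outliers


def null_code_count(records):
    """Count records that carry no code at all."""
    return sum(1 for _, code in records if code is None)


# Placeholder data: names and codes are invented, not real terminology codes.
records = [
    ("LAB_A", "CODE-1"), ("LAB_A", "CODE-1"),
    ("LAB_B", "CODE-2"),
    ("LAB_C", "CODE-3"),
    ("LAB_D", "CODE-4"), ("LAB_D", "CODE-5"),
    ("LAB_D", "CODE-6"), ("LAB_D", None),
]

med, outliers = summarize(codes_per_name(records))
nulls = null_code_count(records)
```

Running the same function on (code, name) pairs instead gives the reverse view, names per code.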
Results: We found that the median number of codes associated with a given laboratory/medication name (and vice versa) generally met expectations, though there were DP-specific issues that resulted in outliers. In addition, there were drastic differences in the percentage of patients with a given concept depending on provenance.
Conclusions: The harmonization queries surfaced several mapping errors, as well as issues with overly specific codes and records with "null" codes. The completeness queries demonstrated that having access to multiple types of data provenance provides more robust results than any single provenance type. Harmonization errors between source data and reference terminologies may not be widespread, but they do exist within common data models (CDMs), affecting tens of thousands or even millions of records. Provenance information can help identify potential completeness issues with EHR data, but only if it is represented in the CDM and then populated by DPs.
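The completeness comparison can be pictured with a small sketch: for a cohort of patients with a condition, compute the percentage who have a related concept recorded in each stratum of year, encounter type, and provenance, and then the percentage found under any provenance. The tuple layout and the provenance labels (`"order"`, `"dispensing"`) and encounter label (`"AV"`) are hypothetical values chosen for this example, not fields or results taken from the study.

```python
from collections import defaultdict


def percent_by_stratum(cohort, observations):
    """Percent of the cohort with the concept, per (year, encounter, provenance).

    `observations` is an iterable of
    (patient_id, year, encounter_type, provenance) tuples.
    """
    if not cohort:
        raise ValueError("cohort must not be empty")
    patients = defaultdict(set)
    for pid, year, enc, prov in observations:
        if pid in cohort:
            patients[(year, enc, prov)].add(pid)  # sets ignore repeat records
    return {k: 100.0 * len(v) / len(cohort) for k, v in patients.items()}


def percent_any_provenance(cohort, observations):
    """Percent of the cohort with the concept under at least one provenance."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    seen = {pid for pid, *_ in observations if pid in cohort}
    return 100.0 * len(seen) / len(cohort)


# Invented example: four patients, two hypothetical provenance types.
cohort = {1, 2, 3, 4}
observations = [
    (1, 2020, "AV", "order"),
    (1, 2020, "AV", "order"),       # duplicate record, counted once
    (2, 2020, "AV", "order"),
    (2, 2020, "AV", "dispensing"),
    (3, 2020, "AV", "dispensing"),
]

by_stratum = percent_by_stratum(cohort, observations)
combined = percent_any_provenance(cohort, observations)
```

In this toy data each provenance type alone covers half of the cohort, while the two together cover three quarters, which is the kind of gap the stratified counts are meant to expose.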
Keywords: common data models; data provenance; data quality; harmonization.
© 2024 The Author(s). Learning Health Systems published by Wiley Periodicals LLC on behalf of University of Michigan. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.
Conflict of interest statement
Keith Marsolo reports grants and contracts to his institution from Novartis, Amgen, Seqirus, Genentech, BMS, Bayer, and Boehringer Ingelheim. No other authors report a conflict of interest.