Learn Health Syst. 2024 Oct 21;9(2):e10468.
doi: 10.1002/lrh2.10468. eCollection 2025 Apr.

Assessing the harmonization of structured electronic health record data to reference terminologies and data completeness through data provenance

Keith Marsolo et al. Learn Health Syst.

Abstract

Introduction: (1) Assess the harmonization of structured electronic health record data (laboratory results and medications) to reference terminologies and characterize the severity of issues. (2) Identify issues of data completeness by comparing complementary data domains, stratifying by time, care setting, and provenance.

Methods: Queries were distributed to 3 Data Partners (DP). Using harmonization queries, we examined the top 200 laboratory results and medications by volume, identifying outliers and computing summary statistics. The completeness queries looked at 4 conditions of interest and related clinical concepts. Counts were generated for each condition, stratified by year, encounter type, and provenance. We analyzed trends over time within and across DPs.
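As an illustration only, the harmonization check described in the Methods (how many distinct codes are associated with each laboratory or medication name, plus a count of records with no code) might be sketched as follows. The record layout, the sample names and codes, and the outlier rule (more than three times the median) are assumptions made for this example and are not taken from the study's actual queries.

```python
from collections import defaultdict
from statistics import median


def codes_per_name(records):
    """Map each name to the set of distinct non-null codes recorded with it."""
    mapping = defaultdict(set)
    for name, code in records:
        if code is not None:
            mapping[name].add(code)
    return mapping


def summarize(mapping, factor=3):
    """Return the median number of codes per name and the names whose
    code count exceeds factor * median (a hypothetical outlier rule)."""
    counts = {name: len(codes) for name, codes in mapping.items()}
    med = median(counts.values())
    outliers = sorted(name for name, n in counts.items() if n > factor * med)
    return med, outliers


# Made-up (name, code) pairs; the codes are placeholders, not real terminology codes.
records = [
    ("glucose", "C1"),
    ("glucose", "C1"),
    ("creatinine", "C2"),
    ("sodium", "C3"),
    ("sodium", None),  # a record with a null code
    ("potassium", "C4"),
    ("potassium", "C5"),
    ("potassium", "C6"),
    ("potassium", "C7"),
]

null_count = sum(1 for _, code in records if code is None)
median_codes, outlier_names = summarize(codes_per_name(records))
print(median_codes, outlier_names, null_count)
```

The same function could be run in the opposite direction (names per code) by swapping the two fields in each pair.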

Results: We found that the median number of codes associated with a given laboratory/medication name (and vice versa) generally met expectations, though there were DP-specific issues that resulted in outliers. In addition, there were drastic differences in the percentage of patients with a given concept depending on provenance.
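To make the provenance comparison concrete, the sketch below computes the percentage of a cohort with at least one record of a concept, by provenance and for any provenance combined. The patient identifiers, the provenance labels ("order", "administration"), and the function name are hypothetical; the study's own queries may define these differently.

```python
from collections import defaultdict


def percent_with_concept(cohort_ids, rows):
    """Percentage of cohort patients with at least one record of the concept,
    keyed by provenance, plus an "any" key covering all provenances."""
    patients = defaultdict(set)
    for patient_id, provenance in rows:
        if patient_id in cohort_ids:  # ignore records for patients outside the cohort
            patients[provenance].add(patient_id)
            patients["any"].add(patient_id)
    total = len(cohort_ids)
    return {prov: 100.0 * len(ids) / total for prov, ids in patients.items()}


# Made-up cohort and (patient_id, provenance) records for one concept.
cohort = {1, 2, 3, 4}
rows = [
    (1, "order"),
    (2, "order"),
    (2, "administration"),
    (3, "administration"),
    (9, "order"),  # not in the cohort, so excluded
]

percentages = percent_with_concept(cohort, rows)
print(percentages)
```

In this toy data, each single provenance covers half of the cohort while the combined view covers three quarters, which is the kind of gap that a single provenance type can hide.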

Conclusions: The harmonization queries surfaced several mapping errors, as well as issues with overly specific codes and records with "null" codes. The completeness queries demonstrated that having access to multiple types of data provenance provides more robust results than any single provenance type. Harmonization errors between source data and reference terminologies may not be widespread but do exist within CDMs, affecting tens of thousands or even millions of records. Provenance information can help identify potential completeness issues with EHR data, but only if it is represented in the CDM and then populated by DPs.

Keywords: common data models; data provenance; data quality; harmonization.

Conflict of interest statement

Keith Marsolo reports grants and contracts to his institution from Novartis, Amgen, Seqirus, Genentech, BMS, Bayer, and Boehringer Ingelheim. No other authors report a conflict of interest.

Figures

FIGURE 1
Ratio of medication administrations versus orders for adalimumab (left) and donepezil (right) for data partners within PCORnet. Each dot represents the ratio for a data partner; the blue line represents perfect concordance between medication administrations and orders. For adalimumab, the ratio for all data partners is close to zero, suggesting no medication administration data associated with this medication. For donepezil, the ratio is clustered closer to the 1:1 line, though there is an extreme outlier in the top right corner. Dots far above the line suggest missing medication orders data.
FIGURE 2
Percentage of patients in a cohort with a laboratory result. Each bar represents the results from a data partner. The “empty” or 0% values on the right represent data partners who have likely not loaded the relevant results into their Common Data Model. For the data partners in the ~3%–65% range, it is unclear whether their results reflect missing data or just practice variation.
FIGURE 3
Percent ratio of medications (administrations/orders) for the chronic kidney disease (CKD) cohort by year at Data Partner 2.
FIGURE 4
Percent ratio of medications (administrations/orders) for the chronic obstructive pulmonary disease (COPD) cohort by year at Data Partner 2. Note that the Y-axis uses a log scale.
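The percent ratio plotted in Figures 3 and 4 can be illustrated with a minimal sketch. It assumes the ratio is computed from record counts per calendar year (the captions do not say whether records or patients are counted), and the input years below are invented.

```python
from collections import Counter


def percent_ratio_by_year(admin_years, order_years):
    """Return {year: 100 * administrations / orders}, with None for years
    that have administrations but no orders (ratio undefined)."""
    admins = Counter(admin_years)
    orders = Counter(order_years)
    ratios = {}
    for year in sorted(set(admins) | set(orders)):
        if orders[year] == 0:
            ratios[year] = None  # possible sign of missing orders data
        else:
            ratios[year] = 100.0 * admins[year] / orders[year]
    return ratios


# Made-up record years for one medication at one data partner.
admin_years = [2019, 2019, 2020, 2021]
order_years = [2019, 2019, 2019, 2019, 2020]

ratios = percent_ratio_by_year(admin_years, order_years)
print(ratios)
```

A value near 100 indicates that administrations and orders are recorded in similar volume; values near 0 or undefined values point to a provenance type that may not be loaded for that year.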
