Review
2024 Feb 26;24(1):58.
doi: 10.1186/s12911-024-02458-7.

Conceptual design of a generic data harmonization process for OMOP common data model

Elisa Henke et al. BMC Med Inform Decis Mak.

Abstract

Background: To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e. the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM.

Methods: For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps.
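The comparison step described above can be illustrated with a minimal sketch. It assumes each publication is reduced to an ordered list of step names; the function names, the toy input, and the use of the mean reported position to order the steps are illustrative assumptions, not the authors' stated procedure.

```python
from collections import Counter, defaultdict


def numbering_distribution(publications):
    """Return {step: {position: percentage}} across publications mentioning the step."""
    counts = defaultdict(Counter)
    for steps in publications.values():
        for position, step in enumerate(steps, start=1):
            counts[step][position] += 1
    return {
        step: {pos: 100.0 * n / sum(c.values()) for pos, n in sorted(c.items())}
        for step, c in counts.items()
    }


def generic_sequence(publications):
    """Order steps by their mean reported position (an illustrative heuristic)."""
    positions = defaultdict(list)
    for steps in publications.values():
        for position, step in enumerate(steps, start=1):
            positions[step].append(position)
    return sorted(positions, key=lambda s: sum(positions[s]) / len(positions[s]))


# Hypothetical toy input: three publications with the steps they report, in order.
publications = {
    "pub_a": ["data profiling", "semantic mapping", "structural mapping", "ETL"],
    "pub_b": ["data profiling", "structural mapping", "semantic mapping", "ETL"],
    "pub_c": ["data profiling", "semantic mapping", "ETL"],
}

sequence = generic_sequence(publications)
distribution = numbering_distribution(publications)
```

With this toy input, "semantic mapping" is placed before "structural mapping" because two of the three publications report it earlier.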

Results: From the 23 included publications, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, extract-transform-load process, qualitative data quality analysis, and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools that supported five of the process steps.
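As a rough illustration, the nine steps can be encoded as an ordered enumeration with a small runner that calls one handler per step. This is a sketch under assumptions: the numbering follows the order in which the abstract lists the steps, and `HarmonizationStep`, `run_pipeline`, and the placeholder handlers are hypothetical names, not part of any OHDSI tool.

```python
from enum import IntEnum


class HarmonizationStep(IntEnum):
    # Numbering assumes the abstract lists the steps in process order.
    DATASET_SPECIFICATION = 1
    DATA_PROFILING = 2
    VOCABULARY_IDENTIFICATION = 3
    VOCABULARY_COVERAGE_ANALYSIS = 4
    SEMANTIC_MAPPING = 5
    STRUCTURAL_MAPPING = 6
    ETL_PROCESS = 7
    QUALITATIVE_DATA_QUALITY_ANALYSIS = 8
    QUANTITATIVE_DATA_QUALITY_ANALYSIS = 9


def run_pipeline(handlers, context):
    """Call one handler per step, in order; fail early if a step has no handler."""
    missing = [step.name for step in HarmonizationStep if step not in handlers]
    if missing:
        raise ValueError("no handler for: " + ", ".join(missing))
    for step in HarmonizationStep:
        context = handlers[step](context)
    return context


def make_placeholder(step):
    """Build a hypothetical handler that only records that the step ran."""
    def handler(context):
        return context + [step.name]
    return handler


handlers = {step: make_placeholder(step) for step in HarmonizationStep}
completed = run_pipeline(handlers, [])
```

In a real project each placeholder would be replaced by the actual work for that step; the runner only enforces that no step is skipped and that the order is kept.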

Conclusions: The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.

Keywords: Claims data; Clinical data; Data harmonization; Interoperability; OHDSI; OMOP.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. PRISMA flow diagram according to [15]
Fig. 2. Frequency distribution of the extracted process steps and their assignment to the included publications
Fig. 3. Percentage distribution of given numberings per process step for clinical data (group a)
Fig. 4. Percentage distribution of given numberings per process step for claims data (group b)
Fig. 5. Percentage distribution of given numberings per process step for clinical data and/or claims data (group c)
Fig. 6. Generic data harmonization process for OMOP CDM; icons: Flaticon.com

References

    1. Semler SC, Wissing F, Heyder R. German Medical Informatics Initiative - A National Approach To Integrating Health Data from Patient Care and Medical Research. Methods Inf Med. 2018;57(Suppl 1):e50–6.
    2. Green LA, Fryer GE, Yawn BP, Lanier D, Dovey SM. The Ecology of Medical Care Revisited. N Engl J Med. 2001;344(26):2021–5. doi: 10.1056/NEJM200106283442611.
    3. Thun S, Dewenter H. Syntaktische und semantische Interoperabilität. In: Müller-Mielitz S, Lux T, editors. E-Health-Ökonomie [Internet]. Wiesbaden: Springer Fachmedien; 2017 [cited 2023 Mar 14]. p. 669–82. doi: 10.1007/978-3-658-10788-8_34.
    4. Kumar G, Basri S, Imam AA, Khowaja SA, Capretz LF, Balogun AO. Data harmonization for heterogeneous datasets: a systematic literature review. Appl Sci. 2021;11(17):8275. doi: 10.3390/app11178275.
    5. Garza M, Del Fiol G, Tenenbaum J, Walden A, Zozus MN. Evaluating common data models for use with a longitudinal community registry. J Biomed Inf. 2016;64:333–41. doi: 10.1016/j.jbi.2016.10.016.