Front Digit Health. 2025 Jan 28:7:1423621.
doi: 10.3389/fdgth.2025.1423621. eCollection 2025.

Harmonizing population health data into OMOP common data model: a demonstration using COVID-19 sero-surveillance data from Nairobi Urban Health and Demographic Surveillance System

Michael Ochola et al. Front Digit Health.

Abstract

Background: Observational health data are collected in different formats and structures, making them challenging to analyze with common tools. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a standardized data model that can harmonize observational health data.

Objective: This paper demonstrates the use of the OMOP CDM to harmonize COVID-19 sero-surveillance data from the Nairobi Urban Health and Demographic Surveillance System (HDSS).

Methods: We extracted data from the Nairobi Urban HDSS COVID-19 sero-surveillance database and mapped them to the OMOP CDM using open-source Observational Health Data Sciences and Informatics (OHDSI) tools: WhiteRabbit, RabbitInAHat, and USAGI. The steps were data profiling (scanning), mapping the vocabularies using the offline USAGI and online ATHENA, and designing the extract, transform, and load (ETL) process using RabbitInAHat. The ETL process was implemented using Pentaho Data Integration community edition software and structured query language (SQL). The resulting OMOP CDM database can now be used to analyze the prevalence of COVID-19 antibodies in the Nairobi Urban HDSS population.
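As an illustration of the transform step described above, the following is a minimal Python sketch of mapping one source record to a row of the OMOP CDM person table. It is not the authors' Pentaho/SQL implementation. The source field names (`individual_id`, `gender`, `date_of_birth`) and the helper `to_person_row` are hypothetical; the gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts, and 0 is used when no concept matches.

```python
from dataclasses import dataclass
from datetime import date

# Standard OMOP gender concepts; 0 means "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


@dataclass
class PersonRow:
    """A subset of the OMOP CDM person table columns."""
    person_id: int
    gender_concept_id: int
    year_of_birth: int
    month_of_birth: int
    day_of_birth: int
    person_source_value: str
    gender_source_value: str


def to_person_row(person_id: int, source: dict) -> PersonRow:
    """Map one source record (hypothetical field names) to a person row."""
    dob = date.fromisoformat(source["date_of_birth"])
    gender_raw = source["gender"]
    return PersonRow(
        person_id=person_id,
        # Normalize the source code before the vocabulary lookup.
        gender_concept_id=GENDER_CONCEPTS.get(gender_raw.strip().lower(), 0),
        year_of_birth=dob.year,
        month_of_birth=dob.month,
        day_of_birth=dob.day,
        # Keep the original values so the mapping stays auditable.
        person_source_value=source["individual_id"],
        gender_source_value=gender_raw,
    )
```

Keeping the raw values in the `*_source_value` columns alongside the mapped concept IDs follows the general OMOP convention and makes it possible to re-check the vocabulary mapping later.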

Results: We successfully mapped the Nairobi Urban HDSS COVID-19 sero-surveillance data to the OMOP CDM. The standardized dataset included information on demographics, COVID-19 symptoms, vaccination, and COVID-19 antibody test results.
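Once antibody test results are stored in a standardized form, a prevalence estimate reduces to a simple aggregation. The sketch below is a hedged illustration only: it assumes each test result is a dictionary with `person_id` and `value_as_concept_id` keys (mirroring columns of the OMOP measurement table), assumes positive results are coded with concept ID 9191, and counts a person as seropositive if any of their tests is positive. The function name `seroprevalence` is hypothetical, and the paper itself reports results through ATLAS rather than code like this.

```python
# Assumed concept ID for a "Positive" measurement value; verify against
# the vocabulary actually loaded in your CDM instance.
POSITIVE_CONCEPT_ID = 9191


def seroprevalence(measurements: list[dict]) -> float:
    """Return the share of tested persons with at least one positive result."""
    positive_by_person: dict[int, bool] = {}
    for m in measurements:
        pid = m["person_id"]
        is_positive = m["value_as_concept_id"] == POSITIVE_CONCEPT_ID
        # A person stays positive once any of their tests is positive.
        positive_by_person[pid] = positive_by_person.get(pid, False) or is_positive
    if not positive_by_person:
        raise ValueError("no measurements supplied")
    return sum(positive_by_person.values()) / len(positive_by_person)
```

The denominator is the number of distinct persons tested, not the number of tests, so repeat testing of the same individual does not inflate the estimate.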

Conclusions: The OMOP CDM is a valuable tool for harmonizing observational health data. Using the OMOP CDM facilitates the sharing and analysis of observational health data, leading to a better understanding of disease conditions and trends and improving evidence-based population health strategies.

Keywords: Africa; COVID-19; ETL; Nairobi Urban HDSS; OMOP CDM; observational health data; population health data; sero-surveillance.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Harmonization pipeline architecture for NUHDSS COVID-19 sero-survey.
Figure 2. High-level overview of NUHDSS COVID-19 sero-survey data mapped to OMOP CDM tables.
Figure 3. Mapping from source data to OMOP CDM person table.
Figure 4. ETL specification document extract from person table with some variables.
Figure 5. Pentaho ETL pipeline with various transformations and jobs moving data to OMOP CDM v6.0.
Figure 6. ATLAS dashboard report for harmonized COVID-19 sero-survey.
Figure 7. ATLAS condition occurrence report for harmonized COVID-19 sero-survey.
Figure 8. ATLAS data density plot on condition occurrence for harmonized COVID-19 sero-survey.
Figure 9. Proportion of conditions across age groups.
