PLoS One. 2019 Feb 19;14(2):e0212463.
doi: 10.1371/journal.pone.0212463. eCollection 2019.

Data model harmonization for the All Of Us Research Program: Transforming i2b2 data into the OMOP common data model

Jeffrey G Klann et al. PLoS One. 2019.

Abstract

Background: The All Of Us Research Program (AOU) is building a nationwide cohort of one million patients' EHR and genomic data. Data interoperability is paramount to the program's success. AOU is standardizing its EHR data around the Observational Medical Outcomes Partnership (OMOP) data model. OMOP is one of several standard data models presently used in national-scale initiatives. Each model is unique enough to make interoperability difficult. The i2b2 data warehousing and analytics platform, used at over 200 sites worldwide, takes a flexible ontology-driven approach to data storage. We previously demonstrated that this ontology system can drive data reconfiguration, transforming data into new formats without site-specific programming. We implemented this approach on our 12-site Accessible Research Commons for Health (ARCH) network to transform i2b2 data into the Patient Centered Outcomes Research Network (PCORnet) model.

Methods and results: Here, we leverage our investment in i2b2 high-performance transformations to support the AOU OMOP data pipeline. Because the ARCH ontology has gained widespread national interest (through the Accrual to Clinical Trials network, other PCORnet networks, and the Nebraska Lexicon), we leveraged sites' existing investments in this standard ontology. We developed an i2b2-to-OMOP transformation, driven by the ARCH-OMOP ontology and the OMOP concept mapping dictionary. We demonstrated and validated our approach in the AOU New England HPO (NEHPO). First, we transformed a fake patient dataset in i2b2 into OMOP and verified through AOU tools that the data were structurally compliant with OMOP. We then transformed a subset of data in the Partners Healthcare data warehouse into OMOP. Using OMOP's visual Achilles data quality tool, we developed a checklist of assessments to ensure the transformed data had self-integrity (e.g., the distributions have an expected shape and required fields are populated). This i2b2-to-OMOP transformation is being used to send NEHPO production data to AOU. It is open-source and ready for use by other research projects.
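
The two self-integrity checks named above (required fields populated, distributions of an expected shape) can be illustrated with a minimal sketch. This is not Achilles and not the authors' checklist; the function names, the example field names, and the 10-year bin width are assumptions chosen for illustration.

```python
from collections import Counter


def missing_required(rows, required):
    """Count, per required field, how many rows leave it empty (None or '')."""
    counts = Counter()
    for row in rows:
        for field in required:
            if row.get(field) in (None, ""):
                counts[field] += 1
    return dict(counts)


def age_histogram(ages, bin_width=10):
    """Bucket ages into fixed-width bins so the distribution shape can be inspected."""
    hist = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(hist.items()))


# Hypothetical rows shaped loosely like transformed records.
rows = [
    {"person_id": 1, "concept_id": None},
    {"person_id": None, "concept_id": 5},
    {"person_id": 3, "concept_id": 7},
]
problems = missing_required(rows, ["person_id", "concept_id"])
shape = age_histogram([0, 3, 25, 27, 31])
```

A non-empty `problems` result flags fields that need attention before submission, and `shape` gives a coarse view of the distribution to compare against expectations.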

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Ontology-driven data transformation in i2b2.
The ontology, which defines concept metadata, drives the transformation from i2b2 to OMOP. Data are retrieved from the i2b2 fact table, converted to OMOP codes via ontology lookups, and then written to the OMOP tables specified through the ontology concept path.
Fig 2. Mapping distribution from ARCH terminologies to OMOP.
ICD and CPT codes map to six different tables in OMOP. This is just one (easily visualizable) aspect of the many complexities encountered in mapping. Boxes in the treemap are sized on a logarithmic scale.
Fig 3. Achilles results on our “10% of Partners’ data” dataset.
From top left to bottom right: (a) data density (notice all are of the same magnitude); (b) age at first observation (notice the expected peak in the 20s followed by a decrease, with a spike at age 0 representing babies born in the hospital but not receiving follow-up care); (c) population distribution by race.
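
The flow described in the Fig 1 caption (read facts, look up the OMOP code in the ontology, write to the table the ontology names) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the ontology here is a plain dictionary rather than an i2b2 ontology table with concept paths, the concept ids are placeholders, and the output row layout is an assumption. Only the fact column names (`patient_num`, `concept_cd`, `start_date`) and the OMOP table names follow the respective data models.

```python
from collections import defaultdict

# Hypothetical ontology: i2b2 concept code -> (OMOP target table, OMOP concept id).
# The concept ids are placeholders, not real OMOP vocabulary ids.
ONTOLOGY = {
    "ICD10CM:E11.9": ("condition_occurrence", 1001),
    "CPT:99213": ("procedure_occurrence", 2001),
    "LOINC:4548-4": ("measurement", 3001),
}


def transform(facts, ontology):
    """Route each i2b2 fact to the OMOP table named by its ontology entry."""
    omop_tables = defaultdict(list)
    unmapped = []
    for fact in facts:
        entry = ontology.get(fact["concept_cd"])
        if entry is None:
            # No ontology entry: keep the fact aside instead of guessing a target.
            unmapped.append(fact)
            continue
        table, concept_id = entry
        omop_tables[table].append({
            "person_id": fact["patient_num"],
            "concept_id": concept_id,
            "start_date": fact["start_date"],
            "source_value": fact["concept_cd"],
        })
    return dict(omop_tables), unmapped


# Illustrative facts only; not real patient data.
FACTS = [
    {"patient_num": 1, "concept_cd": "ICD10CM:E11.9", "start_date": "2018-01-05"},
    {"patient_num": 1, "concept_cd": "CPT:99213", "start_date": "2018-01-05"},
    {"patient_num": 2, "concept_cd": "LOCAL:XYZ", "start_date": "2018-02-10"},
]
tables, unmapped = transform(FACTS, ONTOLOGY)
```

Because the routing lives in the ontology rather than in the code, adding a new mapping means adding an ontology entry, which is the property that lets the approach avoid site-specific programming.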
