Methods Inf Med. 2022 Dec;61(S 02):e89-e102.
doi: 10.1055/s-0042-1757763. Epub 2022 Oct 11.

TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse

Miguel Pedrera-Jiménez et al. Methods Inf Med. 2022 Dec.

Abstract

Background: During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, leaving clinical researchers unaware of how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable.

Objectives: This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization.

Methods: The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, using the Detailed Clinical Models paradigm; (3) agnostic development of data operations, with SQL and R selected as programming languages; and (4) automation of the ETL instantiation, by building a formal configuration file in XML.
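
As a rough illustration of stage 4, the sketch below shows how an XML configuration file could drive ETL instantiation by dispatching to a catalog of operations. The element and attribute names (`etl`, `operation`, `param`, `key`), the operation names (`filter_equal`, `select_fields`), and the sample records are all hypothetical; the abstract does not give the schema of the actual configuration file, and the study implemented its operations in SQL and R rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the real schema is not given in the abstract.
CONFIG = """
<etl name="demo">
  <operation name="filter_equal">
    <param key="field">diagnosis</param>
    <param key="value">COVID-19</param>
  </operation>
  <operation name="select_fields">
    <param key="fields">patient_id,age</param>
  </operation>
</etl>
"""


def filter_equal(rows, field, value):
    """Keep only rows whose `field` equals `value`."""
    return [r for r in rows if r.get(field) == value]


def select_fields(rows, fields):
    """Project each row onto a comma-separated list of field names."""
    keep = [f.strip() for f in fields.split(",")]
    return [{k: r[k] for k in keep if k in r} for r in rows]


# Catalog mapping operation names to implementations (illustrative only).
CATALOG = {"filter_equal": filter_equal, "select_fields": select_fields}


def run_etl(config_xml, rows):
    """Apply the configured operations to `rows` in document order."""
    root = ET.fromstring(config_xml)
    for op in root.findall("operation"):
        name = op.get("name")
        if name not in CATALOG:
            raise ValueError(f"unknown operation: {name}")
        params = {p.get("key"): (p.text or "").strip() for p in op.findall("param")}
        rows = CATALOG[name](rows, **params)
    return rows


# Made-up records standing in for an EHR extract.
records = [
    {"patient_id": 1, "age": 54, "diagnosis": "COVID-19"},
    {"patient_id": 2, "age": 61, "diagnosis": "Influenza"},
]
result = run_etl(CONFIG, records)
print(result)
```

Because the pipeline is described entirely by the configuration, the same file can be inspected or re-run, which is the kind of transparency and reproducibility the methodology aims for.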

Results: First, four international projects were analyzed to identify the 17 operations needed to obtain, from the EHR, datasets that meet the specifications of these projects. Each of the data operations was then formalized using the ISO 13606 reference model, specifying the valid data types of its arguments, inputs, and outputs, and their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
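
To make the idea of formalizing an operation more concrete, here is a minimal sketch of a specification that records the data type and cardinality of each argument, input, and output, and checks a call against it. The classes `Slot` and `OperationSpec`, the operation `threshold_flag`, and its fields are hypothetical, and plain Python types stand in for ISO 13606 data types, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    """One named argument, input, or output with a type and a cardinality."""
    name: str
    data_type: type
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None means unbounded

    def check(self, values):
        n = len(values)
        too_many = self.max_occurs is not None and n > self.max_occurs
        if n < self.min_occurs or too_many:
            raise ValueError(f"{self.name}: got {n} value(s), outside cardinality")
        for v in values:
            if not isinstance(v, self.data_type):
                raise TypeError(f"{self.name}: expected {self.data_type.__name__}")


@dataclass(frozen=True)
class OperationSpec:
    """Formal description of a data operation (illustrative only)."""
    name: str
    arguments: Tuple[Slot, ...]
    inputs: Tuple[Slot, ...]
    outputs: Tuple[Slot, ...]

    def validate_call(self, arguments, inputs):
        """Check supplied values against the declared types and cardinalities."""
        for slot in self.arguments:
            slot.check(arguments.get(slot.name, []))
        for slot in self.inputs:
            slot.check(inputs.get(slot.name, []))


# Hypothetical operation: flag whether any measurement exceeds a threshold.
SPEC = OperationSpec(
    name="threshold_flag",
    arguments=(Slot("threshold", float),),                # exactly one
    inputs=(Slot("values", float, 1, None),),             # one or more
    outputs=(Slot("flag", bool),),                        # exactly one
)

# A well-formed call passes silently; a malformed one raises an error.
SPEC.validate_call({"threshold": [38.0]}, {"values": [36.5, 39.1]})
```

Declaring types and cardinalities up front lets a configuration be rejected before any data are touched, which supports the auditability goal stated in the abstract.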

Conclusions: This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results.

Conflict of interest statement

None declared.

Figures

Fig. 1
Stages of the methodology for building transparent ETL processes for EHR reuse. EHR, electronic health record; ETL, extract, transform, and load.
Fig. 2
ETL configuration file implemented in XML. ETL, extract, transform, and load.
Fig. 3
Example of extracted parameters and R code of ETL process. ETL, extract, transform, and load.
Fig. 4
Overview of the methodology implementation process.
