J Biomed Inform. 2021 Mar:115:103697.
doi: 10.1016/j.jbi.2021.103697. Epub 2021 Feb 3.

Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models

Miguel Pedrera-Jiménez et al. J Biomed Inform. 2021 Mar.

Abstract

Background: COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should respond in a timely way both to healthcare needs and to secondary uses of data, that is, uses that go beyond direct medical care, such as biomedical research. However, each data analysis initiative usually defines its own information model in line with its requirements. These specifications share clinical concepts but differ in format and recording criteria, which leads to redundant data entry across multiple electronic data capture systems (EDCs) and a corresponding cost in effort and time for the organization.

Objective: This study sought to design and implement a flexible methodology based on detailed clinical models (DCMs) that would enable EHRs generated in a tertiary hospital to be reused effectively, without loss of meaning and within a short time.

Material and methods: The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using the ISO 13606 standard and the SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs, and implementation of those rules in the R language; and (4) implementation and validation of the methodology through the generation of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. This process was implemented in a 1300-bed tertiary hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020.
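Stage (3) can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' algorithm: the `Observation` and `Rule` types, the `pick` options, and the target field names are hypothetical, and any terminology code passed in is only an illustrative stand-in for the code bound to an archetype.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Observation:
    """One standardized EHR entry (hypothetical, simplified shape)."""
    patient_id: str
    code: str              # terminology code bound to the clinical concept
    value: float
    recorded_at: datetime


@dataclass(frozen=True)
class Rule:
    """Hypothetical transformation rule: one source concept to one target field."""
    target_field: str      # assumed column name in the secondary-use model
    source_code: str       # concept to read from the standardized EHR
    pick: str              # "first" or "last" recorded value per patient


def apply_rule(rule: Rule, observations: Iterable[Observation]) -> Dict[str, float]:
    """Return {patient_id: value} for the observation selected by the rule."""
    if rule.pick not in ("first", "last"):
        raise ValueError(f"unsupported pick strategy: {rule.pick}")
    chosen: Dict[str, Observation] = {}
    for obs in observations:
        if obs.code != rule.source_code:
            continue
        current = chosen.get(obs.patient_id)
        if current is None:
            chosen[obs.patient_id] = obs
        elif rule.pick == "first" and obs.recorded_at < current.recorded_at:
            chosen[obs.patient_id] = obs
        elif rule.pick == "last" and obs.recorded_at > current.recorded_at:
            chosen[obs.patient_id] = obs
    return {pid: obs.value for pid, obs in chosen.items()}


def build_extract(
    rules: List[Rule],
    observations: List[Observation],
    patient_ids: List[str],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Build one row per patient; fields with no matching data stay None."""
    rows: Dict[str, Dict[str, Optional[float]]] = {
        pid: {rule.target_field: None for rule in rules} for pid in patient_ids
    }
    for rule in rules:
        for pid, value in apply_rule(rule, observations).items():
            if pid in rows:
                rows[pid][rule.target_field] = value
    return rows
```

Keeping the rules as data, separate from the code that applies them, is one way a change in the target specification could be absorbed by editing the rule list alone.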

Results: An initial and expandable set of relevant concepts for COVID-19 was identified, modeled and formalized using the ISO 13606 standard and the SNOMED CT and LOINC terminologies. Similarly, an algorithm was designed and implemented in R and then applied to process EHRs in accordance with the standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. The methodology yielded the observation domain of this model with a coverage of over 85% of patients for the majority of concepts.
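The coverage figure can be read as the share of patients with a recorded value for each concept. The following Python sketch shows one way such a figure might be computed. It assumes a hypothetical extract shaped as one dictionary of fields per patient, with `None` marking a missing value; the paper's exact definition of coverage may differ.

```python
from typing import Dict, List, Optional

# Hypothetical extract shape: {patient_id: {field_name: value or None}}
Extract = Dict[str, Dict[str, Optional[float]]]


def coverage_by_concept(extract: Extract) -> Dict[str, float]:
    """Fraction of patients with a non-missing value, per field."""
    n_patients = len(extract)
    if n_patients == 0:
        return {}
    counts: Dict[str, int] = {}
    for row in extract.values():
        for field, value in row.items():
            counts.setdefault(field, 0)
            if value is not None:
                counts[field] += 1
    return {field: count / n_patients for field, count in counts.items()}


def concepts_over(coverage: Dict[str, float], threshold: float) -> List[str]:
    """Fields whose coverage is strictly above the threshold, sorted by name."""
    return sorted(field for field, share in coverage.items() if share > threshold)
```

Calling `concepts_over(coverage_by_concept(extract), 0.85)` would then list the concepts above the 85% mark for a given extract.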

Conclusion: This study has furnished a solution to the difficulty of rapidly and efficiently obtaining EHR-derived data for secondary use in COVID-19, one that can adapt to changes in data specifications and is applicable to other organizations and other health conditions. The conclusion to be drawn from this initial validation is that this DCM-based methodology allows the effective reuse of EHRs generated in a tertiary hospital during the COVID-19 pandemic, with no additional effort or time for the organization and with a greater data scope than that yielded by the conventional manual data collection process in ad hoc EDCs.

Keywords: COVID-19; Detailed clinical models; Electronic health records; Real world data; Semantics; Standards.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. Stages of the methodology for obtaining EHR-derived data.
Fig. 2. Mind map of the “Oxygen saturation” (“Saturación de oxígeno” in Spanish) archetype.
Fig. 3. Code in ADL of the “Oxygen saturation” (“Saturación de oxígeno” in Spanish) archetype.
Fig. 4. Iterative algorithm for generation of EHR-derived data extracts.
Fig. 5. Extract of semantically interoperable EHR.
Fig. 6. Code in R for generating data related to “Oxygen saturation” concept.
Fig. 7. Overview of the methodology implementation process.
