Lifting Hospital Electronic Health Record Data Treasures: Challenges and Opportunities

Alexander Maletzky¹, Carl Böck², Thomas Tschoellitsch³, Theresa Roland⁴, Helga Ludwig⁴, Stefan Thumfart¹, Michael Giretzlehner¹, Sepp Hochreiter⁴, Jens Meier³

Affiliations

¹ Research Department Medical Informatics, RISC Software GmbH, Hagenberg, Austria.
² JKU LIT SAL eSPML Lab, Institute of Signal Processing, Johannes Kepler University, Linz, Austria.
³ Department of Anesthesiology and Critical Care Medicine, Kepler University Hospital GmbH, Johannes Kepler University, Linz, Austria.
⁴ ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria.

PMID: 36269654
PMCID: PMC9636533
DOI: 10.2196/38557

Alexander Maletzky et al. JMIR Med Inform. 2022 Oct 21;10(10):e38557. doi: 10.2196/38557.

Abstract

Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access and prepare the data for secondary use.

We aimed to investigate how raw EHRs can be accessed and prepared in retrospective data science projects in a disciplined, effective, and efficient way.

We report our experience and findings from a large-scale data science project analyzing routinely acquired retrospective data from the Kepler University Hospital in Linz, Austria. The project involved data collection from more than 150,000 patients over a period of 10 years. It included diverse data modalities, such as static demographic data, irregularly acquired laboratory test results, regularly sampled vital signs, and high-frequency physiological waveform signals.

Raw medical data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. We present a general data preparation workflow, which was shaped in the course of our project and consists of the following 7 steps: obtain a rough overview of the available EHR data, define clinically meaningful labels for supervised learning, extract relevant data from the hospital's data warehouses, match data extracted from different sources, deidentify them, detect errors and inconsistencies therein through a careful exploratory analysis, and implement a suitable data processing pipeline in actual code. Only a few of the data preparation issues encountered in our project were addressed by the generic medical data preprocessing tools that have been proposed recently. Instead, highly individualized solutions for the specific data used in one's own research seem inevitable.

We believe that the proposed workflow can serve as guidance for practitioners, helping them to identify and address potential problems early and avoid some common pitfalls.

Keywords: electronic health record; machine learning; medical data preparation; retrospective data analysis.

©Alexander Maletzky, Carl Böck, Thomas Tschoellitsch, Theresa Roland, Helga Ludwig, Stefan Thumfart, Michael Giretzlehner, Sepp Hochreiter, Jens Meier. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 21.10.2022.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Primary challenges with retrospective medical data analysis (adapted from Johnson et al [18], which is published under Creative Commons Attribution 4.0 International License CC-BY 4.0 [19]).

**Figure 2**
Data sources and exported modalities in use cases 1 to 5. HIS, PDMS, and Bedmaster are data management systems deployed in the hospital, whereas information about extramural mortality and blood products had to be obtained from external sources. HIS: hospital information system; PDMS: patient data management system; ICU: intensive care unit.

**Figure 3**
Short periods of constant low values in waveform signals might have to be cut out. Left: original signal with a 0.5-second period of constant low values. Right: signal after cutting out the low values; as can be seen, the 2 ends of the signal fit perfectly.

**Figure 4**
Data preparation workflow for retrospective EHR data analysis. EHR: electronic health record.
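The cut described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold, the minimum run length, and the 10 Hz toy signal are all assumptions chosen for illustration. The sketch finds runs of identical samples at or below a threshold, removes runs that are long enough, and joins the remaining pieces.

```python
from typing import List, Sequence, Tuple


def find_constant_low_runs(
    signal: Sequence[float],
    low_threshold: float,
    min_samples: int,
) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs of constant, low-valued runs.

    A run qualifies if all its samples are equal, the shared value is at or
    below `low_threshold`, and it spans at least `min_samples` samples.
    `stop` is exclusive. Both parameters are hypothetical tuning knobs.
    """
    runs = []
    start = 0
    n = len(signal)
    while start < n:
        # Extend the run while the value stays exactly the same.
        stop = start + 1
        while stop < n and signal[stop] == signal[start]:
            stop += 1
        if signal[start] <= low_threshold and stop - start >= min_samples:
            runs.append((start, stop))
        start = stop
    return runs


def cut_runs(signal: Sequence[float], runs: List[Tuple[int, int]]) -> List[float]:
    """Remove the given runs and join the remaining pieces end to end."""
    cleaned: List[float] = []
    previous_stop = 0
    for start, stop in runs:
        cleaned.extend(signal[previous_stop:start])
        previous_stop = stop
    cleaned.extend(signal[previous_stop:])
    return cleaned


# Toy waveform at an assumed 10 Hz: a 0.5-second dropout is 5 samples of 0.0.
waveform = [80.0, 95.0, 110.0, 0.0, 0.0, 0.0, 0.0, 0.0, 112.0, 96.0, 81.0]
dropouts = find_constant_low_runs(waveform, low_threshold=5.0, min_samples=5)
cleaned = cut_runs(waveform, dropouts)
print(dropouts)
print(cleaned)
```

Cutting shortens the signal and shifts every later sample in time, so whether the two ends actually fit, as they do in the Figure 3 example, would need to be checked for each signal before a cut is accepted.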

References

1. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M, Sundberg P, Yee H, Zhang K, Zhang Y, Flores G, Duggan GE, Irvine J, Le Q, Litsch K, Mossin A, Tansuwan J, Wang D, Wexler J, Wilson J, Ludwig D, Volchenboum SL, Chou K, Pearson M, Madabushi S, Shah NH, Butte AJ, Howell MD, Cui C, Corrado GS, Dean J. Scalable and accurate deep learning with electronic health records. NPJ Digit Med. 2018 May 8;1:18. doi: 10.1038/s41746-018-0029-1.
2. Purushotham S, Meng C, Che Z, Liu Y. Benchmarking deep learning models on large healthcare datasets. J Biomed Inform. 2018 Jul;83:112–34. doi: 10.1016/j.jbi.2018.04.007. https://linkinghub.elsevier.com/retrieve/pii/S1532-0464(18)30071-6
3. Harutyunyan H, Khachatrian H, Kale DC, Ver Steeg G, Galstyan A. Multitask learning and benchmarking with clinical time series data. Sci Data. 2019 Jun 17;6(1):96. doi: 10.1038/s41597-019-0103-9.
4. Caicedo-Torres W, Gutierrez J. ISeeU: visually interpretable deep learning for mortality prediction inside the ICU. J Biomed Inform. 2019 Oct;98:103269. doi: 10.1016/j.jbi.2019.103269. https://linkinghub.elsevier.com/retrieve/pii/S1532-0464(19)30188-1
5. Hatib F, Jian Z, Buddi S, Lee C, Settels J, Sibert K, Rinehart J, Cannesson M. Machine-learning algorithm to predict hypotension based on high-fidelity arterial pressure waveform analysis. Anesthesiology. 2018 Oct;129(4):663–74. doi: 10.1097/ALN.0000000000002300. https://pubs.asahq.org/anesthesiology/article-lookup/doi/10.1097/ALN.000...
