BMC Med Inform Decis Mak. 2023 May 15;23(1):94.

doi: 10.1186/s12911-023-02195-3.

FAIRness through automation: development of an automated medical data integration infrastructure for FAIR health data in a maximum care university hospital

Marcel Parciak#¹ ² ³, Markus Suhr#¹ ⁴, Christian Schmidt¹, Caroline Bönisch¹, Benjamin Löhnhardt¹, Dorothea Kesztyüs⁵, Tibor Kesztyüs¹

Affiliations

¹ Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany.
² University MS Center, Biomedical Research Institute (BIOMED), Hasselt University, Agoralaan Building C, 3590, Diepenbeek, Belgium.
³ Data Science Institute (DSI), Hasselt University, Agoralaan Building D, 3590, Diepenbeek, Belgium.
⁴ NextLytics AG, Kapellenstrasse 37, 65719, Hofheim Am Taunus, Germany.
⁵ Department of Medical Informatics, University Medical Center Göttingen, Von-Siebold-Straße 3, 37075, Göttingen, Germany. dorothea.kesztyues@med.uni-goettingen.de.

# Contributed equally.

PMID: 37189148
PMCID: PMC10186636
DOI: 10.1186/s12911-023-02195-3

Marcel Parciak et al. BMC Med Inform Decis Mak. 2023.

Abstract

Background: Secondary use of routine medical data is key to large-scale clinical and health services research. In a maximum care hospital, the volume of data generated exceeds the limits of big data on a daily basis. These so-called "real world data" are essential to complement knowledge and results from clinical trials. Furthermore, big data may help in establishing precision medicine. However, manual data extraction and annotation workflows to transfer routine data into research data would be complex and inefficient. Generally, best practices for managing research data focus on data output rather than the entire data journey from primary sources to analysis. To eventually make routinely collected data usable and available for research, many hurdles have to be overcome. In this work, we present the implementation of an automated framework for timely processing of clinical care data, including free texts and genetic data (non-structured data), and their centralized storage as Findable, Accessible, Interoperable, Reusable (FAIR) research data in a maximum care university hospital.

Methods: We identify data processing workflows necessary to operate a medical research data service unit in a maximum care hospital. We decompose structurally equal tasks into elementary sub-processes and propose a framework for general data processing. We base our processes on open-source software components and, where necessary, custom-built generic tools.

Results: We demonstrate the application of our proposed framework in practice by describing its use in our Medical Data Integration Center (MeDIC). Our microservices-based and fully open-source data processing automation framework incorporates a complete recording of data management and manipulation activities. The prototype implementation also includes a metadata schema for data provenance and a process validation concept. All requirements of a MeDIC are orchestrated within the proposed framework: data input from many heterogeneous sources, pseudonymization and harmonization, integration in a data warehouse, and finally options for extracting or aggregating data for research purposes in accordance with data protection requirements.

Conclusion: Though the framework is not a panacea for bringing routine-based research data into compliance with FAIR principles, it provides a much-needed possibility to process data in a fully automated, traceable, and reproducible manner.

Keywords: Automated medical data processing; Electronic health record; Maximum care hospital; Medical data integration center; Medical data reuse; Medical informatics.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
High level view of the logical data flow at the UMG-MeDIC, depicting the different stages of the Extract-Transform-Load process. The information (i.e. healthcare data from the University Medical Center Göttingen) is extracted from the data sources and pooled in a data lake. Within the transformation and loading step the data is pushed to the data warehouse and is then provided to the target systems in the required format. UMG-MeDIC University Medical Center Göttingen-Medical Data Integration Center, ETL Extract-Transform-Load

**Fig. 2**
Generic schema of an atomic Extract-Transform-Load task and its metadata capture sub-process. Process control flow: black lines left to right; data flow: blue outline; metadata flow: green outline. The process starts with a “perform task” step, which interacts with external resources and pulls data from different sources. Subsequently, the process flow records metadata, which is written to a separate metadata storage.
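The pattern in Fig. 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: `MetadataStore`, `run_atomic_task`, and the record fields are hypothetical names chosen for illustration, and an in-memory list stands in for the separate metadata storage. The sketch only shows the idea that every task execution, whether it succeeds or fails, leaves a metadata record behind.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the separate metadata storage."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def write(self, record: Dict[str, Any]) -> None:
        self.records.append(record)


def _digest(obj: Any) -> str:
    """Checksum of a JSON-serializable object, used to link inputs and outputs."""
    raw = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def run_atomic_task(
    name: str,
    perform_task: Callable[[Any], Any],
    payload: Any,
    store: MetadataStore,
) -> Any:
    """Run one task and always write a metadata record, even on failure."""
    record: Dict[str, Any] = {
        "task": name,
        "started": datetime.now(timezone.utc).isoformat(),
        "input_sha256": _digest(payload),
    }
    try:
        result = perform_task(payload)
        record.update(status="success", output_sha256=_digest(result))
        return result
    except Exception as exc:
        record.update(status="fail", error=repr(exc))
        raise
    finally:
        # Metadata flow is kept apart from the data flow itself.
        record["finished"] = datetime.now(timezone.utc).isoformat()
        store.write(record)


if __name__ == "__main__":
    store = MetadataStore()
    run_atomic_task("uppercase", lambda d: {"v": d["v"].upper()}, {"v": "abc"}, store)
    print(store.records[0]["status"])
```

Writing the record in a `finally` clause is one simple way to make sure failed runs are traceable too; a production system would presumably persist the records in a database rather than in memory.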

**Fig. 3**
System architecture of the implementation. As described in the text, the complete process is divided into tasks, which are controlled by the ActiveWorkflow system. ETL Extract-Transform-Load, GUI Graphical User Interface

**Fig. 4**
ETL-Monitor: Extract-Transform-Load processes are displayed with their respective status (success, fail). Here, for example, the workflow for importing microbiology data is displayed. Clicking on a specific process provides detailed information. If the process failed, the last step that was successful is displayed. UMG-MeDIC University Medical Center Göttingen-Medical Data Integration Center, DWH Data Warehouse

**Fig. 5**
Schematic representation of data flow between different storage systems. The example workflow shows the transfer of laboratory result information through common stages of data processing and storage at UMG-MeDIC. The process starts with the HL7 file stream from the clinical systems, where the observation results are stored in an ORU message and the corresponding process metadata is collected. The information is pooled into a raw data lake. Subsequently, the information is pseudonymized and transferred to a pseudonymized data lake. After preprocessing, the information is stored in the data warehouse. In a final step, FHIR resources based on the data are created and stored in an HL7 FHIR server. UMG-MeDIC University Medical Center Göttingen-Medical Data Integration Center, HL7 Health Level 7, ORU HL7 Observation Result, FHIR Fast Healthcare Interoperability Resources, DWH Data Warehouse
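The stages in Fig. 5 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the UMG-MeDIC pipeline: the function names, the sample message, and the keyed-hash pseudonymization with `SECRET_KEY` are all hypothetical (the source does not say how pseudonyms are generated), the parser handles only the `PID` and `OBX` fields used here, and the resulting dictionary only loosely follows the shape of a FHIR Observation without being validated against the standard.

```python
import hashlib
import hmac
from typing import Dict, List

# Assumption: a demo key; real pseudonymization would use a managed secret or service.
SECRET_KEY = b"demo-key-not-for-production"


def parse_oru(message: str) -> List[Dict[str, str]]:
    """Extract one row per OBX segment from a simplified HL7 v2 ORU message."""
    patient_id = ""
    rows: List[Dict[str, str]] = []
    for segment in message.strip().replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID":
            patient_id = fields[3].split("^")[0]  # PID-3: patient identifier
        elif fields[0] == "OBX":
            code, display = (fields[3].split("^") + [""])[:2]  # OBX-3
            rows.append({
                "patient_id": patient_id,
                "code": code,
                "display": display,
                "value": fields[5],  # OBX-5: observation value
                "unit": fields[6],   # OBX-6: units
            })
    return rows


def pseudonymize(row: Dict[str, str]) -> Dict[str, str]:
    """Replace the patient identifier with a keyed hash (illustrative only)."""
    digest = hmac.new(SECRET_KEY, row["patient_id"].encode("utf-8"), hashlib.sha256)
    return {**row, "patient_id": digest.hexdigest()[:16]}


def to_fhir_observation(row: Dict[str, str]) -> Dict[str, object]:
    """Build a minimal Observation-like dictionary; assumes a numeric value."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": row["code"],
                             "display": row["display"]}]},
        "subject": {"reference": "Patient/" + row["patient_id"]},
        "valueQuantity": {"value": float(row["value"]), "unit": row["unit"]},
    }


if __name__ == "__main__":
    sample = "\r".join([
        "MSH|^~\\&|LAB|HOSP|DWH|HOSP|202301011200||ORU^R01|1|P|2.5",
        "PID|1||12345^^^HOSP||Doe^John",
        "OBX|1|NM|718-7^Hemoglobin^LN||13.5|g/dL",
    ])
    for raw_row in parse_oru(sample):
        print(to_fhir_observation(pseudonymize(raw_row)))
```

Keeping each stage as a separate function mirrors the separation between raw data lake, pseudonymized data lake, and FHIR server in the figure, so each step could be wrapped and logged as its own task.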

**Fig. 6**
Graphical representation of a standard ETL Workflow. ETL Extract-Transform-Load, CDSTAR Common Data Storage Architecture
