Appl Clin Inform. 2021 Aug;12(4):826-835.
doi: 10.1055/s-0041-1733847. Epub 2021 Aug 25.

Linking a Consortium-Wide Data Quality Assessment Tool with the MIRACUM Metadata Repository

Lorenz A Kapsner et al. Appl Clin Inform. 2021 Aug.

Abstract

Background: Many research initiatives aim at using data from electronic health records (EHRs) in observational studies. Participating sites of the German Medical Informatics Initiative (MII) established data integration centers to integrate EHR data within research data repositories to support local and federated analyses. To address concerns regarding possible data quality (DQ) issues of hospital routine data compared with data specifically collected for scientific purposes, we have previously presented a data quality assessment (DQA) tool providing a standardized approach to assess DQ of the research data repositories at the MIRACUM consortium's partner sites.

Objectives: Major limitations of the former approach included manual interpretation of the results and hard coding of analyses, which made extending them to new data elements and databases time-consuming and error-prone. Here we present an enhanced version of the DQA tool that links it to common data element definitions stored in a metadata repository (MDR) and adopts the harmonized DQA framework from Kahn et al., and we describe its application within the MIRACUM consortium.

Methods: Data quality checks were aligned to a harmonized DQA terminology. Database-specific information was systematically identified and represented in an MDR. Furthermore, a structured representation of logical relations between data elements was developed to model plausibility statements in the MDR.
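
The idea of driving checks from metadata rather than hard-coded logic can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the element names (`gender`, `age_years`, `diagnosis_code`), the value sets, and the example rule are hypothetical and do not reproduce the MIRACUM MDR schema or the DQAstats interface.

```python
# Hypothetical metadata entries (illustrative only, not the MIRACUM MDR schema).
MDR = {
    "gender": {"type": "categorical", "allowed": {"f", "m", "x"}},
    "age_years": {"type": "numeric", "min": 0, "max": 120},
}

# Hypothetical plausibility statement relating two data elements:
# a diagnosis code from ICD-10 chapter O (pregnancy, childbirth and the
# puerperium) is only considered plausible together with gender "f".
PLAUSIBILITY_RULES = [
    {
        "name": "pregnancy_diagnosis_requires_female",
        "if": {"element": "diagnosis_code", "startswith": "O"},
        "then": {"element": "gender", "allowed": {"f"}},
    },
]


def check_conformance(records, mdr):
    """Return (row index, element, value) for values violating the metadata constraints."""
    violations = []
    for i, rec in enumerate(records):
        for element, spec in mdr.items():
            value = rec.get(element)
            if value is None:
                continue  # missing values belong to a separate completeness check
            if spec["type"] == "categorical" and value not in spec["allowed"]:
                violations.append((i, element, value))
            elif spec["type"] == "numeric" and not (spec["min"] <= value <= spec["max"]):
                violations.append((i, element, value))
    return violations


def check_plausibility(records, rules):
    """Return (row index, rule name) for records that trigger a rule but break its consequence."""
    violations = []
    for i, rec in enumerate(records):
        for rule in rules:
            cond, cons = rule["if"], rule["then"]
            trigger = str(rec.get(cond["element"], ""))
            if trigger.startswith(cond["startswith"]):
                if rec.get(cons["element"]) not in cons["allowed"]:
                    violations.append((i, rule["name"]))
    return violations


records = [
    {"gender": "f", "age_years": 34, "diagnosis_code": "O80"},
    {"gender": "m", "age_years": 51, "diagnosis_code": "O80"},
    {"gender": "q", "age_years": 130, "diagnosis_code": "I10"},
]

print(check_conformance(records, MDR))
print(check_plausibility(records, PLAUSIBILITY_RULES))
```

In this sketch, adding a new data element or rule only requires a new metadata entry; the checking functions stay unchanged, which is the property the metadata-driven design aims for.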

Results: The MIRACUM DQA tool was linked to data element definitions stored in a consortium-wide MDR. Additional databases used within MIRACUM were linked to the DQ checks by extending the respective data elements in the MDR with the required information. The evaluation of DQ checks was automated. An adaptable software implementation is provided with the R package DQAstats.

Conclusion: The enhancements of the DQA tool facilitate the future integration of new data elements and make the tool scalable to other databases and data models. The tool has been provided to all ten MIRACUM partners and was successfully deployed and integrated into their respective data integration center infrastructure.

Conflict of interest statement

None declared.

Figures

Fig. 1
DQA tool integration in the MIRACUM data integration center (DIC) infrastructure (schema). Within the DIC, pseudonymized data are transferred by ETL processes from the source systems via a FHIR gateway into the target research data repositories. Each combination of these ETL steps can be analyzed separately by the DQA tool. The solid lines depict the comparison between the source system and the FHIR gateway. The dark dashed lines show the comparison between the FHIR gateway and the target system. The gray dashed lines present the comparison of the source system and the target system. ETL, extract-transform-load. DQA, data quality assessment.
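
The three pairwise comparisons described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions: the stage names, the `patient_id` element, and the choice of summary statistics (value count and distinct count) are hypothetical and are not taken from the tool itself.

```python
from itertools import combinations


def summarize(rows, element):
    """Summarize one data element at one ETL stage (assumed statistics: count and distinct count)."""
    values = [r[element] for r in rows if r.get(element) is not None]
    return {"n": len(values), "distinct": len(set(values))}


def compare_stages(stages, element):
    """Compare every pair of ETL stages; True means the summaries match."""
    summaries = {name: summarize(rows, element) for name, rows in stages.items()}
    return {(a, b): summaries[a] == summaries[b] for a, b in combinations(summaries, 2)}


# Hypothetical extracts from the three stages named in the caption.
stages = {
    "source": [{"patient_id": 1}, {"patient_id": 2}, {"patient_id": 3}],
    "fhir_gateway": [{"patient_id": 1}, {"patient_id": 2}, {"patient_id": 3}],
    "target": [{"patient_id": 1}, {"patient_id": 2}],
}

result = compare_stages(stages, "patient_id")
for pair, matches in result.items():
    print(pair, "match" if matches else "mismatch")
```

Comparing each pair separately helps localize a discrepancy: in this example the source and the FHIR gateway agree, so the lost record points to the step between the gateway and the target.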
