The IeDEA harmonist data toolkit: A data quality and data sharing solution for a global HIV research consortium
- PMID: 35680074
- PMCID: PMC9893518
- DOI: 10.1016/j.jbi.2022.104110
Abstract
We describe the design, implementation, and impact of a data harmonization, data quality checking, and dynamic report generation application in an international observational HIV research network. The IeDEA Harmonist Data Toolkit is a web-based application written in the open-source programming language R. It employs the R/Shiny and RMarkdown packages and leverages the REDCap data collection platform for data model definition and user authentication. The Toolkit performs data quality checks on uploaded datasets, checks for conformance with the network's common data model, displays the results both interactively and in downloadable reports, and stores approved datasets in secure cloud storage for retrieval by the requesting investigator. Including stakeholders and users in the design process was key to the successful adoption of the application. A survey of regional data managers and initial usage metrics indicate that the Toolkit saves time and improves data quality, with a 61% mean reduction in the number of error records in a dataset. The generalized application design allows the Toolkit to be easily adapted to other research networks.
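To make the idea of checking an uploaded dataset against a common data model more concrete, the following is a minimal sketch in Python. It is not the Toolkit's code (the Toolkit itself is written in R), and the table name, variable names, codes, and rule format are all hypothetical, chosen only for illustration. It assumes each uploaded row arrives as a dictionary of strings.

```python
from datetime import date

# Hypothetical data model: table -> variable -> rule. These names and codes
# are illustrative assumptions, not the IeDEA data model.
DATA_MODEL = {
    "patients": {
        "patient_id": {"required": True},
        "sex": {"required": True, "codes": {"1", "2", "9"}},
        "birth_date": {"required": True, "is_date": True},
    }
}


def check_table(table_name, rows, model=DATA_MODEL):
    """Return a list of (row_index, variable, message) conformance errors."""
    rules = model[table_name]
    errors = []
    for i, row in enumerate(rows):
        # Flag variables that the data model does not define.
        for var in row:
            if var not in rules:
                errors.append((i, var, "variable not in data model"))
        # Check each variable the data model does define.
        for var, rule in rules.items():
            value = row.get(var, "")
            if value == "":
                if rule.get("required"):
                    errors.append((i, var, "missing required value"))
                continue
            if "codes" in rule and value not in rule["codes"]:
                errors.append((i, var, "invalid code"))
            if rule.get("is_date"):
                try:
                    parsed = date.fromisoformat(value)
                except ValueError:
                    errors.append((i, var, "invalid date format"))
                else:
                    if parsed > date.today():
                        errors.append((i, var, "date in the future"))
    return errors


# Example upload with deliberately bad records (made-up data).
rows = [
    {"patient_id": "A1", "sex": "1", "birth_date": "1980-05-17"},
    {"patient_id": "A2", "sex": "7", "birth_date": "1980-13-01"},
    {"patient_id": "", "sex": "2", "birth_date": "1975-01-01", "extra": "x"},
]
errors = check_table("patients", rows)
for row_index, variable, message in errors:
    print(row_index, variable, message)
```

Keeping the rules in a data structure separate from the checking function mirrors the generalized design the abstract describes: under this assumption, adapting the checks to another network would mean swapping the model definition rather than rewriting the checks.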
Keywords: Biomedical informatics; Data harmonization; Data quality; Global health; HIV.
Copyright © 2022 The Author(s). Published by Elsevier Inc. All rights reserved.
