J Biomed Inform. 2022 Jul;131:104110.
doi: 10.1016/j.jbi.2022.104110. Epub 2022 Jun 6.

The IeDEA harmonist data toolkit: A data quality and data sharing solution for a global HIV research consortium

Judith T Lewis et al. J Biomed Inform. 2022 Jul.

Abstract

We describe the design, implementation, and impact of a data harmonization, data quality checking, and dynamic report generation application in an international observational HIV research network. The IeDEA Harmonist Data Toolkit is a web-based application written in the open-source programming language R. It employs the R/Shiny and RMarkdown packages and leverages the REDCap data collection platform for data model definition and user authentication. The Toolkit performs data quality checks on uploaded datasets, checks for conformance with the network's common data model, displays the results both interactively and in downloadable reports, and stores approved datasets in secure cloud storage for retrieval by the requesting investigator. Including stakeholders and users in the design process was key to the successful adoption of the application. A survey of regional data managers and initial usage metrics indicate that the Toolkit saves time and improves data quality, with a 61% mean reduction in the number of error records in a dataset. The generalized application design allows the Toolkit to be easily adapted to other research networks.

Keywords: Biomedical informatics; Data harmonization; Data quality; Global health; HIV.
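
The conformance checking described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity, not in R as the Toolkit is, and it is not the Toolkit's implementation. The rule table `DATA_MODEL`, the variable names (`patient_id`, `sex`, `enrol_d`), the code list, and all function names are hypothetical and chosen only for illustration. In the Toolkit the data model definition is held in REDCap rather than hard-coded.

```python
from datetime import date

# Hypothetical data model for illustration only; not the IeDEA data model.
# Each variable maps to the rules that a value must satisfy.
DATA_MODEL = {
    "patient_id": {"required": True},
    "sex": {"required": True, "codes": {"1", "2", "9"}},
    "enrol_d": {"required": True, "type": "date"},
}


def check_record(row_number, record):
    """Return a list of (row, variable, message) tuples for one record."""
    errors = []
    for variable, rule in DATA_MODEL.items():
        value = record.get(variable, "")
        if value == "":
            if rule.get("required"):
                errors.append((row_number, variable, "missing required value"))
            continue
        if "codes" in rule and value not in rule["codes"]:
            errors.append((row_number, variable, "code not in data model"))
        if rule.get("type") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append((row_number, variable, "invalid date"))
    # Flag variables that the data model does not define.
    for variable in record:
        if variable not in DATA_MODEL:
            errors.append((row_number, variable, "variable not in data model"))
    return errors


def check_dataset(records):
    """Check every record; row numbers start at 1."""
    errors = []
    for row_number, record in enumerate(records, start=1):
        errors.extend(check_record(row_number, record))
    return errors


def count_error_records(errors):
    """Number of distinct records that have at least one error."""
    return len({row for row, _, _ in errors})


def error_record_reduction(initial_count, final_count):
    """Percent reduction in error records from first to final upload."""
    if initial_count == 0:
        return 0.0
    return 100.0 * (initial_count - final_count) / initial_count


records = [
    {"patient_id": "A1", "sex": "1", "enrol_d": "2019-05-01"},
    {"patient_id": "", "sex": "7", "enrol_d": "2019-02-30"},
]
errors = check_dataset(records)
```

Because the checks are driven by the rule table and not by per-variable code, adding a variable or code to the table extends the checks without further changes. Under the same assumptions, `error_record_reduction` shows how a per-dataset reduction could be computed before averaging across datasets.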

Figures

Figure A1. Each panel tracks the number of errors detected in each iteration of uploading and checking a dataset for a single user's response to a specific data request. On the final iteration, datasets were transferred to the investigator who requested the data.
Figure 1. IeDEA regions and participating sites
Figure 2. Abstracting data model details in REDCap. Data quality checks based on these details automatically include new variables and codes.
Figure 3. Collaborative design timeline for IeDEA Harmonist Data Toolkit
Figure 4. Harmonist Data Toolkit system architecture and communication with REDCap
Figure 5. Report visualizations useful in data quality assessment: (a) Example of histograms of enrollments, clinic visits, lab tests, ART medication initiation, and disease diagnoses by date for each site. Investigators can spot unusual trends, such as the drop-off in documented clinic visits after 2015 for this example site. (b) Heat maps of patient representation in data tables (e.g., loss to follow-up from clinic [LTFU], visits, CD4 cell count lab results, HIV viral load lab results, and antiretroviral therapy [ART]) draw attention to gaps in reporting, such as the lack of any clinic visit data from “Site 5” in this example.
Figure 6. Harmonist Data Toolkit workflow overview
Figure 7. Screenshot of Step 2 of the Harmonist Data Toolkit.
Figure 8. Results of REDCap Survey of IeDEA data managers after the first year of Toolkit use in IeDEA. Data managers compared Toolkit workflow with their previous methods of data quality checking and data sharing for IeDEA multiregional studies.
Figure 9. Example analysis of dataset errors in a series of uploads and revisions by a single data manager for a specific data request. On the final iteration, the dataset was transferred to the investigator who requested the data. See the Appendix for additional graphs.
Figure 10. Error types found in initial uploads as compared with final uploads among datasets that were checked with the Toolkit multiple times and revised before submission
