J Clin Transl Sci. 2025 Mar 26;9(1):e87.
doi: 10.1017/cts.2025.52. eCollection 2025.

Best practices for clinical trials data harmonization and sharing on NHLBI BioData Catalyst (BDC) learned from CONNECTS network COVID-19 studies

Jeran K Stratford et al. J Clin Transl Sci.

Abstract

Collaborative and transparent sharing of COVID-19 clinical trial and large-scale observational study data is critical to accelerating scientific discovery and informing clinical practice. Responsible data sharing requires addressing challenges associated with data privacy and confidentiality, data linkage, data quality, variable harmonization, data formats, and comprehensive metadata documentation to produce a high-quality, contextually rich, findable, accessible, interoperable, and reusable (FAIR) dataset. This communication explores the experiences and lessons learned from sharing National Heart, Lung, and Blood Institute (NHLBI) COVID-19 clinical trial (including adaptive platform trials) and cohort study datasets through the NHLBI BioData Catalyst® (BDC) ecosystem, focusing on the challenges and successes of harmonizing these datasets for broader research use. Our findings highlight the importance of establishing standardized data formats, adopting common data elements, and creating and maintaining robust data governance structures that address common challenges (i.e., data privacy and data-sharing limitations resulting from informed consent). These efforts resulted in a set of comprehensive and interoperable datasets from 5 clinical trials and 13 cohort studies that will enable downstream reuse in analyses and collaborations. The principles and strategies outlined, derived through experience with consortia data, can lay the groundwork for advancing collaborative and efficient data sharing.

Keywords: BioData Catalyst; COVID-19; Data harmonization; clinical trials.

Conflict of interest statement

None.

Figures

Figure 1.
A. CONNECTS common data elements development and utilization. Many CONNECTS studies were ongoing (blue lines) prior to development and initial publication of the CONNECTS CDEs in June 2021 (yellow flag). Therefore, concentrated time for retrospective harmonization (solid green lines) was required to align study data with the CONNECTS CDEs to maximize dataset interoperability. CDE adoption during study design, coupled with concurrent data collection and intermittent harmonization (dashed green line) during ACTIV4-HT, contributed in part to the reduction in time between study completion and dataset release (red stars).
B. CONNECTS study variables mapped to CONNECTS CDEs. The count of mapping levels assigned to the study variable(s)/CDE pairing across CONNECTS studies was evaluated and visualized. An “Identical” mapping (blue) signifies that study data was collected exactly as recommended by the NHLBI COVID-19 CDE. A “Comparable” mapping (orange) means that the study variable and NHLBI COVID-19 CDE are conceptually similar but differ in phrasing or response options. A “Related” mapping (gray) indicates that the study variable and the NHLBI COVID-19 CDE cover a similar topic, but the mapping relationship is uncertain. ACTIV4-HT was the only study to adopt CONNECTS CDEs during study design, which greatly increased the number of “Identical” mappings, thus maximizing interoperability. Please note that ACTIV4a v1.0, v1.1, and v1.2 are different trial arms (drugs), not different versions of the same trial arm (drug).
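The per-study tally of mapping levels described in panel B can be illustrated with a minimal Python sketch. The study names, variable names, and records below are hypothetical placeholders, not CONNECTS data, and the record layout is an assumption made for illustration. Only the three level names come from the caption.

```python
from collections import Counter

# Mapping levels as named in the Figure 1B caption.
LEVELS = ("Identical", "Comparable", "Related")

# Hypothetical (study, study_variable, cde, level) records; not real CONNECTS data.
mappings = [
    ("STUDY_A", "age_yrs", "age", "Identical"),
    ("STUDY_A", "sex_birth", "sex", "Comparable"),
    ("STUDY_B", "age", "age", "Identical"),
    ("STUDY_B", "smoke_hx", "smoking_status", "Related"),
    ("STUDY_B", "bmi_calc", "bmi", "Comparable"),
]


def count_levels(records):
    """Return {study: {level: count}}, zero-filled for every level."""
    counts = {}
    for study, _variable, _cde, level in records:
        if level not in LEVELS:
            raise ValueError(f"unknown mapping level: {level!r}")
        counts.setdefault(study, Counter())[level] += 1
    # A Counter returns 0 for missing keys, so every level appears in the result.
    return {s: {lvl: c[lvl] for lvl in LEVELS} for s, c in counts.items()}


level_counts = count_levels(mappings)
for study, by_level in sorted(level_counts.items()):
    print(study, by_level)
```

Zero-filling each level keeps the studies directly comparable, as they would need to be for a stacked bar chart of the kind the caption describes.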
Figure 2.
BDC submission workflow. Data generators who submitted datasets to BDC completed a multistep process involving multiple systems. The figure outlines the tasks in this data generator-led workflow for each step, with references to the relevant submission forms. The outcomes produced at each step that enable advancing to the next phase are outlined. dbGaP = Database of Genotypes and Phenotypes; QC = quality control; BDC = NHLBI BioData Catalyst®; DMC = data management core; a. bdcatalystdatasharing@nih.gov, b. nhlbigeneticdata@nhlbi.nih.gov.
Figure 3.
ACTIV4a adaptive platform trial data collection timelines. Adaptive platform trials allow interventions to enter or leave the platform based on a predefined decision algorithm. This flexibility results in staggered completion of longitudinal data collection (separate lock dates for each intervention). To make data available as soon as possible while balancing the effort required for data submission, harmonized datasets that are completed at the same time are aggregated (colors) into a single data release. One impact of this approach is the need to access multiple releases to obtain all data for one of the domains (P2Y12 for severe baseline disease). *Release 2 includes updated Release 1 data and is preferentially recommended for analysis. EMR = electronic medical records; SGLT2 = sodium-glucose cotransporter-2; criza = crizanlizumab.
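The aggregation of datasets completed at the same time into a single release can be sketched in Python as follows. The dataset names and lock dates are hypothetical, and treating "completed at the same time" as "sharing an identical lock date" is a simplifying assumption. The actual release planning for ACTIV4a is not described at this level of detail.

```python
from collections import defaultdict
from datetime import date

# Hypothetical dataset names and lock dates; not the actual ACTIV4a schedule.
locks = {
    "domain_a": date(2022, 1, 15),
    "domain_b": date(2022, 1, 15),
    "domain_c": date(2022, 9, 1),
}


def plan_releases(lock_dates):
    """Group datasets sharing a lock date and number the groups chronologically."""
    by_date = defaultdict(list)
    for name, locked in lock_dates.items():
        by_date[locked].append(name)
    return [
        (number, locked, sorted(by_date[locked]))
        for number, locked in enumerate(sorted(by_date), start=1)
    ]


releases = plan_releases(locks)
for number, locked, names in releases:
    print(f"Release {number} ({locked.isoformat()}): {', '.join(names)}")
```

Under this scheme, a domain whose data lock on different dates would appear in more than one release, which mirrors the caption's note that some domains require access to multiple releases.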
