Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative
- PMID: 34590684
- PMCID: PMC8500110
- DOI: 10.1093/jamia/ocab217
Abstract
Objective: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations.
Materials and methods: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements.
Results: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of the 56 sites, 37 (66%) demonstrated issues through these heuristics. All 37 of these sites demonstrated improvement after receiving feedback.
Discussion: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate.
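As an illustration of why site-to-site differences are easier to see centrally, the sketch below compares one per-site metric against the cross-site distribution and flags sites that sit far from the median. It is not one of the 15 N3C heuristics: the function name, the site labels, the metric (share of measurement rows lacking a unit), the values, and the 3.5 cutoff are all assumptions chosen for illustration.

```python
from statistics import median


def flag_outlier_sites(site_metric, threshold=3.5):
    """Return (sorted) sites whose metric lies far from the cross-site median.

    Uses a modified z-score built on the median absolute deviation (MAD),
    which is less sensitive to a single extreme site than mean and
    standard deviation. The threshold of 3.5 is a conventional choice,
    assumed here rather than taken from the paper.
    """
    values = list(site_metric.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most sites agree exactly; any site that differs is flagged.
        return sorted(s for s, v in site_metric.items() if v != med)
    flagged = []
    for site, value in site_metric.items():
        score = 0.6745 * abs(value - med) / mad
        if score > threshold:
            flagged.append(site)
    return sorted(flagged)


# Hypothetical data: share of measurement rows with no unit, per site.
missing_unit_rate = {
    "site_a": 0.02,
    "site_b": 0.03,
    "site_c": 0.025,
    "site_d": 0.04,
    "site_e": 0.61,
}

print(flag_outlier_sites(missing_unit_rate))  # ['site_e']
```

A federated check run at site_e alone would only see its own 61% rate; the comparison against peer sites is what marks that value as unusual.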
Conclusion: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.
Keywords: COVID-19; data accuracy; electronic health records.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Grants and funding
- UL1 TR001998/TR/NCATS NIH HHS/United States
- UL1 TR002535/TR/NCATS NIH HHS/United States
- UL1 TR002649/TR/NCATS NIH HHS/United States
- UL1 TR001433/TR/NCATS NIH HHS/United States
- UL1 TR001422/TR/NCATS NIH HHS/United States
- UL1 TR002553/TR/NCATS NIH HHS/United States
- UL1 TR002369/TR/NCATS NIH HHS/United States
- UL1 TR002345/TR/NCATS NIH HHS/United States
- UL1 TR003142/TR/NCATS NIH HHS/United States
- UL1 TR002537/TR/NCATS NIH HHS/United States
- UL1 TR001445/TR/NCATS NIH HHS/United States
- U24TR002306/TR/NCATS NIH HHS/United States
- UL1 TR001425/TR/NCATS NIH HHS/United States
- U54 GM104938/GM/NIGMS NIH HHS/United States
- UL1 TR002544/TR/NCATS NIH HHS/United States
- U54 GM115516/GM/NIGMS NIH HHS/United States
- UL1 TR002003/TR/NCATS NIH HHS/United States
- UL1 TR001876/TR/NCATS NIH HHS/United States
- UL1 TR002538/TR/NCATS NIH HHS/United States
- UL1 TR001881/TR/NCATS NIH HHS/United States
- U54 GM115677/GM/NIGMS NIH HHS/United States
- UL1 TR003107/TR/NCATS NIH HHS/United States
- UL1 TR001414/TR/NCATS NIH HHS/United States
- UL1 TR001863/TR/NCATS NIH HHS/United States
- U54 GM115428/GM/NIGMS NIH HHS/United States
- UL1 TR002736/TR/NCATS NIH HHS/United States
- UL1 TR003098/TR/NCATS NIH HHS/United States
- UL1 TR002541/TR/NCATS NIH HHS/United States
- UL1 TR002001/TR/NCATS NIH HHS/United States
- U54 GM115458/GM/NIGMS NIH HHS/United States
- UL1 TR002378/TR/NCATS NIH HHS/United States
- UL1 TR001442/TR/NCATS NIH HHS/United States
- UL1 TR002494/TR/NCATS NIH HHS/United States
- UL1 TR002645/TR/NCATS NIH HHS/United States
- UL1 TR001453/TR/NCATS NIH HHS/United States
- UL1 TR002489/TR/NCATS NIH HHS/United States
- UL1 TR001420/TR/NCATS NIH HHS/United States
- U54 GM104940/GM/NIGMS NIH HHS/United States
- UL1 TR003015/TR/NCATS NIH HHS/United States
- UL1 TR003017/TR/NCATS NIH HHS/United States
- UL1 TR001860/TR/NCATS NIH HHS/United States
- UL1 TR002366/TR/NCATS NIH HHS/United States
- UL1 TR002377/TR/NCATS NIH HHS/United States
- UL1 TR002733/TR/NCATS NIH HHS/United States
- UL1 TR002550/TR/NCATS NIH HHS/United States
- UL1 TR001439/TR/NCATS NIH HHS/United States
- UL1 TR003096/TR/NCATS NIH HHS/United States
- UL1 TR002529/TR/NCATS NIH HHS/United States
- UL1 TR001857/TR/NCATS NIH HHS/United States
- UL1 TR001855/TR/NCATS NIH HHS/United States
- K23 DK059311/DK/NIDDK NIH HHS/United States
- UL1 TR001878/TR/NCATS NIH HHS/United States
- UL1 TR002319/TR/NCATS NIH HHS/United States
- U54 GM104941/GM/NIGMS NIH HHS/United States
- U24 TR002306/TR/NCATS NIH HHS/United States
- UL1 TR001436/TR/NCATS NIH HHS/United States
- UL1 TR001872/TR/NCATS NIH HHS/United States
- UL1 TR002389/TR/NCATS NIH HHS/United States
- UL1 TR002014/TR/NCATS NIH HHS/United States
- UL1 TR001412/TR/NCATS NIH HHS/United States
- UL1 TR002373/TR/NCATS NIH HHS/United States
- UL1 TR002240/TR/NCATS NIH HHS/United States
- UL1 TR002556/TR/NCATS NIH HHS/United States
- UL1 TR001449/TR/NCATS NIH HHS/United States
- UL1 TR002384/TR/NCATS NIH HHS/United States
- UL1 TR001866/TR/NCATS NIH HHS/United States
- UL1 TR001450/TR/NCATS NIH HHS/United States
- UL1 TR001873/TR/NCATS NIH HHS/United States
- U54 GM104942/GM/NIGMS NIH HHS/United States
- UL1 TR003167/TR/NCATS NIH HHS/United States
- UL1 TR002243/TR/NCATS NIH HHS/United States