This is a preprint.
Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C)
- PMID: 34268525
- PMCID: PMC8282114
- DOI: 10.1101/2021.07.06.21259051
Update in
- Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). J Am Med Inform Assoc. 2022 Jul 12;29(8):1350-1365. doi: 10.1093/jamia/ocac045. PMID: 35357487. Free PMC article.
Abstract
Objective: To evaluate whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses.
Materials and methods: Using an original data set (n=1,854,968 SARS-CoV-2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip code-level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the two data sets was evaluated statistically and qualitatively.
Results: In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean=2.9±2.4; max=16 tests; 66% reduction of unique zip codes). Epidemic curves were similar between synthetic and original data in a random sample of the most tested zip codes (top 1%; n=171), and monthly indicator counts were similar for all unsuppressed zip codes (n=5,819). With small sample sizes, synthetic data utility was notably lower.
Discussion: Population-level analyses and analyses of densely tested zip codes (which contained most of the data) were similar between the original and synthetically derived data sets. Analyses of sparsely tested populations were less similar and had more data suppression.
Conclusion: In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.
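The kind of comparison the abstract describes (indicator counts stratified by month and zip code, and the share of zip codes lost to label suppression) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the function names, the toy records, and the use of mean absolute difference as a similarity measure are all assumptions made for illustration, since the abstract does not name the statistics that were used.

```python
from collections import Counter

# Hypothetical record layout: (zip_code, "YYYY-MM-DD", is_positive).

def monthly_counts(tests):
    """Count positive tests per (zip_code, "YYYY-MM") pair."""
    counts = Counter()
    for zip_code, date, is_positive in tests:
        if is_positive:
            counts[(zip_code, date[:7])] += 1
    return counts

def zip_code_reduction(original, synthetic):
    """Fraction of the original's unique zip codes absent from the synthetic data."""
    orig_zips = {z for z, _, _ in original}
    synth_zips = {z for z, _, _ in synthetic}
    return len(orig_zips - synth_zips) / len(orig_zips)

def mean_absolute_difference(original, synthetic):
    """Mean absolute difference in monthly positive counts,
    restricted to zip codes present in both data sets."""
    orig_counts = monthly_counts(original)
    synth_counts = monthly_counts(synthetic)
    shared = {z for z, _, _ in original} & {z for z, _, _ in synthetic}
    keys = {k for k in orig_counts.keys() | synth_counts.keys() if k[0] in shared}
    if not keys:
        return 0.0
    return sum(abs(orig_counts[k] - synth_counts[k]) for k in keys) / len(keys)

# Toy records (invented, not from the study).
original = [
    ("10001", "2020-04-02", True),
    ("10001", "2020-04-15", True),
    ("10001", "2020-05-01", False),
    ("99999", "2020-04-03", True),   # sparsely tested zip code
]
synthetic = [
    ("10001", "2020-04-05", True),
    ("10001", "2020-05-09", True),
    ("10001", "2020-05-20", False),  # "99999" label suppressed
]

print(zip_code_reduction(original, synthetic))
print(mean_absolute_difference(original, synthetic))
```

In this toy input, one of the two original zip codes is missing from the synthetic data, which mirrors how suppression removes sparsely tested areas from zip code-level comparisons while leaving densely tested ones available.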
Grants and funding
- U54 GM104938/GM/NIGMS NIH HHS/United States
- UL1 TR002649/TR/NCATS NIH HHS/United States
- UL1 TR003167/TR/NCATS NIH HHS/United States
- UL1 TR001433/TR/NCATS NIH HHS/United States
- UL1 TR001422/TR/NCATS NIH HHS/United States
- UL1 TR001860/TR/NCATS NIH HHS/United States
- U54 GM104942/GM/NIGMS NIH HHS/United States
- UL1 TR001420/TR/NCATS NIH HHS/United States
- UL1 TR001439/TR/NCATS NIH HHS/United States
- UL1 TR002243/TR/NCATS NIH HHS/United States
- UL1 TR001445/TR/NCATS NIH HHS/United States
- UL1 TR003096/TR/NCATS NIH HHS/United States
- UL1 TR002537/TR/NCATS NIH HHS/United States
- UL1 TR001412/TR/NCATS NIH HHS/United States
- T15 LM007442/LM/NLM NIH HHS/United States
- UL1 TR001872/TR/NCATS NIH HHS/United States
- UL1 TR001878/TR/NCATS NIH HHS/United States
- UL1 TR002529/TR/NCATS NIH HHS/United States
- UL1 TR001863/TR/NCATS NIH HHS/United States
- UL1 TR002494/TR/NCATS NIH HHS/United States
- UL1 TR002736/TR/NCATS NIH HHS/United States
- U54 GM115516/GM/NIGMS NIH HHS/United States
- UL1 TR002369/TR/NCATS NIH HHS/United States
- UL1 TR002541/TR/NCATS NIH HHS/United States
- UL1 TR002001/TR/NCATS NIH HHS/United States
- UL1 TR002538/TR/NCATS NIH HHS/United States
- U54 GM115458/GM/NIGMS NIH HHS/United States
- UL1 TR001442/TR/NCATS NIH HHS/United States
- UL1 TR002535/TR/NCATS NIH HHS/United States
- UL1 TR001866/TR/NCATS NIH HHS/United States
- UL1 TR001449/TR/NCATS NIH HHS/United States
- UL1 TR001453/TR/NCATS NIH HHS/United States
- UL1 TR002489/TR/NCATS NIH HHS/United States
- U54 GM104940/GM/NIGMS NIH HHS/United States
- UL1 TR003107/TR/NCATS NIH HHS/United States
- UL1 TR003015/TR/NCATS NIH HHS/United States
- UL1 TR002733/TR/NCATS NIH HHS/United States
- U24 TR002306/TR/NCATS NIH HHS/United States
- UL1 TR002003/TR/NCATS NIH HHS/United States
- UL1 TR001876/TR/NCATS NIH HHS/United States
- UL1 TR001436/TR/NCATS NIH HHS/United States
- UL1 TR002378/TR/NCATS NIH HHS/United States
- UL1 TR002384/TR/NCATS NIH HHS/United States
- UL1 TR002553/TR/NCATS NIH HHS/United States
- UL1 TR002389/TR/NCATS NIH HHS/United States
- UL1 TR001414/TR/NCATS NIH HHS/United States
- U54 GM104941/GM/NIGMS NIH HHS/United States
- UL1 TR002014/TR/NCATS NIH HHS/United States
- UL1 TR002550/TR/NCATS NIH HHS/United States
- UL1 TR002319/TR/NCATS NIH HHS/United States
- UL1 TR001855/TR/NCATS NIH HHS/United States
- UL1 TR001425/TR/NCATS NIH HHS/United States
- UL1 TR002373/TR/NCATS NIH HHS/United States
- UL1 TR002240/TR/NCATS NIH HHS/United States
- UL1 TR002556/TR/NCATS NIH HHS/United States
- UL1 TR003017/TR/NCATS NIH HHS/United States
- UL1 TR001998/TR/NCATS NIH HHS/United States
- UL1 TR001873/TR/NCATS NIH HHS/United States
- UL1 TR001881/TR/NCATS NIH HHS/United States
- UL1 TR002645/TR/NCATS NIH HHS/United States
- UL1 TR001450/TR/NCATS NIH HHS/United States
- UL1 TR002366/TR/NCATS NIH HHS/United States
- U54 GM115428/GM/NIGMS NIH HHS/United States
- UL1 TR002345/TR/NCATS NIH HHS/United States
- UL1 TR002377/TR/NCATS NIH HHS/United States
- U54 GM115677/GM/NIGMS NIH HHS/United States
- UL1 TR002544/TR/NCATS NIH HHS/United States
- UL1 TR003098/TR/NCATS NIH HHS/United States
- UL1 TR001430/TR/NCATS NIH HHS/United States
- UL1 TR003142/TR/NCATS NIH HHS/United States