A General Primer for Data Harmonization
- PMID: 38297013
- PMCID: PMC10831085
- DOI: 10.1038/s41597-024-02956-3
Abstract
Data harmonization is an important method for combining or transforming data. To date, however, articles about data harmonization have been field-specific and highly technical, making it difficult for researchers to derive general principles for how to engage in and contextualize data harmonization efforts. This commentary provides a primer on the tradeoffs inherent in data harmonization for researchers who are considering undertaking such efforts or who seek to evaluate the quality of existing ones. We derive this guidance from the extant literature and our own experience in harmonizing data for the emerging and important field of COVID-19 public health and safety measures (PHSM).
Conflict of interest statement
The authors declare no competing interests.
Grants and funding
- 101016233/EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
- 832-06g/National Council for Eurasian and East European Research (NCEEER)