Int J Popul Data Sci. 2021 Nov 30;6(1):1680.
doi: 10.23889/ijpds.v6i1.1680. eCollection 2021.

Data harmonization and data pooling from cohort studies: a practical approach for data management

Kamala Adhikari et al. Int J Popul Data Sci. 2021.

Abstract

Data pooling from pre-existing datasets can be useful to increase study sample size and statistical power in order to answer a research question. However, individual datasets may contain variables that measure the same construct differently, posing challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and the Alberta Pregnancy Outcomes and Nutrition study. Variables were harmonized considering multiple features across the datasets: the construct measured, the question asked and its response options, the measurement scale used, the frequency of measurement, the timing of measurement, and the data structure. Based on these features, each variable was classified as completely matching, partially matching, or completely unmatching across the datasets. Variables that were an exact match were pooled as is. Partially matching variables were harmonized, that is, processed into a common format across the datasets, considering the frequency of measurement, the timing of measurement, the measurement scale used, and the response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies, which allow researchers to answer, in a statistically efficient, timely, and cost-efficient manner, novel research questions that could not be addressed using a single data source.
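To make the classify-then-harmonize workflow more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' procedure. The variable (smoking in pregnancy), cohort labels `cohort_a` and `cohort_b`, response options, timepoints, and the helpers `VariableSpec`, `classify`, `harmonize`, and `pool` are all hypothetical and are not taken from All Our Families or APrON. The sketch tracks only some of the features listed in the abstract (construct, response options, measurement scale, and timing/frequency). Its classification rule is a simplification: a different construct means unmatching, an identical specification means an exact match, and anything else is treated as partially matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical description of how one cohort measured a variable."""
    construct: str      # what is measured, e.g. "smoking in pregnancy"
    responses: tuple    # response options offered
    scale: str          # e.g. "binary", "ordinal", "continuous"
    timepoints: tuple   # when it was measured (its length gives the frequency)


def classify(a: VariableSpec, b: VariableSpec) -> str:
    """Assign a match level (simplified, assumed rule; see lead-in)."""
    if a.construct != b.construct:
        return "completely unmatching"
    if a == b:
        return "completely matching"
    return "partially matching"


# Hypothetical maps recoding each cohort's responses into a shared binary format.
RECODE = {
    "cohort_a": {"daily": 1, "occasionally": 1, "not at all": 0},
    "cohort_b": {"yes": 1, "no": 0},
}


def harmonize(records: list, source: str, common_timepoint: str) -> list:
    """Keep only the shared timepoint and recode responses to the common format."""
    mapping = RECODE[source]
    return [
        {"source": source, "id": r["id"], "smoking": mapping[r["smoking"]]}
        for r in records
        if r["timepoint"] == common_timepoint
    ]


def pool(*datasets: list) -> list:
    """Stack harmonized datasets; the 'source' field keeps cohorts distinguishable."""
    pooled = []
    for ds in datasets:
        pooled.extend(ds)
    return pooled


# Cohort A asked an ordinal question twice; cohort B asked a yes/no question once.
spec_a = VariableSpec("smoking in pregnancy", ("daily", "occasionally", "not at all"),
                      "ordinal", ("trimester 1", "trimester 3"))
spec_b = VariableSpec("smoking in pregnancy", ("yes", "no"), "binary", ("trimester 1",))

cohort_a = [
    {"id": 1, "timepoint": "trimester 1", "smoking": "occasionally"},
    {"id": 1, "timepoint": "trimester 3", "smoking": "not at all"},
    {"id": 2, "timepoint": "trimester 1", "smoking": "not at all"},
]
cohort_b = [{"id": 1, "timepoint": "trimester 1", "smoking": "yes"}]

level = classify(spec_a, spec_b)  # same construct, different format
pooled = pool(
    harmonize(cohort_a, "cohort_a", "trimester 1"),
    harmonize(cohort_b, "cohort_b", "trimester 1"),
)
for row in pooled:
    print(row)
```

Recoding to the coarsest format both cohorts share (here, binary at the one common timepoint) discards detail from the richer cohort. That is the trade-off for obtaining a single pooled variable. Keeping a `source` field also lets later analyses account for cohort membership.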

Keywords: cohort studies; comparable dataset; data harmonization; data pooling or combination; harmonization strategies.

Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.