Review
Am J Epidemiol. 2015 Dec 15;182(12):1033-8.
doi: 10.1093/aje/kwv133. Epub 2015 Nov 20.

Toward Rigorous Data Harmonization in Cancer Epidemiology Research: One Approach

Betsy Rolland et al. Am J Epidemiol.

Abstract

Cancer epidemiologists have a long history of combining data sets in pooled analyses, often harmonizing heterogeneous data from multiple studies into 1 large data set. Although there are useful websites on data harmonization with recommendations and support, there is little research on best practices in data harmonization; each project conducts harmonization according to its own internal standards. The field would be greatly served by charting the process of data harmonization to enhance the quality of the harmonized data. Here, we describe the data harmonization process utilized at the Fred Hutchinson Cancer Research Center (Seattle, Washington) by the coordinating centers of several research projects. We describe a 6-step harmonization process, including: 1) identification of questions the harmonized data set is required to answer; 2) identification of high-level data concepts to answer those questions; 3) assessment of data availability for data concepts; 4) development of common data elements for each data concept; 5) mapping and transformation of individual data points to common data elements; and 6) quality-control procedures. Our aim here is not to claim a "correct" way of doing data harmonization but to encourage others to describe their processes in order that we can begin to create rigorous approaches. We also propose a research agenda around this issue.
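
As a rough illustration of steps 4 through 6 of the process described above, a minimal Python sketch might look like the following. It is not the authors' implementation: the data concept (smoking status), the study names, the raw codings, and all function names are hypothetical and chosen only to show what mapping study-specific values onto a common data element and then running a simple quality-control check could look like.

```python
from collections import Counter

# Step 4 (hypothetical): a common data element for the "smoking status" concept.
ALLOWED_SMOKING = {"never", "former", "current"}

# Step 5 (hypothetical): each study coded smoking differently, so each gets its own map.
STUDY_MAPPINGS = {
    "study_a": {1: "never", 2: "former", 3: "current"},
    "study_b": {"N": "never", "Ex": "former", "Y": "current"},
}


def harmonize_records(records):
    """Map (study, participant_id, raw_value) tuples onto the common data element.

    Returns the harmonized rows plus the rows that could not be mapped,
    so that nothing is silently dropped or guessed.
    """
    harmonized, unmapped = [], []
    for study, participant_id, raw in records:
        value = STUDY_MAPPINGS[study].get(raw)
        if value is None:
            unmapped.append((study, participant_id, raw))
        else:
            harmonized.append(
                {"study": study, "id": participant_id, "smoking_status": value}
            )
    return harmonized, unmapped


def qc_summary(harmonized):
    """Step 6 (hypothetical): verify allowed values and tabulate them per study."""
    summary = {}
    for row in harmonized:
        if row["smoking_status"] not in ALLOWED_SMOKING:
            raise ValueError(f"value outside common data element: {row}")
        summary.setdefault(row["study"], Counter())[row["smoking_status"]] += 1
    return summary


records = [("study_a", 1, 2), ("study_b", 7, "Y"), ("study_b", 8, "?")]
harmonized, unmapped = harmonize_records(records)
print(qc_summary(harmonized))
print("needs manual review:", unmapped)
```

Returning the unmapped rows separately, rather than discarding them, reflects one possible design choice: values that do not fit the common data element are flagged for review during quality control instead of being coerced.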

Keywords: cancer epidemiology; data harmonization; data pooling.
