2013 Nov 21;10(1):12.
doi: 10.1186/1742-7622-10-12.

Data harmonization and federated analysis of population-based studies: the BioSHaRE project

Dany Doiron et al. Emerg Themes Epidemiol.

Abstract

Background: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses.

Methods: Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Harmonization potential was assessed using each study's questionnaires, standard operating procedures, and data dictionaries. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in each research centre across Europe were interconnected through a federated database system to perform statistical analysis.
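
As an illustration of what a processing algorithm of this kind might look like, the following Python sketch maps study-specific glucose measurements to a single common format. It is not the project's actual algorithm: the function name, the unit labels, the target unit (mmol/L) and the 8-hour minimum fasting time are all assumptions made for the example. Only the unit conversion itself (1 mmol/L of glucose = 18.016 mg/dL) is a standard fact.

```python
from typing import Optional

# Glucose has a molar mass of 180.16 g/mol, so 1 mmol/L = 18.016 mg/dL.
MGDL_PER_MMOLL = 18.016


def harmonize_fasting_glucose(
    value: Optional[float],
    unit: str,
    fasting_hours: Optional[float],
    min_fasting_hours: float = 8.0,  # assumed threshold, not from the paper
) -> Optional[float]:
    """Return fasting glucose in mmol/L, or None if it cannot be harmonized.

    All parameter names and rules here are hypothetical.
    """
    # Missing inputs cannot be mapped to the target variable.
    if value is None or fasting_hours is None:
        return None
    # A sample taken too soon after eating is not a fasting measurement.
    if fasting_hours < min_fasting_hours:
        return None
    if unit == "mmol/L":
        return round(value, 2)
    if unit == "mg/dL":
        return round(value / MGDL_PER_MMOLL, 2)
    raise ValueError(f"Unsupported unit: {unit!r}")


# Two hypothetical studies that record the same quantity in different units.
study_a_value = harmonize_fasting_glucose(90.08, "mg/dL", fasting_hours=10)
study_b_value = harmonize_fasting_glucose(5.0, "mmol/L", fasting_hours=12)
not_fasting = harmonize_fasting_glucose(5.0, "mmol/L", fasting_hours=2)
```

Returning None rather than raising for missing or non-fasting records keeps the rule applicable row by row, so a study with partial coverage still contributes the records that can be harmonized.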

Results: Retrospective harmonization led to the generation of common-format variables for 73% of the variable-study matches considered (96 targeted variables across 8 studies). Using the DataSHIELD method, authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without sharing individual-level data.
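
The general idea of analysing distributed data without moving individual-level records can be sketched as follows. This Python example is a conceptual illustration only and does not use or reproduce DataSHIELD's actual interface: the class and function names are hypothetical, and the data are made up. Each site computes non-identifying aggregates locally, and only those aggregates are combined centrally.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates that leave a site; no individual values are included."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values: Sequence[Optional[float]]) -> SiteSummary:
    """Run at each study site on its own harmonized data (hypothetical)."""
    observed = [v for v in values if v is not None]
    return SiteSummary(
        n=len(observed),
        total=sum(observed),
        total_sq=sum(v * v for v in observed),
    )


def pooled_mean_sd(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Combine site-level aggregates into a pooled mean and sample SD."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("Need at least two observations across sites")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sums; clamp tiny negative rounding errors to zero.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, sqrt(variance)


# Made-up harmonized values held separately at two sites.
site_a = summarize_locally([5.0, 6.0, None])
site_b = summarize_locally([4.0, 5.0, 5.0])
mean, sd = pooled_mean_sd([site_a, site_b])
```

The pooled result equals what a direct calculation on the combined records would give, yet the analyst only ever sees three numbers per site. A real system would also need disclosure controls, such as refusing to return aggregates for very small groups, which this sketch omits.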

Conclusion: New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein.

Figures

Figure 1
Example of data processing to obtain a common format: deriving the harmonized Fasting Glucose DataSchema variable for two studies.
Figure 2
Data harmonization and federated infrastructure for three HOP studies.
