Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC
- PMID: 31509880
- PMCID: PMC6739205
- DOI: 10.1055/s-0039-1695793
Abstract
Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks.
Objectives: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task.
Methods: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application.
Results: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients.
Conclusion: A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.
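To make the matching idea from the Methods paragraph concrete, the following is a minimal sketch of a bag-of-words matcher with fuzzy word comparison and synonym normalization. It is not the ADOPT BBMRI-ERIC implementation: the synonym table, the thresholds, the scoring formula, and all function names are assumptions chosen for illustration, and sentiment tagging is left out.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table (assumption); the published toolset draws its
# semantics from a metadata repository instead of a hard-coded dictionary.
SYNONYMS = {"sex": "gender", "dob": "birthdate"}


def tokens(term):
    """Lowercase a term, split it on non-alphanumerics, and map synonyms."""
    cleaned = "".join(c if c.isalnum() else " " for c in term.lower())
    return {SYNONYMS.get(word, word) for word in cleaned.split()}


def word_similarity(a, b):
    """Fuzzy similarity of two words in [0, 1], via difflib."""
    return SequenceMatcher(None, a, b).ratio()


def bag_score(local, central, fuzzy_threshold=0.8):
    """Share of words in both bags that have a fuzzy counterpart in the other."""
    a, b = tokens(local), tokens(central)
    if not a or not b:
        return 0.0

    def covered(src, dst):
        # Count words in src whose best match in dst reaches the threshold.
        return sum(
            1 for w in src
            if max(word_similarity(w, v) for v in dst) >= fuzzy_threshold
        )

    return (covered(a, b) + covered(b, a)) / (len(a) + len(b))


def best_match(local, central_terms, min_score=0.6):
    """Return the best-scoring central term, or None to defer to an expert."""
    score, term = max((bag_score(local, c), c) for c in central_terms)
    return term if score >= min_score else None


# Illustrative central terminology (assumption, not the project's term list).
central = ["Patient gender", "Date of diagnosis", "Tumor stage"]
print(best_match("sex_of_patient", central))
print(best_match("tumour stage", central))
print(best_match("hospital ward", central))
```

Returning `None` below the score threshold mirrors the semiautomatic workflow described above: terms the matcher cannot map with confidence are handed to a human expert rather than guessed.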
Georg Thieme Verlag KG Stuttgart · New York.
Conflict of interest statement
None declared.