Appl Clin Inform. 2019 Aug;10(4):679-692.
doi: 10.1055/s-0039-1695793. Epub 2019 Sep 11.

Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC

Sebastian Mate et al. Appl Clin Inform. 2019 Aug.

Abstract

Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks.

Objectives: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task.

Methods: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, the use of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application.
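A minimal Python sketch of how a lexical bag-of-words matcher of this kind can work. It assumes a small hypothetical synonym table (`SYNONYMS`), character-level fuzzy word matching via `difflib.SequenceMatcher` with an assumed threshold of 0.85, and an assumed auto-accept cutoff of 0.8 below which a suggestion is left for expert review. The metadata-repository semantics and the sentiment tagging are not modeled, and all function names are illustrative rather than the toolset's API.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table (assumption); the described matcher draws on the
# semantics of a metadata repository, which this sketch does not model.
SYNONYMS = {
    "tumour": "tumor",
    "crc": "colorectal cancer",
}


def normalize(term: str) -> list[str]:
    """Lower-case, keep alphanumeric words only, and expand known synonyms."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    expanded: list[str] = []
    for w in words:
        expanded.extend(SYNONYMS.get(w, w).split())
    return expanded


def fuzzy_equal(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two words as equal if their character similarity reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def bag_similarity(source: str, target: str) -> float:
    """Share of words, counted over both bags, that have a fuzzy partner in the other bag."""
    s, t = normalize(source), normalize(target)
    if not s or not t:
        return 0.0
    hits_s = sum(any(fuzzy_equal(w, v) for v in t) for w in s)
    hits_t = sum(any(fuzzy_equal(v, w) for w in s) for v in t)
    return (hits_s + hits_t) / (len(s) + len(t))


def suggest(source: str, targets: list[str], accept: float = 0.8):
    """Return (best target, score, auto-accepted?); rejected suggestions go to expert review."""
    best = max(targets, key=lambda t: bag_similarity(source, t))
    score = bag_similarity(source, best)
    return best, score, score >= accept
```

For example, `suggest("Tumour Localisation", ["Tumor localization", "Date of diagnosis"])` rewrites "tumour" to "tumor" through the synonym table, matches "localisation" to "localization" fuzzily, and returns the first target with a score of 1.0, accepted automatically.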

Results: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients.

Conclusion: A generic harmonization approach was created and successfully used to harmonize data across 10 European biobanks. The software tools were made available as open source.

Conflict of interest statement

None declared.

Figures

Fig. 1
The extract-transform-load (ETL) pipeline as configured for the ADOPT BBMRI-ERIC project, shown for two exemplary biobanks, one contributing an Entity-Attribute-Value (EAV) and the other a flat file. The files are extracted from the Biobank Information Management Systems (BIMS, left), processed by our tools into an XML file, and finally loaded into the OSSE system (right). CMF, central metadata definition file; LMF, local metadata definition file.
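The extract-transform-load flow in Fig. 1 can be sketched in Python for the EAV case. The sketch assumes hypothetical in-code mappings (`TERM_MAP`, `VALUE_MAP`) standing in for the expert-curated mappings and the local and central metadata definition files, and an illustrative XML layout (`patients`/`patient`/`item`) that is not the actual OSSE import format. EAV rows are pivoted into one record per patient, local attribute names and codes are translated to the central terminology, and the result is serialized to XML.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical curated mappings (local -> central); assumptions for illustration.
TERM_MAP = {"sex": "Gender", "loc": "Tumor localization"}
VALUE_MAP = {("Gender", "m"): "male", ("Gender", "f"): "female"}


def eav_to_records(rows):
    """Pivot Entity-Attribute-Value rows into one dict per entity."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return records


def harmonize(record):
    """Rename local attributes and recode values; unmapped attributes are dropped."""
    out = {}
    for attr, value in record.items():
        target = TERM_MAP.get(attr)
        if target is None:
            continue
        out[target] = VALUE_MAP.get((target, value), value)
    return out


def to_xml(records):
    """Serialize harmonized records; element names are illustrative only."""
    root = ET.Element("patients")
    for entity, record in sorted(records.items()):
        patient = ET.SubElement(root, "patient", id=entity)
        for name, value in harmonize(record).items():
            item = ET.SubElement(patient, "item", name=name)
            item.text = value
    return ET.tostring(root, encoding="unicode")


eav_csv = "p1,sex,m\np1,loc,colon\np2,sex,f\n"
print(to_xml(eav_to_records(csv.reader(io.StringIO(eav_csv)))))
```

For a flat file the pivot step is unnecessary: each row produced by `csv.DictReader` is already one record and can be passed straight to `harmonize`.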
Fig. 2
Illustration of the bag-of-words algorithm in MDRMatcher. After normalization and semantic expansion, it compares all n-grams from the source and the target item and computes a similarity score.
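The n-gram comparison in Fig. 2 can be illustrated with a short Python sketch. It assumes that normalization is reduced to lower-casing and tokenization, omits the semantic expansion step, and uses the Dice coefficient over the sets of all contiguous word n-grams as the similarity score. The caption does not state MDRMatcher's actual scoring formula, so this rule is an assumption.

```python
import re


def word_ngrams(text: str) -> set[tuple[str, ...]]:
    """All contiguous word n-grams (n = 1 .. number of words) of a lower-cased term."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        tuple(words[i:i + n])
        for n in range(1, len(words) + 1)
        for i in range(len(words) - n + 1)
    }


def ngram_similarity(source: str, target: str) -> float:
    """Dice coefficient over the two n-gram sets (assumed scoring rule)."""
    a, b = word_ngrams(source), word_ngrams(target)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

For instance, "Date of diagnosis" (6 n-grams) and "Diagnosis date" (3 n-grams) share only the unigrams "date" and "diagnosis", giving 2·2/(6+3) ≈ 0.44, whereas "Date of diagnosis (year)" contains all six n-grams of the first term and scores 2·6/(6+10) = 0.75. Counting multi-word n-grams thus rewards terms that share whole phrases in the same word order.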
Fig. 3
The MappingGUI program, which is used to curate mappings between source and target terms and values.
Fig. 4
A four-axial classification scheme to assess the mapping quality of MDRMatcher.
