J Am Med Inform Assoc. 2022 Mar 15;29(4):643-651.
doi: 10.1093/jamia/ocab264.

The Mass General Brigham Biobank Portal: an i2b2-based data repository linking disparate and high-dimensional patient data to support multimodal analytics

Victor M Castro et al. J Am Med Inform Assoc.

Abstract

Objective: Integrating and harmonizing disparate patient data sources into one consolidated data portal enables researchers to conduct analysis efficiently and effectively.

Materials and methods: We describe an implementation of Informatics for Integrating Biology and the Bedside (i2b2) to create the Mass General Brigham (MGB) Biobank Portal data repository. The repository integrates data from primary and curated data sources and is updated weekly. The data are made readily available to investigators in a data portal where they can easily construct and export customized datasets for analysis.

Results: As of July 2021, there are 125 645 consented patients enrolled in the MGB Biobank. Of these, 88 527 (70.5%) have a biospecimen, 55 121 (43.9%) have completed the health information survey, 43 552 (34.7%) have genomic data, and 124 760 (99.3%) have EHR data. Twenty machine learning computed phenotypes are calculated weekly. There are currently 1220 active investigators, who have run 58 793 patient queries and exported 10 257 analysis files.
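
The percentages above follow directly from the reported counts. A minimal Python sketch that reproduces them; the dictionary keys and the `coverage` helper are illustrative names chosen here, not part of the Biobank Portal:

```python
# Counts as reported in the abstract (July 2021).
ENROLLED = 125_645
COUNTS = {
    "biospecimen": 88_527,
    "health_survey": 55_121,
    "genomic_data": 43_552,
    "ehr_data": 124_760,
}


def coverage(count: int, total: int = ENROLLED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Print the share of enrolled patients covered by each data source.
for source, count in COUNTS.items():
    print(f"{source}: {coverage(count)}%")
```

Each value is a share of all enrolled patients, so the categories overlap and are not expected to sum to 100%.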

Discussion: The Biobank Portal allows noninformatics researchers to assess study feasibility by querying across many data sources and then to extract the data that are most useful to them for clinical studies. Although establishing and maintaining integrated data repositories requires substantial informatics resources from institutions, these repositories yield significant research value to a wide range of investigators.
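
As a rough illustration of a feasibility query that spans data sources, the sketch below intersects per-source patient sets to count eligible patients. The patient identifiers, source names, and the `feasibility_count` function are hypothetical; this is not the i2b2 query API, only the set logic that a cross-source cohort count implies.

```python
from typing import Dict, Iterable, Set


def feasibility_count(sources: Dict[str, Set[str]], required: Iterable[str]) -> int:
    """Count patients who appear in every required data source."""
    names = list(required)
    if not names:
        return 0
    # Start from a copy of the first source, then narrow by each remaining one.
    cohort = set(sources[names[0]])
    for name in names[1:]:
        cohort &= sources[name]
    return len(cohort)


# Toy data with made-up patient IDs (assumption, for illustration only).
toy_sources = {
    "ehr": {"p1", "p2", "p3", "p4"},
    "genomic": {"p2", "p3", "p5"},
    "survey": {"p3", "p4", "p5"},
}

print(feasibility_count(toy_sources, ["ehr", "genomic"]))            # 2
print(feasibility_count(toy_sources, ["ehr", "genomic", "survey"]))  # 1
```

Adding a required source can only shrink the cohort, which is why a count like this is useful before committing to a study design.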

Conclusion: The Biobank Portal and other patient data portals that integrate complex and simple datasets enable diverse research use cases. i2b2 tools to implement these registries and make the data interoperable are open source and freely available.

Keywords: Information storage and retrieval; data curation; data science; electronic health records; genomics; i2b2.

Figures

Figure 1. The Biobank Portal architecture is based on Informatics for Integrating Biology and the Bedside (i2b2). Investigators access data through the web client, which interacts with the i2b2 application server using application programming interfaces (APIs). Most data are ingested into the data repository directly, but other data are accessed using external APIs at query time. PM: Project management cell; ONT: Ontology cell; CRC: Data repository cell; OMOP: Observational Medical Outcomes Partnership; CDM: common data model; VCF: variant call format; ETL: extract-transform-load.
Figure 2. Overview of Biobank Portal data. Investigators see this screen at every login, with information on available data, date of last update, help, and quick-start query examples.
Figure 3. Example analysis file specification to download limited datasets.
