Review
2020 Aug 27;4(8):e17687.
doi: 10.2196/17687.

What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions

Kristina K Gagalova et al. JMIR Form Res.

Abstract

Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade.

Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation.

Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices.

Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making).

Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.

Keywords: data aggregation; data analytics; data warehousing; database; health informatics; information storage and retrieval.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Article selection process. The diagram shows the number of articles at each stage of selection for each of the 3 databases: MEDLINE (Medical Literature Analysis and Retrieval System Online), IEEE Xplore (Institute of Electrical and Electronics Engineers Xplore), and Google Scholar.
Figure 2
Architecture models identified from selected integrated data repositories (IDRs). Arrows indicate data output resulting from a query (blue) and data input resulting from data integration or update (orange). Continuous lines show data query and integration applied by research users, whereas dashed lines are data queries performed by operational or clinical users.
Figure 3
Common data types across IDRs. Columns show the main types of data collected in the selected IDRs. Gray-filled cells denote feature presence, with colors classifying the IDRs based on the examined architectures. Only 19 IDR articles contained enough information to be included in this figure. BRP: biorepository portal; BTRIS: biomedical translational research information system; CARPEM: cancer research for personalized medicine; CLB-IT: Léon Bérard Cancer Center Information Technology; DW4TR: Data Warehouse for Translational Research; EHR: electronic health record; HEGP: Hôpital Européen Georges Pompidou; HERON: health care enterprise repository for ontological narration; HSSC: Health Science, South Carolina; IDRs: integrated data repositories; Mayo Clinic-TRC: Mayo Clinic – Translational Research Center; METEOR: Methodist Environment for Translational Enhancement and Outcome Research; MIDH: Maternal and Infant Data Hub; MOSAIC: models and simulation techniques for discovering diabetes-related factors; Onco-i2b2; PHIS+: Pediatric Health Information System+; STARR: STAnford Research Repository; VUMC-BioVU: Vanderbilt University Medical Center–BioVU; VUMC-SD: Vanderbilt University Medical Center–Synthetic Derivative.
