2022 Mar 15;29(4):592-600.
doi: 10.1093/jamia/ocab278.

Migrating a research data warehouse to a public cloud: challenges and opportunities

Michael G Kahn et al. J Am Med Inform Assoc.

Abstract

Objective: Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, complex data-driven discoveries. The computing and storage needs of these research environments may quickly exceed the capacity of on-premises systems. New RDWs are migrating to cloud platforms for the scalability and flexibility needed to meet these challenges. We describe our experience in migrating a multi-institutional RDW to a public cloud.

Materials and methods: This study is descriptive. Primary materials included internal and public presentations before and after the transition, analysis documents, and actual billing records. Findings were aggregated into topical categories.

Results: Eight categories of migration issues were identified. Unanticipated challenges included legacy system limitations; network, computing, and storage architectures that realize performance and cost benefits in the face of hyper-innovation; complex security reviews and approvals; and limited cloud consulting expertise.

Discussion: Cloud architectures enable previously unavailable capabilities, but numerous pitfalls can impede realizing the full benefits of a cloud environment. Rapid changes in cloud capabilities can quickly render existing architectures and associated institutional policies obsolete. Touchpoints with on-premises networks and systems can add unforeseen complexity. Governance, resource management, and cost oversight are critical to allow rapid innovation while minimizing wasted resources and unnecessary costs.

Conclusions: Migrating our RDW to the cloud has enabled capabilities and innovations that would not have been possible with an on-premises environment. Notwithstanding the challenges of managing cloud resources, the resulting RDW capabilities have been highly positive for our institution, research community, and partners.

Keywords: big data; cloud computing; data warehousing; research data governance.

Figures

Figure 1. Key findings from 2016 pilot studies comparing Google Cloud Platform with existing on-premises systems as presented to nontechnical executive sponsors. Superlatives were used to emphasize particularly distinctive findings that supported the migration proposal.
Figure 2. Top, The executive view of Health Data Compass highlighting data inputs, outputs, and key GCP technologies for nontechnical audiences. Bottom, Technical view of data flows, network boundaries, and internal GCP technologies used in the current Health Data Compass research data warehouse. Google Cloud icon labels are available at https://docs.google.com/presentation/d/1aGOTpNdCoO4GXZ2es38ZFO5qPGEAjTtDSVeHaDpwsas/edit#slide=id.g5e923c6224_190_56. Abbreviations: APCD: Colorado All Payers Claims Database; CDPHE: State death registry; GCP: Google Cloud Platform; Melissa: Melissa Inc.
Figure 3. Data flows and key Google Cloud Platform (GCP) technologies used by the Translational Informatics Service (TIS). Although TIS uses fewer GCP technologies, TIS deploys more “forward-facing” (App Engine GUI, R Studio), high-performance computing (Eureka HPC), and cloud storage resources than does the RDW.
Figure 4. Top, Growth in Google Cloud Platform (GCP) total spend across all GCP services from July 2017. Middle, Growth of GCP monthly costs by specific GCP service October 2020–March 2021. Bottom, Proportion of charges across GCP services January–March 2021.
