J Am Med Inform Assoc. 2015 Nov;22(6):1271-6.
doi: 10.1093/jamia/ocv009. Epub 2015 Mar 21.

Virtualization of open-source secure web services to support data exchange in a pediatric critical care research network


Lewis J Frey et al. J Am Med Inform Assoc. 2015 Nov.

Abstract

Objectives: To examine the feasibility of deploying a virtual web service for sharing data within a research network, and to evaluate the impact on data consistency and quality.

Material and methods: Virtual machines (VMs) encapsulated an open-source, semantically and syntactically interoperable secure web service infrastructure along with a shadow database. The VMs were deployed to 8 Collaborative Pediatric Critical Care Research Network Clinical Centers.

Results: Virtual web services could be deployed in hours. The interoperability of the web services reduced format misalignment from 56% to 1%, and 99% of the data transferred consistently using the data dictionary, while 1% needed human curation.

Conclusions: Use of virtualized open-source secure web service technology could enable direct electronic abstraction of data from hospital databases for research purposes.

Keywords: data governance; electronic health record; grid; learning health care system; pediatric critical care; pediatric network; secure web services; virtual machines; virtualization.


Figures

Figure 1:
The picuGrid architecture was designed around a chaperoned Application Program Interface (API). Firewall settings were controlled by the centers: picuGrid was instantiated between each site's external and internal firewalls, and local IT departments could set additional security restrictions to limit connections to the VM. Secure data transmission between the sites and the Data Coordinating Center (DCC) was enforced through caGrid credentials within each VM that were validated by a third-party credentialing service. Unlike a traditional grid architecture, the system was restricted so that only the DCC could access the data; the Clinical Centers could not view or access one another's sites. All data up to and including the shadow database were under the direct control of local site personnel. The shadow database contained a dictionary table for updating the value sets of each site. The DCC could pull data through the chaperoned API but could not access the shadow database directly. The solid arrow shows data pulled from the administration database into a comma-separated values (CSV) file and then pushed past the internal firewall into the picuGrid shadow database. Many clinical research studies use data from the active EHR or the enterprise data warehouse (EDW), and pulling data such as laboratory test results or vital signs would benefit most of the network's clinical studies. The dotted lines from the EHR and EDW represent these desired future data sources. Because each site has a MySQL database, the training needed to access the data is standard database querying (i.e., Structured Query Language, SQL). Each site received a user guide to facilitate installation and support of the system.
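The dictionary-table step in Figure 1 can be illustrated with a minimal Python sketch. It assumes a simplified shadow database: the table names `dictionary` and `shadow_records`, the coded fields `sex` and `admission_source`, and their value sets are all hypothetical, and the standard-library `sqlite3` module stands in for the MySQL database each site actually ran. Each CSV row has its local codes translated through the dictionary table before loading; rows containing a value with no dictionary entry are held back and reported so that site staff can extend the dictionary.

```python
import csv
import io
import sqlite3

# Hypothetical schema. picuGrid sites ran MySQL; sqlite3 is used here only so
# the sketch runs without a database server.
SCHEMA = """
CREATE TABLE dictionary (
    field TEXT NOT NULL,
    local_value TEXT NOT NULL,
    standard_value TEXT NOT NULL,
    PRIMARY KEY (field, local_value)
);
CREATE TABLE shadow_records (
    record_id TEXT PRIMARY KEY,
    sex TEXT,
    admission_source TEXT
);
"""

CODED_FIELDS = ("sex", "admission_source")  # hypothetical coded fields


def load_csv(conn, csv_text):
    """Translate coded values via the dictionary table, then load the rows.

    Rows containing any value without a dictionary entry are not loaded;
    their (record_id, field, value) triples are returned for review.
    """
    unmapped = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        translated = {"record_id": row["record_id"]}
        missing = []
        for field in CODED_FIELDS:
            hit = conn.execute(
                "SELECT standard_value FROM dictionary"
                " WHERE field = ? AND local_value = ?",
                (field, row[field]),
            ).fetchone()
            if hit is None:
                missing.append((row["record_id"], field, row[field]))
            else:
                translated[field] = hit[0]
        if missing:
            unmapped.extend(missing)
            continue
        conn.execute(
            "INSERT INTO shadow_records (record_id, sex, admission_source)"
            " VALUES (:record_id, :sex, :admission_source)",
            translated,
        )
    conn.commit()
    return unmapped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dictionary VALUES (?, ?, ?)",
        [
            ("sex", "M", "Male"),
            ("sex", "F", "Female"),
            ("admission_source", "ED", "Emergency Department"),
        ],
    )
    data = "record_id,sex,admission_source\nr1,M,ED\nr2,F,Transfer\n"
    print(load_csv(conn, data))
    print(conn.execute("SELECT * FROM shadow_records").fetchall())
```

Holding rows back instead of loading them with empty values is a design choice of this sketch, not something the caption specifies.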
Figure 2:
(A) A field was defined as misaligned if at least 1 row had incorrectly formatted values. Format misalignment was measured as the percentage of incorrectly formatted fields out of the 54 fields specified by the research protocol. (B) We assisted sites in loading their 2012 data into the picuGrid system. If a row of 2012 data loaded with no ETL process content errors, the record was counted as “Reused ETL: Low Curation.” If the dictionary table in the picuGrid shadow database needed to be updated to account for a new value, the record was counted as “Dictionary: Moderate Curation.” If a human needed to clarify and potentially change the data in the data file, the record was labeled “Clarification: High Curation.” We assisted 1 site in reconfiguring its ETL process. This change was necessary because 2 fields had been conjoined into 1 in the 2012 data set; the fields were separated and loaded through a simple change to the ETL process.
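The two measures in Figure 2 can be expressed directly in code. This Python sketch assumes each data row is a dictionary of string values and each protocol field has a format check; the three example fields and their formats are hypothetical stand-ins for the 54 protocol fields. A field counts as misaligned if any row fails its check, following panel A. For panel B, the precedence order (human clarification outranks a dictionary update, which outranks clean reuse of the ETL process) is an assumption of the sketch.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical format checks; the real protocol specified 54 fields.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
VALIDATORS = {
    "admit_date": lambda v: DATE.fullmatch(v) is not None,
    "age_months": lambda v: v.isdigit(),
    "sex": lambda v: v in {"Male", "Female", "Unknown"},
}


def misaligned_fields(rows, validators):
    """Fields with at least 1 incorrectly formatted (or missing) value."""
    return {
        field
        for field, is_valid in validators.items()
        if any(not is_valid(row.get(field, "")) for row in rows)
    }


def misalignment_pct(rows, validators):
    """Percentage of protocol fields that are misaligned."""
    return 100.0 * len(misaligned_fields(rows, validators)) / len(validators)


@dataclass
class RowOutcome:
    needs_dictionary_update: bool  # a new value had to be added to the dictionary table
    needs_clarification: bool      # a person had to clarify or change the data file


def curation_level(outcome):
    """Map a row outcome to a panel B category (precedence is assumed)."""
    if outcome.needs_clarification:
        return "Clarification: High Curation"
    if outcome.needs_dictionary_update:
        return "Dictionary: Moderate Curation"
    return "Reused ETL: Low Curation"


def curation_summary(outcomes):
    """Percentage of records falling in each curation category."""
    counts = Counter(curation_level(o) for o in outcomes)
    return {level: 100.0 * n / len(outcomes) for level, n in counts.items()}


if __name__ == "__main__":
    rows = [
        {"admit_date": "2012-03-01", "age_months": "14", "sex": "Male"},
        {"admit_date": "03/02/2012", "age_months": "7", "sex": "Female"},
    ]
    print(misaligned_fields(rows, VALIDATORS))  # {'admit_date'}
    print(misalignment_pct(rows, VALIDATORS))
```

Because a single badly formatted row flags the whole field, this metric is deliberately strict: it measures whether a field can be trusted end to end, not the share of bad values.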
Figure 3:
The red line is the time for whole-table queries to return for 1 to 6 virtual clients requesting data from 1 virtual server, with each client requesting 4000 rows of data. The time increases linearly with the number of clients. The blue line consists of 1, 2, or 3 pairs of virtual clients requesting 8000 rows of data from 2 virtual servers. The green line consists of 1 or 2 virtual client triplets requesting 12 000 rows of data from 3 virtual servers. Establishing the caGrid connection incurs an initial cost of around 50 seconds, a time lag that is more acceptable for picuGrid’s batch architecture than it would be for a real-time system.
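Figure 3 describes query time growing linearly with the number of clients on top of a fixed connection cost, which suggests a simple model t = a + b × n, where the intercept a approximates the caGrid connection setup and the slope b the added time per client. The Python sketch below fits that model by ordinary least squares; the function names are hypothetical, and the timings in the example are invented for illustration, not the measurements behind the figure.

```python
def fit_linear(clients, seconds):
    """Ordinary least-squares fit of seconds = intercept + slope * clients."""
    if len(clients) != len(seconds) or len(clients) < 2:
        raise ValueError("need at least 2 paired observations")
    n = len(clients)
    mean_x = sum(clients) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in clients)
    if sxx == 0:
        raise ValueError("client counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(clients, seconds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, clients):
    """Predicted return time for a given number of concurrent clients."""
    return intercept + slope * clients


if __name__ == "__main__":
    # Hypothetical timings: 50 s setup plus 10 s per client (not the paper's data).
    clients = [1, 2, 3, 4, 5, 6]
    seconds = [60.0, 70.0, 80.0, 90.0, 100.0, 110.0]
    a, b = fit_linear(clients, seconds)
    print(f"setup ~ {a:.1f} s, per-client ~ {b:.1f} s")
```

On real measurements, a large intercept relative to the per-client slope is the pattern that favors scheduled batch pulls over interactive, real-time queries.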


Cited by

  • Case Study: Semantic Annotation of a Pediatric Critical Care Research Study.
    Sward KA, Rubin S, Jenkins TL, Newth CJ, Dean JM; Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Collaborative Pediatric Critical Care Research Network (CPCCRN). Comput Inform Nurs. 2016 Mar;34(3):101-4. doi: 10.1097/CIN.0000000000000236. PMID: 26958992.
  • Precision medicine informatics.
    Frey LJ, Bernstam EV, Denny JC. J Am Med Inform Assoc. 2016 Jul;23(4):668-70. doi: 10.1093/jamia/ocw053. Epub 2016 Jun 6. PMID: 27274018.
  • Data integration strategies for predictive analytics in precision medicine.
    Frey LJ. Per Med. 2018 Nov;15(6):543-551. doi: 10.2217/pme-2018-0035. Epub 2018 Nov 2. PMID: 30387695. Review.
  • Pediatric Multiple Organ Dysfunction Syndrome: Promising Therapies.
    Doctor A, Zimmerman J, Agus M, Rajasekaran S, Bubeck Wardenburg J, Fortenberry J, Zajicek A, Mairson E, Typpo K. Pediatr Crit Care Med. 2017 Mar;18(3_suppl Suppl 1):S67-S82. doi: 10.1097/PCC.0000000000001053. PMID: 28248836. Review.
