Int J Epidemiol. 2016 Apr;45(2):408-416.
doi: 10.1093/ije/dyv193. Epub 2015 Oct 8.

ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data


Kim W Carter et al. Int J Epidemiol. 2016 Apr.

Abstract

Background: Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data while remaining sensitive to these issues, and that allow flexible and detailed statistical analyses, are therefore critically needed. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations.

Methods: Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage.

Results: Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory.
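The retrieve, pool, analyse and discard cycle described above can be illustrated with a minimal sketch. This is not ViPAR's implementation: in-memory SQLite databases stand in for the remote site databases, there is no SSH transport or federation layer, and the names `make_site`, `virtually_pooled_summary`, the `participants` table and the `age` variable are all hypothetical.

```python
import sqlite3
import statistics

# Hypothetical whitelist of harmonized variables that may be requested.
ALLOWED_VARIABLES = {"age"}


def make_site(ages):
    """Create a stand-in 'remote site': an in-memory SQLite database
    holding one harmonized table (assumed schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, age REAL)")
    conn.executemany(
        "INSERT INTO participants (age) VALUES (?)", [(a,) for a in ages]
    )
    conn.commit()
    return conn


def virtually_pooled_summary(sites, variable):
    """Pull one variable from every site into memory, summarize it,
    then discard the pooled values so only the summary is returned."""
    if variable not in ALLOWED_VARIABLES:
        raise ValueError(f"variable not available: {variable}")
    pooled = []  # exists only for the duration of this analysis
    try:
        for conn in sites.values():
            # Column name is safe to interpolate: it was checked against the whitelist.
            cursor = conn.execute(f"SELECT {variable} FROM participants")
            pooled.extend(row[0] for row in cursor)
        return {
            "n": len(pooled),
            "mean": statistics.mean(pooled),
            "min": min(pooled),
            "max": max(pooled),
        }
    finally:
        pooled.clear()  # remove the virtually pooled data from memory


if __name__ == "__main__":
    sites = {"site_a": make_site([30, 40]), "site_b": make_site([50])}
    print(virtually_pooled_summary(sites, "age"))
```

The point of the design is that the caller only ever receives aggregate results; the row-level values live in a local list that is emptied when the analysis finishes, whether it succeeds or fails.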

Conclusions: ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at http://bioinformatics.childhealthresearch.org.au/software/vipar/.

Keywords: ViPAR; data federation; data pooling; data sharing.

Figures

Figure 1.
ViPAR topology. A typical multi-site ViPAR configuration in which a ViPAR master server (VMS) is linked to a number of remote sites. Each remote site stores and maintains its own research data. Users of the ViPAR system access the web-based analytical portal, where they can initiate analyses. During an analysis, the federation component retrieves data from the remote sites into RAM on the VMS, where they are analysed and then removed without ever being permanently stored.
Figure 2.
VWAP analysis interface. Screenshot of browsing the VWAP analysis interface. Here the analyst has provided some simple syntax in the R language to provide summary information for the single selected variable across all selected resources.
Figure 3.
VWAP file manager. Screenshot of browsing the VWAP file manager. Here the output files resulting from a single analysis are displayed. Users can download files individually or all at once in the provided ZIP file. Optionally, users can upload files to associate with an analysis. In addition, there are options for deleting the results of an analysis and for sharing the results with other users of the system.
