Review. Ann N Y Acad Sci. 2017 Jan;1387(1):95-104.
doi: 10.1111/nyas.13272. Epub 2016 Nov 10.

Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets

Bill McKinney et al. Ann N Y Acad Sci. 2017 Jan.

Abstract

Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements of structural biology datasets. In this paper, we describe one such extension: functionality that preserves file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers.

Keywords: Data Access Alliance; Dataverse; RDMS; SBGrid; X-ray diffraction; research data management system.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Organized display of data collections at the SBDG. Shown are (A) a graphical view of laboratory and institutional collections within the SBDG and (B) Protein Viewer (PV; https://biasmv.github.io/pv/) displaying a published model with links to its two primary deposited datasets.
Figure 2
Geographic distribution of DAA sites.
Figure 3
Experimental data flow and publication. Shown are (A) a flowchart for data publication and (B) the flow of primary experimental data. Datasets collected at synchrotrons are moved to end-users’ computers for processing and structure determination. Subsequently, refined macromolecular models are deposited at the PDB, and primary data are uploaded to the SBDG. From the SBDG, datasets are replicated to DAA centers and eventually copied to DAA satellites. End-users can access datasets by download from DAA centers and by direct access from satellites.
Figure 4
(A) Flowchart illustrating publication guidelines incorporating software and data citations. (B) Data citation guidelines, adapted from Dataverse Best Practices Guidelines, which were developed based on the Force 11 Joint Declaration of Data Citation Principles.
