Nat Commun. 2016 Mar 7;7:10882.
doi: 10.1038/ncomms10882.

Data publication with the structural biology data grid supports live analysis

Peter A Meyer 1, Stephanie Socias 1, Jason Key 1, Elizabeth Ransey 1, Emily C Tjon 1, Alejandro Buschiazzo 2 3, Ming Lei 4, Chris Botka 5, James Withrow 6, David Neau 6, Kanagalaghatta Rajashankar 6, Karen S Anderson 7, Richard H Baxter 8, Stephen C Blacklow 1, Titus J Boggon 7, Alexandre M J J Bonvin 9, Dominika Borek 10, Tom J Brett 11, Amedeo Caflisch 12, Chung-I Chang 13, Walter J Chazin 14, Kevin D Corbett 15 16, Michael S Cosgrove 17, Sean Crosson 18, Sirano Dhe-Paganon 19, Enrico Di Cera 20, Catherine L Drennan 21, Michael J Eck 1 19, Brandt F Eichman 22, Qing R Fan 23, Adrian R Ferré-D'Amaré 24, J Christopher Fromme 25, K Christopher Garcia 26 27 28, Rachelle Gaudet 29, Peng Gong 30, Stephen C Harrison 1 31 32, Ekaterina E Heldwein 33, Zongchao Jia 34, Robert J Keenan 18, Andrew C Kruse 1, Marc Kvansakul 35, Jason S McLellan 36, Yorgo Modis 37, Yunsun Nam 38, Zbyszek Otwinowski 10, Emil F Pai 39 40, Pedro José Barbosa Pereira 41, Carlo Petosa 42, C S Raman 43, Tom A Rapoport 44, Antonina Roll-Mecak 45 46, Michael K Rosen 47, Gabby Rudenko 48, Joseph Schlessinger 49, Thomas U Schwartz 50, Yousif Shamoo 51, Holger Sondermann 52, Yizhi J Tao 51, Niraj H Tolia 53, Oleg V Tsodikov 54, Kenneth D Westover 55, Hao Wu 1 56, Ian Foster 57, James S Fraser 58, Filipe R N C Maia 59 60, Tamir Gonen 61, Tom Kirchhausen 62 63, Kay Diederichs 64, Mercè Crosas 65, Piotr Sliz 1

Peter A Meyer et al. Nat Commun. 2016.

Abstract

Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications. Data sets are accessible to researchers through a community-driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

Figures

Figure 1. Data collection statistics for the pilot subset of 112 data sets.
(a,b) Data sets were collected from synchrotrons on four continents (in addition to laboratory sources, which are not broken down geographically) and originate from eleven synchrotron facilities: Advanced Light Source, Advanced Photon Source, Australian Synchrotron, Cornell High Energy Synchrotron Source, Canadian Light Source, European Synchrotron Radiation Facility, National Synchrotron Light Source, National Synchrotron Radiation Research Center, Swiss Light Source, Shanghai Synchrotron Radiation Facility, and Stanford Synchrotron Radiation Lightsource. World map image courtesy of the U.S. Geological Survey. (c) Breakdown of data sets collected at the Advanced Photon Source beamlines. (d) Data sets cover a range of detector types, including Area Detector Systems Corporation M300, Q210 and Q315, Rayonix MarMosaic, Dectris Pilatus 2M and 6M, R-AXIS HTC, and MAR345.
Figure 2. Estimation of storage requirements for different stages of the structural biology pipeline, based on the SBDG pilot collection.
For structure factor amplitudes and PDB models, file sizes were obtained from a subset of 96 PDB depositions derived from the pilot data sets. On average, SBDG stores 1.26 data sets per PDB file. Numbers in red indicate the estimated storage requirements to accommodate data sets for 100,000 structures. We estimate that for each primary data set, an additional 100 data sets are collected at national facilities. Primary data refers to the original experimental diffraction images supporting the derived structural model, as distinguished from all experimental data (screening images, inferior quality data sets, and so on). For crystallographic experiments, reduced data refers to the integrated intensities (or amplitudes, which do not materially affect storage requirements).
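The scaling behind this estimate can be sketched in a few lines of Python. The ratios (1.26 data sets per PDB file, 100,000 structures, 100 additional data sets per primary data set) come from the caption; the mean data set size is not stated here, so `mean_size_gb` is a hypothetical parameter that the caller must supply, and the function names are illustrative only.

```python
# Ratios quoted in the Figure 2 caption.
DATA_SETS_PER_PDB_FILE = 1.26
TARGET_STRUCTURES = 100_000
ADDITIONAL_PER_PRIMARY = 100


def estimate_counts(structures=TARGET_STRUCTURES):
    """Return (primary, additional) data set counts for a number of structures."""
    primary = round(structures * DATA_SETS_PER_PDB_FILE)
    additional = primary * ADDITIONAL_PER_PRIMARY
    return primary, additional


def estimate_storage_tb(n_data_sets, mean_size_gb):
    """Convert a data set count to terabytes.

    mean_size_gb is an assumption supplied by the caller; the caption
    does not give a per-data-set size.
    """
    return n_data_sets * mean_size_gb / 1000.0


primary, additional = estimate_counts()
print(f"primary data sets: {primary:,}")
print(f"additional data sets collected: {additional:,}")
```

Passing a measured mean size to `estimate_storage_tb` then yields a storage figure for either count.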
Figure 3. Organized display of data collections at SBDG.
(a) Graphical view of Laboratory and Institutional Collections within the SBDG; (b) PV structure viewer, displaying a published model with links to its two primary deposited data sets.
Figure 4. SBDG persistent data set landing page (the target of a DOI resolver for a published data set).
Data set metadata are displayed, as are instructions for downloading and verifying the data set.
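Verification of a downloaded data set typically means comparing file checksums against published values. The following is a minimal sketch of that idea, not the procedure shown on the SBDG landing page: the choice of SHA-256, the manifest layout (relative path mapped to expected hex digest), and the function names are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large diffraction images are not loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_data_set(root, manifest, algorithm="sha256"):
    """Return the relative paths that are missing or whose digest differs.

    manifest is a hypothetical mapping {relative_path: expected_hex_digest}.
    An empty list means every listed file matched.
    """
    failures = []
    for relative_path, expected in manifest.items():
        path = Path(root) / relative_path
        if not path.is_file():
            failures.append(relative_path)
        elif file_digest(path, algorithm) != expected.lower():
            failures.append(relative_path)
    return failures
```

Calling `verify_data_set("downloads/my_data_set", manifest)` on a local copy would list any files that need to be fetched again; the directory name here is a placeholder.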
Figure 5. Experimental data flow and publication.
(a) Flow of Primary Experimental Data. Data sets collected at synchrotrons are moved to end-users' computers for processing and structure determination. Subsequently, refined macromolecular models are deposited at PDB and primary data are uploaded to SBDG. From SBDG, data sets are replicated to DAA centres and eventually copied to DAA Satellites. End-users can access data sets by downloading from DAA centres and by direct access from Satellites. (b) Flowchart for data publication.
Figure 6. DataCite metadata schema used for primary data sets within the SBDG.
Information associated with the DOI record for a primary data set through the EZID system.
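As a rough illustration of what a DOI metadata record involves, the sketch below checks a record for the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type). The key names and the example values are placeholders chosen for this sketch; the exact fields SBDG registers are those shown in the figure, which the caption does not enumerate.

```python
# Mandatory DataCite properties, written with illustrative key names.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record: every value here is hypothetical, not a real data set.
example_record = {
    "identifier": "10.XXXX/EXAMPLE",
    "creators": ["Depositor, A."],
    "titles": ["X-ray diffraction data for an example structure"],
    "publisher": "Example Data Repository",
    "publicationYear": "2016",
    "resourceType": "Dataset",
}

print(missing_fields(example_record))
```

A check like this would run before a record is submitted for DOI registration, so that incomplete records are caught early.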
Figure 7. Data publication guidelines.
(a) Flowchart illustrating publication guidelines incorporating software and data citations. (b) Data Citation guidelines, adapted from Dataverse Best Practices Guidelines that were developed based on Force 11 Joint Declaration of Data Citation Principles.
Figure 8. Reprocessing of X-ray diffraction data sets.
(a) Analysis of 110 X-ray diffraction data sets that supported previously published PDB coordinates. Most of the failures (represented in red) were due to inaccurate or incomplete image-header information. In several of these cases, depositors provided annotations correcting this information; (b) Comparison of resolution determined by automated xia2 reprocessing with published resolution. Includes data sets not used for final refinement of published structures; (c) Shift in direct beam position from image headers and refined value following successful reprocessing with xia2.
