J Mol Biol. 2021 May 28;433(11):166704.
doi: 10.1016/j.jmb.2020.11.003. Epub 2020 Nov 10.

RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive

Yana Rose et al. J Mol Biol. 2021.

Abstract

The US Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) serves many millions of unique users worldwide by delivering experimentally determined 3D structures of biomolecules integrated with >40 external data resources via RCSB.org, application programming interfaces (APIs), and FTP downloads. Herein, we present the architectural redesign of RCSB PDB data delivery services that build on existing PDBx/mmCIF data schemas. New data access APIs (data.rcsb.org) enable efficient delivery of all PDB archive data. A novel GraphQL-based API provides flexible, declarative data retrieval alongside a simple-to-use REST API. A powerful new search system (search.rcsb.org) seamlessly integrates heterogeneous types of searches across the PDB archive. Searches may combine text attributes, protein or nucleic acid sequences, small-molecule chemical descriptors, 3D macromolecular shapes, and sequence motifs. The new RCSB.org architecture adheres to the FAIR Principles, empowering users to address a wide array of research problems in fundamental biology, biomedicine, biotechnology, bioengineering, and bioenergy.
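As a rough illustration of the two data access styles the abstract names, the sketch below builds a REST URL and a GraphQL request body for a single entry. Only the host `data.rcsb.org` comes from the abstract; the URL path, the GraphQL field names (`entry`, `struct`, `exptl`), and the helper names `rest_entry_url` and `graphql_payload` are assumptions made for illustration and should be checked against the service's own documentation. Nothing is sent over the network.

```python
import json
from urllib.parse import quote

# Host named in the abstract; the paths and field names below are assumptions.
DATA_API_BASE = "https://data.rcsb.org"


def rest_entry_url(entry_id: str) -> str:
    """REST style: one fixed URL per resource (assumed path layout)."""
    return f"{DATA_API_BASE}/rest/v1/core/entry/{quote(entry_id.upper())}"


def graphql_payload(entry_id: str) -> str:
    """GraphQL style: the client declares exactly which fields it wants."""
    query = """
    query ($id: String!) {
      entry(entry_id: $id) {
        struct { title }
        exptl { method }
      }
    }
    """
    # A GraphQL request body is JSON with a query string and its variables.
    return json.dumps({"query": query, "variables": {"id": entry_id.upper()}})


print(rest_entry_url("4hhb"))
print(graphql_payload("4hhb"))
```

The REST helper returns a whole resource at a fixed address, while the GraphQL body lists only the fields of interest, which is the "declarative" retrieval the abstract refers to.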

Keywords: FAIR principles; computer architecture; databases; structural biology.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1. Data management and delivery system underpinning the new RCSB architecture.
Figure 2. Schema usage by different components of the data management and delivery system.
Figure 3. Example search and data access queries: (a) query that combines text (1), sequence (2), structure shape (3), and chemical similarity (4) searches; (b) GraphQL API query including essential entry details (1–2), information details of the macromolecular entity data hierarchy (2–4) and small-molecules (5).
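The combined query described in Figure 3(a) might be assembled along the following lines. This is a minimal sketch, not the query from the figure: the JSON layout (`terminal` and `group` nodes joined by a logical operator), the service names, the parameter keys, and every example value are assumptions chosen for illustration, and the helpers `terminal`, `combine_and`, and `build_search_request` are hypothetical.

```python
import json


def terminal(service: str, parameters: dict) -> dict:
    """One leaf search handled by a single search service (assumed layout)."""
    return {"type": "terminal", "service": service, "parameters": parameters}


def combine_and(*nodes: dict) -> dict:
    """Group several searches so that a hit must satisfy all of them."""
    return {"type": "group", "logical_operator": "and", "nodes": list(nodes)}


def build_search_request(query: dict, return_type: str = "entry") -> str:
    """Serialize the query tree into a JSON request body."""
    return json.dumps({"query": query, "return_type": return_type})


# Illustrative placeholder values only; none are taken from the paper.
request_body = build_search_request(
    combine_and(
        terminal("text", {"value": "hemoglobin"}),                # (1) text
        terminal("sequence", {"value": "VLSPADKTNVKAAW"}),        # (2) sequence
        terminal("structure", {"value": {"entry_id": "4HHB"}}),   # (3) 3D shape
        terminal("chemical", {"value": "c1ccccc1"}),              # (4) chemical
    )
)
print(request_body)
```

Expressing the query as a tree keeps each search type independent, so the four heterogeneous searches can be combined without any one of them needing to know about the others.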
