J Mol Biol. 2023 Jul 15;435(14):167994.
doi: 10.1016/j.jmb.2023.167994. Epub 2023 Feb 2.

RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances

Sebastian Bittrich et al. J Mol Biol.

Abstract

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB RCSB.org research-focused web portal is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ∼1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ∼200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.

Keywords: FAIR principles; computer architecture; databases; protein structure prediction; structural biology.

Conflict of interest statement

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1.
Architectural overview of the RCSB.org web portal. Support for CSMs coming from AlphaFold DB and the ModelArchive was added in September 2022.
Figure 2.
Data API query for CSM metadata on the left, response on the right. (a) Indicates whether this entry is a CSM or a PDB structure. (b) Provides the global pLDDT value. (c) Contains per-residue confidence values.
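
A request of the kind described in the Figure 2 caption can be approximated as a GraphQL request body. This is a minimal sketch, not the query shown in the figure: the endpoint URL, the field names (`structure_determination_methodology`, `rcsb_ma_qa_metric_global`, `rcsb_polymer_instance_feature`), and the example identifier are assumptions based on the public RCSB PDB Data API schema and should be checked against the current documentation. The code only builds the request body and does not contact the service.

```python
import json

# Assumed GraphQL endpoint of the RCSB PDB Data API.
DATA_API_URL = "https://data.rcsb.org/graphql"

# Assumed field names; (a), (b), (c) mirror the labels in the Figure 2 caption.
CSM_METADATA_QUERY = """
query ($id: String!) {
  entry(entry_id: $id) {
    rcsb_entry_info {
      structure_determination_methodology   # (a) CSM or PDB structure
    }
    rcsb_ma_qa_metric_global {
      ma_qa_metric_global { name type value }   # (b) global pLDDT
    }
    polymer_entity_instances {
      rcsb_polymer_instance_feature {
        type
        feature_positions { values }   # (c) per-residue confidence
      }
    }
  }
}
"""


def build_data_api_request(entry_id: str) -> str:
    """Return the JSON body for a GraphQL POST to DATA_API_URL."""
    body = {"query": CSM_METADATA_QUERY, "variables": {"id": entry_id}}
    return json.dumps(body)


# Hypothetical AlphaFold DB-derived identifier, used for illustration only.
request_body = build_data_api_request("AF_AFP68871F1")
```

The resulting string could be sent as the body of an HTTP POST with a `Content-Type: application/json` header. Passing the identifier as a GraphQL variable keeps the query text constant across entries.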
Figure 3.
Search API query with predicate based on CSM metadata on the left, response on the right. (a) Text query. (b) Filters for very high pLDDT confidence values >90. (c) Requests CSMs.
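
The three parts of the query described in the Figure 3 caption can be sketched as a JSON payload. This is an illustration under stated assumptions, not the figure's exact query: the endpoint URL, the attribute name `rcsb_ma_qa_metric_global.ma_qa_metric_global.value`, the `results_content_type` option, and the search term are assumptions based on the public RCSB PDB Search API documentation. The code builds and prints the payload without sending it.

```python
import json

# Assumed endpoint of the RCSB PDB Search API.
SEARCH_API_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_csm_search(text: str, min_plddt: float) -> dict:
    """Build a Search API payload combining text, pLDDT filter, and CSM opt-in."""
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                {  # (a) text query
                    "type": "terminal",
                    "service": "full_text",
                    "parameters": {"value": text},
                },
                {  # (b) keep only results with global pLDDT above the threshold
                    "type": "terminal",
                    "service": "text",
                    "parameters": {
                        # Assumed attribute name for the global confidence score.
                        "attribute": "rcsb_ma_qa_metric_global.ma_qa_metric_global.value",
                        "operator": "greater",
                        "value": min_plddt,
                    },
                },
            ],
        },
        "return_type": "entry",
        # (c) opt in to CSMs; assumed option name and value.
        "request_options": {"results_content_type": ["computational"]},
    }


# "hemoglobin" is a hypothetical search term chosen for illustration.
payload = build_csm_search("hemoglobin", 90)
print(json.dumps(payload, indent=2))
```

Under these assumptions, the `results_content_type` entry is the opt-in step: a client that leaves it out would presumably keep receiving experimentally-determined PDB structures only, consistent with the backwards compatibility described in the abstract.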
