Nucleic Acids Res. 2023 Jan 6;51(D1):D488-D508.
doi: 10.1093/nar/gkac1077.

RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning

Stephen K Burley et al. Nucleic Acids Res. 2023.

Abstract

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.

Figures

Graphical Abstract. RCSB.org now delivers ∼200 000 experimentally-determined PDB structures alongside >1M Computed Structure Models that can all be searched, analyzed, visualized, and explored using custom tools and features.
Figure 1. Cladogram showing PDB holdings for proteins from each kingdom of life (as of mid-2022). Within each branch, PDB structure totals are provided for selected organisms. Adapted from Figure 7 in (30). (N.B.: The PDB also houses 3D structures that solely contain nucleic acids, viral proteins, or designed proteins, which in aggregate accounted for ∼8% of archival holdings as of mid-2022.)
Figure 2. Within RCSB.org, an Erlenmeyer flask icon on a dark-blue background denotes experimentally-determined PDB structures (left) and a computer screen icon on a cyan background denotes CSMs (right).
Figure 3. Search options at RCSB.org include Top Bar or Basic Search; Advanced Search; and Browse Annotations.
Figure 4. Top Bar or Basic Search options available from every RCSB.org web page. Examples of searching for 3D structures using (A) the simple text string "insulin receptor"; (B) drop-down autosuggestions based on the text string "insulin receptor"; (C) Boolean operators to combine "insulin + receptor" (+ = AND); or (D) an amino acid sequence. (E) Searching RCSB.org documentation using the text string "biological assembly".
Figure 5. Structure Summary Page for PDB ID 1b54. (A) Overview. (B) Literature. (C) Macromolecules. (D) Small molecules. (E) Experimental Data & Validation. (F) 1D–3D Viewer.
Figure 6. Structure Summary Page for the AlphaFold DB CSM AF_AFO94903F1. (A) Overview (including Model Confidence). (B) Macromolecules. (C) 1D–3D View launched from the Structure Summary Page.
Figure 7. Pairwise superposition of CSM ID AF_AFO94903F1 and PDB ID 1b54. (A) Pair of aligned structures, with both polypeptide chains rendered using ribbon representations. Aligned portions of the PDB structure and CSM are color-coded blue and brown, respectively. Dashed blue lines represent parts of the polypeptide chain not resolved in the X-ray crystallographic experiment. Portions of the PDB structure and CSM that could not be aligned are color-coded gray and cream, respectively. PLP is shown in magenta ball-and-stick, and water molecules are shown as gray spheres. Inset is a closeup of the amino acid residues within 5 Å of the ligand in both the PDB structure and CSM. (B) Same view as in A-inset but showing the amino acid side chains from PDB ID 1b54 that interact with PLP. (C) Same view as in A-inset but showing amino acids from the CSM corresponding to the residues shown in panel (B). Conserved amino acids shown in panels (B) and (C) are identified in bold font. Atom color coding: C, light blue, brown, or magenta; N, dark blue; O, red; S, yellow. Dotted blue lines denote hydrogen bonds and charge–dipole interactions.
Figure 8. Query by Example options on Structure Summary Pages for PDB structures.
Figure 9. Query by Example options on Structure Summary Pages for CSMs.
Figure 10. Using RCSB.org Advanced Search to construct complex Boolean queries and modify Results options.
