Abstract

The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR), and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web pages, the PDBe-KB aggregated views of structure data, which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.

Figures

Figure 1.
Traditionally, a PDB entry represents structures based on a single set of experiments, where each structure may represent only a segment of the full-length protein. However, PDB entries that describe the structure of the same protein are not interconnected. Furthermore, there is a rich ecosystem of resources and scientific software providing added value annotations based on the structures archived in the PDB, and when combined, these annotations provide evidence for the biological context of the protein. Therefore, the aim of PDBe-KB is to integrate these annotations and interconnect the various PDB entries in order to provide comprehensive, aggregated views of biologically meaningful entities, such as full-length proteins.
Figure 2.
The infrastructure of PDBe-KB can be divided into data deposition and data access parts. Data deposition includes the data exchange format specification, the private FTP areas for depositors and the internal validation and processing pipeline hosted by PDBe. The data is integrated in a distributable graph database, and 50 public API endpoints serve data from it. These endpoints power all the reusable PDBe-KB web components, which are combined to create the aggregated views.
Figure 3.
The aggregated protein views are built using several web components, the main components being ProtVista and LiteMol. ProtVista (A) is a sequence feature viewer co-developed by UniProt, PDBe and InterPro. It can be used to display residue-level information mapped to sequences. Our implementation of LiteMol (B) is a lightweight molecular viewer wrapped into a reusable web component that allows the visualisation of biological assemblies, complexes and ligand binding sites.
Figure 4.
The first two sections of the aggregated views of proteins provide an overview of all the data available in the PDB for a protein of interest. The view includes the number of interacting small molecules, macromolecular interaction partners and functional annotations, as well as the number of publications related to the PDB entries mapped to the protein (A). It also offers visual help for identifying all the PDB entries that cover various segments of the protein, and shows representative non-overlapping structures both as static images and in an interactive 3D viewer (B).
Figure 5.
The aggregated protein views provide information on all the observed interactions between the protein of interest and ligands or macromolecular interaction partners. The galleries of ligands (A) and macromolecules (B) can be used to display their interactions in an interactive LiteMol instance, and to navigate to the corresponding PDBe and PDBe-KB entry pages for more information about the partner molecules. Additionally, all the available functional annotations and biophysical parameters provided by PDBe-KB partner resources are displayed using ProtVista (C).
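
The captions above describe PDB entries that each cover only a segment of a full-length protein, and aggregated views that interconnect them. As a rough illustration of that idea, and not the PDBe-KB implementation, the following Python sketch merges the residue ranges observed in several entries and reports how much of the protein they cover together. The entry names, residue ranges, protein length and function names are all hypothetical, and the ranges are assumed to be already mapped to UniProtKB numbering.

```python
from typing import Dict, List, Tuple

# Inclusive (start, end) residue range, assumed to be in UniProtKB numbering.
Segment = Tuple[int, int]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or directly adjacent residue ranges."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(entries: Dict[str, List[Segment]], protein_length: int) -> float:
    """Fraction of the full-length protein observed in at least one entry."""
    all_segments = [seg for segs in entries.values() for seg in segs]
    covered = sum(end - start + 1 for start, end in merge_segments(all_segments))
    return covered / protein_length


# Hypothetical entries for a hypothetical 500-residue protein.
example_entries = {
    "entry_a": [(1, 120)],
    "entry_b": [(100, 250)],
    "entry_c": [(300, 400)],
}

if __name__ == "__main__":
    print(merge_segments([s for segs in example_entries.values() for s in segs]))
    print(coverage_fraction(example_entries, protein_length=500))
```

Sorting the ranges first means a single pass is enough to merge them, since each range only needs to be compared with the last merged one.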
