Sci Data. 2023 Dec 1;10(1):853.
doi: 10.1038/s41597-023-02778-9.

Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data

Sri Devan Appasamy et al. Sci Data.

Abstract

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions for assemblies in the PDB, complicating the identification of instances representing the same assembly. In this study, we introduce a method leveraging resources external to the PDB, such as the Complex Portal, UniProt and Gene Ontology, to accurately describe assemblies and contextualise them within their biological settings. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB's FAIR attributes, fostering more effective basic and translational research and scientific education.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Stable and transient biological complexes. Bacterial ribosomes (PDB:5WDT) and the human nucleosome (PDB:5AY8) are examples of stable macromolecular machines (panel a). The clathrin adaptor AP-2 complex (PDB:6OWT) and the calpain-calpastatin complex (PDB:3BOW) are examples of transient complexes (panel b).
Fig. 2
Assembly composition in the PDB. Protein-only assemblies dominate the macromolecular assemblies, and most proteins can be mapped to UniProt accessions. Examples of protein-only, protein-nucleic acid and nucleic acid-only assemblies are PDB entries 6bxa, 6dpo and 6c8m, respectively.
Fig. 3
Examples of homomeric assemblies with different stoichiometries in the PDB. We identified five main reasons for observing multiple stoichiometries for an assembly. These differences can be caused by experimental conditions (panel a), difficulties in automated assembly assignments (panel b), challenges in the curation and annotation process of assemblies (panels c and d), differences in the sample, for example in the sequence length (panel e), and genuine errors in curation (panel f).
Fig. 4
Frequency of cyclic and dihedral symmetries in the PDB. The PDB archive is dominated by cyclic c2 symmetry, with dihedral d2 symmetry and cyclic c3 symmetry being the second and third most frequent, respectively. The vertical axes in both plots are shown on a logarithmic scale.
Fig. 5
Finding complexes of interest in PDBe. By integrating the unique complex identifiers and complex names into the search system of PDBe, researchers can find distinct complexes more consistently across the PDB instead of relying on searching by PDB entry titles or complex component names.
Fig. 6
Finding instances of the same assembly in PDBe. By clicking on the PDBe complex identifier, users can find all entries of an assembly with identical composition and species.
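The idea of collecting all entries that share an identical composition and species can be illustrated with a minimal sketch. It is not the PDBe implementation: the key format, the function names `composition_key` and `group_entries`, and the placeholder accessions and entry IDs are all assumptions made for illustration.

```python
from collections import defaultdict

def composition_key(components, taxonomy_id):
    """Build an order-independent key from component accessions, copy numbers and species.

    `components` maps an accession (hypothetical placeholder here) to its copy number.
    The key layout is an assumption, not the identifier scheme used by PDBe.
    """
    parts = sorted(f"{accession}_{copies}" for accession, copies in components.items())
    return f"{taxonomy_id}:" + ",".join(parts)

def group_entries(entries):
    """Group entry IDs whose assemblies have identical composition and species."""
    groups = defaultdict(list)
    for entry_id, components, taxonomy_id in entries:
        groups[composition_key(components, taxonomy_id)].append(entry_id)
    return dict(groups)

# Placeholder data: entry IDs and accessions are made up for the example.
example_entries = [
    ("entry1", {"ACC1": 2, "ACC2": 2}, 9606),
    ("entry2", {"ACC2": 2, "ACC1": 2}, 9606),  # same assembly, components listed in another order
    ("entry3", {"ACC1": 4}, 9606),             # different stoichiometry, so a different group
]
grouped = group_entries(example_entries)
```

Sorting the components before joining them makes the key independent of the order in which components are listed, so `entry1` and `entry2` end up in the same group.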
Fig. 7
Decision tree for automated assembly descriptions. Our process uses a decision tree to automatically generate descriptions for assembly components based on data from external data resources and component categories.
Fig. 8
Decision tree for automated assembly naming. Our process attempts to assign human-readable complex names to all the unique composition descriptions. The process relies on a manually curated list of complex names; when no curated name is available, it falls back to data from GO annotations and common component names.
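The fallback order in this caption (curated name first, then GO annotations, then component names) can be sketched as a short function. This is a simplified illustration under stated assumptions, not the authors' decision tree: the function name `name_assembly`, its parameters, the choice of the first GO term, and the use of the most frequent component name are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

def name_assembly(
    composition: str,
    curated_names: dict,
    go_terms: Sequence[str],
    component_names: Sequence[str],
) -> Optional[str]:
    """Pick a human-readable name via a curated -> GO -> component-name fallback."""
    # 1. Prefer a manually curated name for this composition description.
    if composition in curated_names:
        return curated_names[composition]
    # 2. Otherwise use a GO annotation (assumption: the first listed term).
    if go_terms:
        return go_terms[0]
    # 3. Otherwise use the most frequent component name (assumption).
    if component_names:
        most_common_name, _count = Counter(component_names).most_common(1)[0]
        return most_common_name
    # No source of information available: leave the assembly unnamed.
    return None

# Placeholder inputs, made up for the example.
curated = {"ACC1_2,ACC2_2": "Example curated complex"}
name_a = name_assembly("ACC1_2,ACC2_2", curated, ["example GO complex"], ["Protein X"])
name_b = name_assembly("ACC3_1", curated, ["example GO complex"], ["Protein X"])
name_c = name_assembly("ACC3_1", curated, [], ["Protein X", "Protein X", "Protein Y"])
name_d = name_assembly("ACC3_1", curated, [], [])
```

Returning `None` when every source is empty keeps unnamed assemblies explicit, which matches the abstract's statement that standard names were assigned to over 90% of unique assemblies rather than to all of them.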
