Sci Data. 2023 Dec 1;10(1):853.

doi: 10.1038/s41597-023-02778-9.

Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data

Affiliations

¹ Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. sria@ebi.ac.uk.
² Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
³ CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic.
⁴ Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000, Grenoble, France.

^# Contributed equally.

PMID: 38040737
PMCID: PMC10692154
DOI: 10.1038/s41597-023-02778-9

Sri Devan Appasamy et al. Sci Data. 2023.

Abstract

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, current annotation practices lack standardised naming conventions for assemblies in the PDB, which complicates the identification of instances representing the same assembly. In this study, we introduce a method that leverages resources external to the PDB, such as the Complex Portal, UniProt and Gene Ontology, to describe assemblies accurately and contextualise them within their biological settings. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB's FAIR attributes, fostering more effective basic and translational research and scientific education.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
Stable and transient biological complexes. Bacterial ribosomes (PDB:5WDT) and the human nucleosome (PDB:5AY8) are examples of stable macromolecular machines (panel a). The clathrin adaptor AP-2 complex (PDB:6OWT) and the calpain-calpastatin complex (PDB:3BOW) are examples of transient complexes (panel b).

**Fig. 2**
Assembly composition in the PDB. Protein-only assemblies dominate the macromolecular assemblies, and most proteins can be mapped to UniProt accessions. Examples of protein-only, protein-nucleic acid and nucleic acid-only assemblies are PDB entries 6bxa, 6dpo and 6c8m, respectively.

**Fig. 3**
Examples of homomeric assemblies with different stoichiometries in the PDB. We identified five main reasons for observing multiple stoichiometries for an assembly. These differences can be caused by experimental conditions (panel a), difficulties in automated assembly assignments (panel b), challenges in the curation and annotation process of assemblies (panels c and d), differences in the sample, such as sequence length (panel e), and genuine errors in curation (panel f).

**Fig. 4**
Frequency of cyclic and dihedral symmetries in the PDB. The PDB archive is dominated by cyclic c2 symmetry, with dihedral d2 symmetry and cyclic c3 symmetry being the second and third most frequent, respectively. The vertical axes in both plots are shown in logarithmic scale.

**Fig. 5**
Finding complexes of interest in PDBe. By integrating the unique complex identifiers and complex names into the search system of PDBe, researchers can find distinct complexes more consistently across the PDB instead of relying on searching by PDB entry titles or complex component names.

**Fig. 6**
Finding instances of the same assembly in PDBe. By clicking on the PDBe complex identifier, users can find all entries of an assembly with identical composition and species.

**Fig. 7**
Decision tree for automated assembly descriptions. Our process uses a decision tree to automatically generate descriptions for assembly components based on data from external data resources and component categories.

**Fig. 8**
Decision tree for automated assembly naming. Our process attempts to assign human-readable complex names to all the unique composition descriptions. The process relies on a manually curated list of complex names; when no curated name is available, it falls back to data from GO annotations and common component names.
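
The fallback order described for Fig. 8 can be illustrated with a minimal sketch. This is not the authors' implementation. The function name, the dictionary inputs, the composition key format and the rule used for "common component names" (a name is used only when every component shares it) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def name_assembly(
    composition: str,
    curated_names: Dict[str, str],
    go_names: Dict[str, str],
    component_names: List[str],
) -> Optional[str]:
    """Return a human-readable name for one unique composition, or None.

    All inputs are hypothetical stand-ins: `composition` is an opaque key
    for a unique composition description, `curated_names` is a manually
    curated lookup, `go_names` is a lookup derived from GO annotations.
    """
    # 1. Prefer a manually curated complex name when one exists.
    if composition in curated_names:
        return curated_names[composition]
    # 2. Otherwise fall back to a name derived from GO annotations.
    if composition in go_names:
        return go_names[composition]
    # 3. Otherwise use a component name, but only if all components share it
    #    (an assumed reading of "common component names").
    distinct = set(component_names)
    if len(distinct) == 1:
        return next(iter(distinct))
    # No rule applied, so the assembly stays unnamed.
    return None
```

Ordering the checks this way means a curated name always takes precedence over an automatically derived one, which matches the priority stated in the caption.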

Grants and funding

WT_/Wellcome Trust/United Kingdom
