Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data
- PMID: 37045837
- PMCID: PMC10097656
- DOI: 10.1038/s41597-023-02101-6
Abstract
More than 61,000 proteins have an up-to-date correspondence between their amino acid sequences (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. SIFTS also incorporates residue-level annotations from many other biological resources. SIFTS data is available in several formats (XML, CSV and TSV) and through the PDBe REST API, but it has always been maintained separately from the structure data (PDBx/mmCIF files) in the PDB archive. Here, we extended the wwPDB PDBx/mmCIF data dictionary with additional categories to accommodate SIFTS data and added the UniProtKB, Pfam, SCOP2, and CATH residue-level annotations directly into the PDBx/mmCIF files from the PDB archive. With the integrated UniProtKB annotations, these files now provide consistent residue numbering across different PDB entries, allowing easy comparison of structure models. The extended dictionary yields a more consistent, standardised metadata description without altering the core PDB information. This development enables up-to-date cross-reference information at the residue level, resulting in better data interoperability and supporting improved data analysis and visualisation.
© 2023. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- PDB NextGen Archive: centralizing access to integrated annotations and enriched structural information by the Worldwide Protein Data Bank. Database (Oxford). 2024 May 27;2024:baae041. doi: 10.1093/database/baae041. Database (Oxford). 2024. PMID: 38803272 Free PMC article.
- SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 2019 Jan 8;47(D1):D482-D489. doi: 10.1093/nar/gky1114. Nucleic Acids Res. 2019. PMID: 30445541 Free PMC article.
- SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res. 2013 Jan;41(Database issue):D483-9. doi: 10.1093/nar/gks1258. Epub 2012 Nov 29. Nucleic Acids Res. 2013. PMID: 23203869 Free PMC article.
- Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive. Methods Mol Biol. 2017;1607:627-641. doi: 10.1007/978-1-4939-7000-1_26. Methods Mol Biol. 2017. PMID: 28573592 Free PMC article. Review.
- The Protein Data Bank Archive. Methods Mol Biol. 2021;2305:3-21. doi: 10.1007/978-1-0716-1406-8_1. Methods Mol Biol. 2021. PMID: 33950382 Review.
Cited by
- Functional (re)annotation of Mycobacteroides abscessus proteome using integrative sequence and AI-based structural approaches. Curr Res Struct Biol. 2025 Aug 6;10:100172. doi: 10.1016/j.crstbi.2025.100172. eCollection 2025 Dec. Curr Res Struct Biol. 2025. PMID: 40895204 Free PMC article.
- PDB NextGen Archive: centralizing access to integrated annotations and enriched structural information by the Worldwide Protein Data Bank. Database (Oxford). 2024 May 27;2024:baae041. doi: 10.1093/database/baae041. Database (Oxford). 2024. PMID: 38803272 Free PMC article.
- Machine Learning Models to Interrogate Proteomewide Covalent Ligandabilities Directed at Cysteines. bioRxiv [Preprint]. 2024 Jan 7:2023.08.17.553742. doi: 10.1101/2023.08.17.553742. bioRxiv. 2024. Update in: JACS Au. 2024 Apr 05;4(4):1374-1384. doi: 10.1021/jacsau.3c00749. PMID: 37662346 Free PMC article. Updated. Preprint.
- Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature. Sci Data. 2024 Sep 27;11(1):1032. doi: 10.1038/s41597-024-03841-9. Sci Data. 2024. PMID: 39333508 Free PMC article.
- Machine Learning Models to Interrogate Proteome-Wide Covalent Ligandabilities Directed at Cysteines. JACS Au. 2024 Apr 5;4(4):1374-1384. doi: 10.1021/jacsau.3c00749. eCollection 2024 Apr 22. JACS Au. 2024. PMID: 38665640 Free PMC article.
Grants and funding
- BB/V004247/1, PI:Sameer Velankar/RCUK | Biotechnology and Biological Sciences Research Council (BBSRC)
- DBI-2019297, PI: S.K. Burley/National Science Foundation (NSF)
- DBI-2019297, PI: S.K. Burley/NSF | National Science Board (NSB)