Database (Oxford). 2014 Nov 25;2014:bau116.

doi: 10.1093/database/bau116. Print 2014.

Small molecule annotation for the Protein Data Bank

Sanchayita Sen¹, Jasmine Young², John M Berrisford², Minyu Chen², Matthew J Conroy², Shuchismita Dutta², Luigi Di Costanzo², Guanghua Gao², Sutapa Ghosh², Brian P Hudson², Reiko Igarashi², Yumiko Kengaku², Yuhe Liang², Ezra Peisach², Irina Persikova², Abhik Mukhopadhyay², Buvaneswari Coimbatore Narayanan², Gaurav Sahni², Junko Sato², Monica Sekharan², Chenghua Shao², Lihua Tan², Marina A Zhuravleva²

Affiliations

¹ Protein Data Bank in Europe (PDBe), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK, RCSB Protein Data Bank (RCSB PDB), Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854-8087, USA and Protein Data Bank Japan (PDBj), Osaka University, Osaka, Japan ssen@ebi.ac.uk.
² Protein Data Bank in Europe (PDBe), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK, RCSB Protein Data Bank (RCSB PDB), Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854-8087, USA and Protein Data Bank Japan (PDBj), Osaka University, Osaka, Japan.

PMID: 25425036
PMCID: PMC4243272
DOI: 10.1093/database/bau116

Sanchayita Sen et al. Database (Oxford). 2014.

Abstract

The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be covalently attached to a polymer or may occur as separate, non-covalently bound ligands. During curation of a newly deposited structure by wwPDB annotation staff, each small molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD); if the molecule is new to the PDB, a dictionary description is created for it. Information about all small-molecule components found in the PDB is distributed via the PDB FTP archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the 2011 remediation of the peptide-like antibiotics and inhibitors in the PDB archive, it became clear that additional annotation was required for a consistent representation of these molecules, which are often composed of several sequential subcomponents, including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for the correct representation of these biologically interesting molecules. The combined information is made available via a new resource, the Biologically Interesting molecules Reference Dictionary (BIRD), which complements the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors.
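The cross-referencing step can be pictured, in a much simplified form, as sorting the ligand codes of a deposition into those already defined in the CCD and those needing a new definition. This is a minimal sketch assuming ligands are identified only by their three-character codes; the function name and the tiny dictionary subset are hypothetical, and identifier matching alone is a simplification of what annotators actually check.

```python
# Hypothetical sketch: sort the ligand codes found in a deposition into those
# already defined in the CCD and those that would need a new definition.
# The tiny "ccd" set below is made up for illustration; it is not the real CCD.


def split_known_and_new(ligand_ids, ccd_ids):
    """Return (known, new) lists of codes in first-seen order, without repeats."""
    known, new, seen = [], [], set()
    for lig in ligand_ids:
        code = lig.strip().upper()  # assumption: codes are compared in upper case
        if code in seen:
            continue
        seen.add(code)
        if code in ccd_ids:
            known.append(code)
        else:
            new.append(code)
    return known, new


if __name__ == "__main__":
    ccd = {"GLC", "HOH", "ZN"}
    print(split_known_and_new(["GLC", "zn", "GLC", "XYZ"], ccd))
```

Here "XYZ" stands in for a hypothetical component that is new to the PDB and would therefore need a dictionary description.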

Figures

**Figure 1.**
Number of new PDB chemical entity definitions created annually between 2000 and 2013.

**Figure 2.**
Abbreviated category relationship diagram for the key CIF categories used in the CCD. The three major categories, _chem_comp, _chem_comp_atom and _chem_comp_bond, are joined together to generate the machine-readable dictionary description of a chemical entity. The unique three-character code assigned to every new chemical entity is the primary key of the _chem_comp category (_chem_comp.id, coloured in purple) and links it to the other categories through _chem_comp_atom.comp_id and _chem_comp_bond.comp_id.
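The join described in Figure 2 can be sketched with plain dictionaries standing in for parsed CIF rows. The item names (_chem_comp.id, _chem_comp_atom.comp_id, _chem_comp_bond.atom_id_1 and so on) follow the CCD categories; the row-dictionary input format, the ChemComp class and the assemble function are hypothetical, and no particular CIF parser is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class ChemComp:
    """One CCD entry assembled from its three core categories."""
    id: str                                    # _chem_comp.id (primary key)
    name: str                                  # _chem_comp.name
    atoms: list = field(default_factory=list)  # _chem_comp_atom.atom_id values
    bonds: list = field(default_factory=list)  # (atom_id_1, atom_id_2, value_order)


def assemble(chem_comp_rows, atom_rows, bond_rows):
    """Join atom and bond rows onto their parent _chem_comp row via comp_id."""
    comps = {row["id"]: ChemComp(row["id"], row["name"]) for row in chem_comp_rows}
    for row in atom_rows:
        comps[row["comp_id"]].atoms.append(row["atom_id"])
    for row in bond_rows:
        comps[row["comp_id"]].bonds.append(
            (row["atom_id_1"], row["atom_id_2"], row["value_order"])
        )
    return comps


if __name__ == "__main__":
    # Toy rows loosely modelled on the water entry HOH (hydrogens included).
    comp_rows = [{"id": "HOH", "name": "WATER"}]
    atom_rows = [{"comp_id": "HOH", "atom_id": a} for a in ("O", "H1", "H2")]
    bond_rows = [
        {"comp_id": "HOH", "atom_id_1": "O", "atom_id_2": h, "value_order": "SING"}
        for h in ("H1", "H2")
    ]
    hoh = assemble(comp_rows, atom_rows, bond_rows)["HOH"]
    print(hoh.atoms, hoh.bonds)
```

A row whose comp_id has no matching _chem_comp.id raises KeyError here, which mirrors the role of _chem_comp.id as the primary key.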

**Figure 3.**
α-D-Glucose can form α(1–4) glycosidic linkages with other carbohydrate molecules. During oligomerization, the O1 oxygen (highlighted in the figure) of the glucose is lost as water when its C1 atom bonds to the O4 oxygen of the other carbohydrate. To account for this condensation reaction, the O1 oxygen of α-D-glucose (GLC) is annotated in the CCD as a leaving atom. The two-dimensional diagram in this figure is a copy of the image from the RCSB PDB website (http://rcsb.org/pdb/ligand/ligandsummary.do?hetId=GLC). It was generated using the ChemAxon software (http://www.chemaxon.com).
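The effect of the leaving-atom annotation can be sketched as removing a flagged atom from a component once it takes part in a linkage. The atom list covers only the 12 heavy atoms of GLC, and the boolean flags loosely model the CCD item _chem_comp_atom.pdbx_leaving_atom_flag; the function name is hypothetical and hydrogens are ignored for simplicity.

```python
# Hypothetical sketch: heavy atoms of alpha-D-glucose (GLC) with a boolean
# leaving-atom flag, modelled loosely on _chem_comp_atom.pdbx_leaving_atom_flag.
GLC_HEAVY_ATOMS = {
    "C1": False, "C2": False, "C3": False, "C4": False, "C5": False, "C6": False,
    "O1": True,  "O2": False, "O3": False, "O4": False, "O5": False, "O6": False,
}


def atoms_after_link(atoms, leaving_atom):
    """Return the atom names that remain once `leaving_atom` departs in a linkage."""
    if not atoms.get(leaving_atom, False):
        raise ValueError(f"{leaving_atom} is not flagged as a leaving atom")
    return [name for name in atoms if name != leaving_atom]


if __name__ == "__main__":
    remaining = atoms_after_link(GLC_HEAVY_ATOMS, "O1")
    print(len(remaining), "O1" in remaining)
```

Only atoms flagged as leaving atoms may be dropped, so a request to remove O4 (the accepting oxygen) is rejected.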

**Figure 4.**
Seven α-D-glucose (GLC) molecules undergo condensation reactions to form the cyclic oligosaccharide β-cyclodextrin [from PDB entry 2v8l (30)].
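Assuming each α(1–4) glycosidic link is a condensation that releases one water molecule (the O1 leaving atom of Figure 3 plus a hydrogen from the accepting O4 hydroxyl), the stoichiometry of β-cyclodextrin can be checked. A closed ring of seven glucose units contains seven links, whereas a linear chain of n units would contain n − 1:

```latex
7\,\mathrm{C_6H_{12}O_6} \;\longrightarrow\; \mathrm{C_{42}H_{70}O_{35}} + 7\,\mathrm{H_2O}
```

The atom counts balance: hydrogen 84 − 14 = 70 and oxygen 42 − 7 = 35, giving the known formula of β-cyclodextrin.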

**Figure 5.**
Binding site for the Plk-2 inhibitor (7R)-8-cyclopentyl-7-ethyl-5-methyl-7,8-dihydropteridin-6(5H)-one (three-character code 11G) in PDB entry 4i6b (31). The figure shows the neighbouring residues within 3.7 Å of ligand 11G.
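Selecting neighbouring residues by a distance cutoff can be sketched as a brute-force search over atom pairs. The 3.7 Å value comes from the caption; the function name, the residue labels and the coordinates are hypothetical, and treating the cutoff as inclusive is an assumption.

```python
import math


def residues_within(ligand_atoms, other_atoms, cutoff=3.7):
    """Labels of residues with any atom within `cutoff` Å of any ligand atom.

    Both inputs are lists of (residue_label, (x, y, z)) tuples in Å.
    """
    hits = []
    for label, xyz in other_atoms:
        if label in hits:
            continue  # residue already counted via another atom
        if any(math.dist(xyz, lig_xyz) <= cutoff for _, lig_xyz in ligand_atoms):
            hits.append(label)
    return hits


if __name__ == "__main__":
    ligand = [("11G A 1", (0.0, 0.0, 0.0))]
    protein = [
        ("LEU A 10", (3.0, 0.0, 0.0)),  # 3.0 Å away: inside the cutoff
        ("LEU A 10", (3.5, 1.0, 0.0)),  # same residue, not double-counted
        ("GLY A 11", (5.0, 0.0, 0.0)),  # 5.0 Å away: outside the cutoff
    ]
    print(residues_within(ligand, protein))
```

The pairwise loop is quadratic; for whole structures a spatial index such as a k-d tree would be the usual choice.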

**Figure 6.**
The environment for the oligosaccharide poly-N-acetylglucosamine (PNAG) is annotated instead of listing the environment of individual sugars. This avoids repeating the same sugar molecule in multiple binding sites.

**Figure 7.**
Diagram showing the relationship between the _struct_site and _struct_site_gen categories used to annotate ligand-binding sites. The _struct_site category holds information about the ligands present in the PDB entry, and every ligand in this category is assigned an alphanumeric binding-site identifier. The _struct_site_gen category lists the residues in the vicinity of the ligands described in the _struct_site category. The two categories are joined by the binding-site identifier.
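The join on the binding-site identifier can be sketched by grouping _struct_site_gen rows under their site_id. The item names (site_id, label_comp_id, label_asym_id, label_seq_id) follow the mmCIF categories; the row-dictionary input, the function name and the example site contents are hypothetical.

```python
from collections import defaultdict


def residues_by_site(struct_site, struct_site_gen):
    """Map each _struct_site.id to (details, residues listed in _struct_site_gen)."""
    grouped = defaultdict(list)
    for row in struct_site_gen:
        grouped[row["site_id"]].append(
            f'{row["label_comp_id"]} {row["label_asym_id"]} {row["label_seq_id"]}'
        )
    # Sites with no _struct_site_gen rows get an empty residue list.
    return {s["id"]: (s["details"], grouped.get(s["id"], [])) for s in struct_site}


if __name__ == "__main__":
    sites = [{"id": "AC1", "details": "binding site for residue 11G A 1"}]
    gen = [
        {"site_id": "AC1", "label_comp_id": "LEU", "label_asym_id": "A", "label_seq_id": "10"},
        {"site_id": "AC1", "label_comp_id": "LYS", "label_asym_id": "A", "label_seq_id": "33"},
    ]
    print(residues_by_site(sites, gen))
```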

**Figure 8.**
Tetrahedrally coordinated Zn ion in entry 2VW4 (32), with the annotation of its bond angles. The REMARK 620 annotation gives the software-calculated bond angles between Zn A 503 and its surrounding residues, which, in anticlockwise order, are Glu A 195, His B 165 and Asp B 167. The side-chain carboxylate group of the Glu residue exists in two alternate conformations (conformers A and B). The angles are: GluA195B-Zn-HisB165, 117.8°; GluA195B-Zn-Asp(OD1)B167, 86.6°; HisB165-Zn-Asp(OD1)B167, 91.6°; GluA195B-Zn-Asp(OD2)B167, 105.2°; HisB165-Zn-Asp(OD2)B167, 124.9°; Asp(OD1)B167-Zn-Asp(OD2)B167, 57.1°; Glu(OE1)A195B-Zn-Glu(OE2)A195A, 24.1°; HisB165-Zn-Glu(OE2)A195A, 122.8°; Asp(OD1)B167-Zn-Glu(OE2)A195A, 109.3°; and Asp(OD2)B167-Zn-Glu(OE2)A195A, 110.7°.
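Each angle in a REMARK 620 listing is an angle at the metal between two coordinating atoms. The standard vector calculation is sketched below; the function name and coordinates are illustrative, and this is not the software used by wwPDB annotators.

```python
import math


def angle_deg(a, centre, b):
    """Angle a-centre-b in degrees, e.g. coordinating atom - Zn - coordinating atom."""
    u = [ai - ci for ai, ci in zip(a, centre)]
    v = [bi - ci for bi, ci in zip(b, centre)]
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


if __name__ == "__main__":
    zn = (0.0, 0.0, 0.0)
    # Two vertices of an ideal tetrahedron around the metal.
    print(round(angle_deg((1.0, 1.0, 1.0), zn, (1.0, -1.0, -1.0)), 2))
```

For an ideal tetrahedron the angle is arccos(−1/3) ≈ 109.47°, so values such as 57.1° or 24.1° in the caption flag bidentate carboxylate binding or alternate conformers rather than ideal geometry.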

**Figure 9.**
Annotation of the glycopeptide antibiotic teicoplanin involves ‘chopping up’ the molecule into its component chemical entities that are validated against the CCD. The bonds highlighted in yellow demarcate the individual entities.
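One way to picture the "chopping up" step is as a graph problem: atoms are nodes, bonds are edges, the bonds that demarcate components (the yellow bonds in the figure) are deleted, and each remaining connected group is a candidate chemical entity to validate against the CCD. This sketch assumes the bonds to cut are already known, since choosing them is the annotator's chemical judgement; the atom names and the function name are hypothetical.

```python
from collections import defaultdict


def split_into_components(atoms, bonds, cut):
    """Group atoms into connected fragments after removing the `cut` bonds."""
    cut_set = {frozenset(b) for b in cut}
    adj = defaultdict(set)
    for a, b in bonds:
        if frozenset((a, b)) in cut_set:
            continue  # this bond separates two components
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, part = [start], []
        seen.add(start)
        while stack:  # depth-first walk over the uncut bonds
            node = stack.pop()
            part.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        parts.append(sorted(part))
    return parts


if __name__ == "__main__":
    atoms = ["A1", "A2", "B1", "B2", "C1"]
    bonds = [("A1", "A2"), ("A2", "B1"), ("B1", "B2"), ("B2", "C1")]
    print(split_into_components(atoms, bonds, cut=[("A2", "B1"), ("B2", "C1")]))
```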

**Figure 10.**
σA-weighted 2Fo-Fc map of a carbohydrate-binding protein, contoured at 0.35 e/Å³. Very little electron density is observed for the oligosaccharide molecule. This is reflected in the high LLDF (local ligand density fit) values, shown in parentheses, for each of the component carbohydrate moieties.
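LLDF stands for local ligand density fit. As we understand the wwPDB validation-report definition (treat this as an assumption, not a formula stated in this paper), it compares the real-space R value (RSR) of a ligand with the RSR values of its neighbouring residues, in the manner of a z-score:

```latex
\mathrm{LLDF} = \frac{\mathrm{RSR}_{\text{ligand}} - \overline{\mathrm{RSR}}_{\text{neighbours}}}{\sigma\left(\mathrm{RSR}_{\text{neighbours}}\right)}
```

Under this definition a ligand that fits its density much worse than its surroundings, such as the poorly resolved oligosaccharide here, receives a large positive LLDF.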

References

1. Berman H.M., Henrick K., Kleywegt G., et al. (2012) The Worldwide Protein Data Bank. Int. Tables Crystallogr., F, 827–832.
2. Rose P.W., Bi C., Bluhm W.F., et al. (2013) The RCSB Protein Data Bank: new resources for research and education. Nucleic Acids Res., 41, D475–D482.
3. Gutmanas A., Alhroub Y., Battle G.M., et al. (2014) PDBe: Protein Data Bank in Europe. Nucleic Acids Res., 42, D285–D291.
4. Kinjo A.R., Suzuki H., Yamashita R., et al. (2012) Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format. Nucleic Acids Res., 40, D453–D460.
5. Ulrich E.L., Akutsu H., Doreleijers J.F., et al. (2008) BioMagResBank. Nucleic Acids Res., 36, D402–D408.
