BMC Bioinformatics. 2010 Jul 1;11:365.
doi: 10.1186/1471-2105-11-365.

Knowledge-based annotation of small molecule binding sites in proteins

Ratna R Thangudu et al. BMC Bioinformatics. 2010.

Abstract

Background: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.

Results: We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
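The PSSM step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform background distribution over the 20 standard amino acids, a simple additive pseudocount, and a plain sum-of-columns score, and the names `build_pssm`, `score_query`, and `rank_clusters` are hypothetical. The published method combines the PSSM score with other measures that are not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned binding-site strings.

    Assumption: uniform background frequencies and additive pseudocounts.
    Returns one dict per alignment column mapping residue -> log2 odds.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned binding sites must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(s[pos] for s in aligned_sites if s[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_query(pssm, query_residues):
    """Sum the column scores of the query residues mapped onto the site."""
    if len(query_residues) != len(pssm):
        raise ValueError("query must cover every PSSM column")
    # Residues missing from a column (e.g. gaps) contribute zero.
    return sum(col.get(res, 0.0) for col, res in zip(pssm, query_residues))


def rank_clusters(clusters, query_by_cluster):
    """Return cluster names ordered from best to worst PSSM score."""
    scores = {
        name: score_query(build_pssm(sites), query_by_cluster[name])
        for name, sites in clusters.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: two hypothetical binding-site clusters and the query
    # residues that align to each of them.
    clusters = {
        "site_A": ["GKS", "GKT", "GKS"],
        "site_B": ["HDE", "HDE", "HNE"],
    }
    query = {"site_A": "GKS", "site_B": "AAA"}
    print(rank_clusters(clusters, query))
```

In this toy run the query matches the conserved residues of `site_A` but not those of `site_B`, so `site_A` is ranked first. A real pipeline would derive the alignments from homologous structures and would likely use empirical background frequencies.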

Conclusions: A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.


Figures

Figure 1
Statistics of small molecules and their binding sites observed in protein structure complexes. a) Number of small molecules and binding sites observed per protein chain, b) size of the observed binding sites, c) histogram of the number of observed and inferred binding sites plotted against the fraction (%) of protein chains having these sites.
Figure 2
Biological validity of the IBIS inferred binding sites. a) Histogram showing the frequency of protein chains as a function of their biological relevance, as suggested by the overlap of the inferred binding sites with CDD conserved feature annotation. b) Percentage of proteins whose 1st- and 2nd-ranked inferred binding site clusters have CD annotations; 165 proteins have only one predicted site.
Figure 3
Tyrosine kinase homologues with varying degrees of sequence conservation and different small molecules in their ATP-binding pocket. (a) Ephb2 Receptor Kinase domain with ADP. (b) Syk Tyrosine Kinase Domain in complex with Gleevec. (c) Ephb2 Receptor Tyrosine Kinase with Adenine.
Figure 4
Mapping of the inferred binding site. Inferred binding site of P. aeruginosa peptide deformylase (PDB: 1IX1) mapped onto the sequence of Helicobacter pylori and its agreement with the observed binding site in the N-trans-caffeoyltyramine-PDF complex (PDB: 1EW5). MMDB residue numbering is used, which starts from the beginning of the corresponding GenBank protein sequence.
Figure 5
Overview of the IBIS binding site annotation procedure.
