Biophysics (Nagoya-shi). 2007 Dec 28:3:75-84.
doi: 10.2142/biophysics.3.75. eCollection 2007.

Similarity search for local protein structures at atomic resolution by exploiting a database management system

Akira R Kinjo et al. Biophysics (Nagoya-shi). 2007.

Abstract

A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational database management system with appropriate indexing of geometric data. This method, which we call geometric indexing, can enumerate ligand binding sites that are structurally similar to sub-structures of a query protein among more than 160,000 possible candidates within a few hours of CPU time on an ordinary desktop computer. After a set of high-scoring ligand binding sites is detected by the geometric indexing search, structural alignments at atomic resolution are constructed by iteratively applying the Hungarian algorithm, and the statistical significance of the final score is estimated from an empirical model based on a gamma distribution. Applications of this method to several protein structures clearly show that significant similarities can be detected between local structures of non-homologous as well as homologous proteins.

Keywords: Hungarian algorithm; geometric indexing; ligand binding sites; relational database; structural alignment.
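
The abstract mentions that atomic-resolution alignments are built by iteratively applying the Hungarian algorithm. As a rough illustration of a single assignment step only, the sketch below pairs query atoms with template atoms by minimizing the summed squared distance. It assumes the two atom sets have already been placed in a common frame; the re-superposition between iterations, the scoring function, and any atom-type constraints used in the paper are not reproduced here. The function names `hungarian` and `match_atoms` and the coordinates are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    cost is a list of rows; the number of rows must not exceed the
    number of columns. Returns a list a where a[i] is the column
    assigned to row i. Standard O(n^2 m) potential-based formulation.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns as rows"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_atoms(query, template):
    """Pair each query atom with a template atom (squared-distance cost)."""
    cost = [[sum((a - b) ** 2 for a, b in zip(q, t)) for t in template]
            for q in query]
    return list(enumerate(hungarian(cost)))


# Made-up coordinates: the template is a permuted, slightly perturbed query.
query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
template = [(0.1, 1.0, 0.0), (0.0, 0.1, 0.0), (1.0, 0.0, 0.1)]
print(match_atoms(query, template))  # [(0, 1), (1, 2), (2, 0)]
```

In an iterative scheme of the kind the abstract describes, a step like this would presumably alternate with refitting the superposition on the matched pairs until the pairing stops changing.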

Figures

Figure 1
Overview of the method. The left part (“Compiling database”) illustrates the pre-processing step. The right part (“Searching”) shows the search step for a given protein structure as a query.
Figure 2
Local coordinate system defined by a refset (tetrahedron).
Figure 3
Comparison of GI score and IR score. Each point represents a template included in the top 50,000 hits for the query (PDB ID: 101m). The regression line is also shown. The correlation coefficient between the scores is 0.87.
Figure 4
Distribution of IR scores of randomly selected templates. The red bars indicate the histogram of IR scores of randomly selected templates obtained for the query 101m. The green line is the probability density function (PDF) of the gamma distribution GAM(α, β) with the parameters α=1.32 and β=1.75 calculated from the mean and variance of the scores. The blue line is the PDF of the type 2 extreme value distribution with the parameters determined to best fit the histogram.
Figure 5
Scatter plot of the IR scores and coordinate RMS deviations resulting from a search with the PDB entry 101m. The regions enclosed by the circles marked M and G contain mostly myoglobins and other globins, respectively.
Figure 6
Optimal superpositions of the query 1svn on templates. The wire-frame model in the CPK color scheme is the query protein 1svn. The template atoms are colored in green. Aligned atoms are shown as ball-and-stick models. The ligand of the template is the ball-and-stick model in magenta. A: Peptide-binding site of subtilisin DY (PDB ID: 1bh6 [21]). B: Peptide-binding site of γ-chymotrypsin (PDB ID: 7gch [24]); the labeled Ser, His, and Asp are the aligned catalytic triad. The figures were created using PDBjViewer.
Figure 7
Optimal superpositions of the ATP-binding sites of the query cAMP-dependent protein kinase (cAPK; PDB ID: 1atp [26]) on templates. A: The template is the ATP-binding site of casein kinase-1 (PDB ID: 1csn [29]) from Schizosaccharomyces pombe. B: The template is the ATP-binding site of glutathione synthetase (PDB ID: 1m0w [30]) from Saccharomyces cerevisiae. The color scheme is the same as in Fig. 6. The ligand of 1atp is also shown in the stick model with the CPK colors.
Figure 8
Optimal superpositions of the NAD-binding sites of the query alcohol dehydrogenase (PDB ID: 1het) on templates. A: The template is the NAD-binding site of urocanase protein (PDB ID: 1x87; Tereshko et al., unpublished) from Bacillus stearothermophilus. B: The template is the FAD-binding site of p-hydroxybenzoate hydroxylase (PDB ID: 1iuv [32]) from Pseudomonas aeruginosa. The color scheme is the same as in Fig. 6. The ligand of 1het is also shown in the stick model with the CPK colors.
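
The Figure 4 caption states that the gamma parameters were calculated from the mean and variance of the IR scores. One common way to do that is the method of moments, sketched below. It assumes that β is a scale parameter (so that mean = αβ and variance = αβ²); the caption does not say whether β is a scale or a rate, so this is an assumption. The function names and the sample scores are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gamma_moments(mean, variance):
    """Method-of-moments estimate for a gamma distribution.

    Assumes the shape/scale parameterization:
        mean = alpha * beta,  variance = alpha * beta**2
    """
    alpha = mean * mean / variance   # shape
    beta = variance / mean           # scale (assumed)
    return alpha, beta


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta.

    For simplicity the density is returned as 0 for x <= 0
    (the boundary value at exactly x = 0 is not handled specially).
    """
    if x <= 0:
        return 0.0
    return (x ** (alpha - 1) * math.exp(-x / beta)
            / (math.gamma(alpha) * beta ** alpha))


# Made-up scores, for illustration only (not data from the paper).
scores = [0.5, 1.2, 2.0, 3.1, 4.7]
alpha, beta = fit_gamma_moments(fmean(scores), pvariance(scores))
print(alpha, beta, gamma_pdf(2.0, alpha, beta))
```

Under this parameterization, the caption's values α=1.32 and β=1.75 would correspond to a mean score of about 2.31 and a variance of about 4.04.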

References

    1. Jones S, Thornton JM. Searching for functional sites in protein structures. Curr Opin Struct Biol. 2004;8:3–7.
    2. Kinoshita K, Sadanami K, Kidera A, Go N. Structural motif of phosphate-binding site common to various protein superfamilies: all-against-all structural comparison of protein-mononucleotide complexes. Protein Eng. 1999;12:11–14.
    3. Kinoshita K, Nakamura H. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site. Protein Sci. 2003;12:1589–1595.
    4. Brakoulias A, Jackson RM. Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching. Proteins. 2004;56:250–260.
    5. Wallace AC, Borkakoti N, Thornton JM. TESS: A geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci. 1997;6:2308–2323.