Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011 Oct 14;9 Suppl 1(Suppl 1):S20.
doi: 10.1186/1477-5956-9-S1-S20.

SProt: sphere-based protein structure similarity algorithm

Affiliations

SProt: sphere-based protein structure similarity algorithm

Jakub Galgonek et al. Proteome Sci. .

Abstract

Background: Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus shifted from sequences to structures. The area of structure similarity forms a big challenge since even no standard definition of optimal structure similarity exists in the field.

Results: We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity in the process of feature extraction. SProt's features are based on spherical spatial neighborhood of amino acids where similarity can be well-defined. On top of the partial local similarities, global measure assessing similarity to a pair of protein structures is built. Finally, indexing is applied making the search process by an order of magnitude faster.

Conclusions: The proposed method outperforms other methods in classification accuracy on SCOP superfamily and fold level, while it is at least comparable to the best existing solutions in terms of precision-recall or quality of alignment.

PubMed Disclaimer

Figures

Figure 1
Figure 1
An example of an aa-sphere This example demonstrates an aa-sphere for the 26-th amino acid of Ubiquitin [PDB:1UBQ]. Each amino acid is represented by a ball centered in its α-carbon position. The tube corresponds to the protein backbone denoting the protein sequence. The Euclidean sphere with center in the 26-th amino acid (black) with radius 9 Å (gray). The different colors emphasize amino acids included in the aa-sphere. Some of the heavy atoms of the colored amino acids with their α-atoms outside the Euclidean sphere intersect with the sphere and thus the respective amino acids are also included in the aa-sphere. The figure has been generated by VMD [30].
Figure 2
Figure 2
Average precision-recall curves The curves were computed for all 108 queries of the ProtDex2 dataset. The data for the compared methods are borrowed from [10].
Figure 3
Figure 3
RBQ modifiers RBQ(a,b)(w) modifier is defined as the rational Bézier quadric curve starting at point [0, 0] going toward the control point [a, b] ([0.7, 0.15] in this case) and arriving to point [1, 1]. Weight w determines the degree of deflection of the curve toward the control point.
Figure 4
Figure 4
Access method efficiency The efficiency of the SProt access method was measured for different weights of the modifier and for different numbers of requested nearest neighbors. All the 108 queries of the ProtDex2 dataset were utilized and the average values are presented. The efficiency is expressed both in terms of the relative (according to sequential scan) number of protein structure pairs being compared (A) and in terms of the relative (according to sequential scan) computation time (B). Figure (B) also includes the absolute time which was measured on a machine containing an Intel Xeon E7540 2.00GHz processor. The sequential scan takes 39.4 minutes on average. Vertical dashed lines denote minimal, average and maximal size of the query families in the dataset.
Figure 5
Figure 5
Retrieval errors The retrieval errors of the SProt access method were measured for different weights of the modifier and for different numbers of requested nearest neighbors. All the 108 queries of the ProtDex2 dataset were utilized and the average values are presented. The errors are described as the percentage of protein structures included in the sequential scan result and missing in the result of the SProt access method. In the first case (A), the whole result of sequence scan is considered and the retrieval error E is measured. In the other cases, only subsets of the result sharing the same SCOP family (B), superfamily (C) or fold (D) with the query structure are taken into account, leading to different SCOP retrieval errors. An error 0% means that none of the considered structures were missing, and conversely, an error 100% means that all considered structures were missing. Vertical dashed lines denote minimal, average and maximal size of query families in the dataset.

Similar articles

Cited by

References

    1. Lathrop RH. The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng. 1994;7(9):1059–1068. doi: 10.1093/protein/7.9.1059. - DOI - PubMed
    1. Kabsch W. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr A. 1976;32(5):922–923. doi: 10.1107/S0567739476001873. - DOI
    1. Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993;233:123–138. doi: 10.1006/jmbi.1993.1489. - DOI - PubMed
    1. Holm L, Park J. DaliLite workbench for protein structure comparison. Bioinformatics. 2000;16(6):566–567. doi: 10.1093/bioinformatics/16.6.566. - DOI - PubMed
    1. Aung Z, Tan KL. Rapid 3D protein structure database searching using information retrieval techniques. Bioinformatics. 2004;20(7):1045–1052. doi: 10.1093/bioinformatics/bth036. - DOI - PubMed

LinkOut - more resources