Comparative Study
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S46.
doi: 10.1186/1471-2105-11-S1-S46.

A fast indexing approach for protein structure comparison

Lei Zhang et al. BMC Bioinformatics.

Abstract

Background: Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy.

Results: We have developed IR Tableau, a fast protein comparison algorithm that uses the tableau representation to compare protein tertiary structures. IR Tableau compares tableaux using information-retrieval-style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves a speedup of two orders of magnitude over the search times of existing methods, while producing search results of comparable accuracy.
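
To make "information-retrieval-style feature indexing" concrete, here is a minimal sketch of how an inverted index over structural features might rank database proteins against a query. It is an illustration under stated assumptions, not the IR Tableau implementation: the abstract does not say which features are extracted from a tableau or how they are weighted, so the feature strings (such as "HH:PE"), the TF-IDF weighting, the cosine scoring, and the function names `build_index` and `search` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index from {protein_id: [feature, ...]}.

    The feature strings are assumed to be derived from tableaux; how they
    are derived is NOT specified in the abstract.
    """
    n = len(docs)
    df = Counter()
    for feats in docs.values():
        df.update(set(feats))  # document frequency of each feature
    # Smoothed inverse document frequency (an assumed weighting).
    idf = {f: math.log((1 + n) / (1 + c)) + 1.0 for f, c in df.items()}

    postings = defaultdict(list)  # feature -> [(protein_id, normalised weight)]
    for pid, feats in docs.items():
        tf = Counter(feats)
        vec = {f: tf[f] * idf[f] for f in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        for f, w in vec.items():
            postings[f].append((pid, w / norm))
    return postings, idf


def search(query_feats, postings, idf, top_k=10):
    """Rank indexed proteins by cosine similarity to the query features."""
    tf = Counter(f for f in query_feats if f in idf)  # ignore unseen features
    qvec = {f: tf[f] * idf[f] for f in tf}
    qnorm = math.sqrt(sum(w * w for w in qvec.values())) or 1.0
    scores = defaultdict(float)
    # Only proteins sharing at least one feature with the query are touched.
    for f, qw in qvec.items():
        for pid, dw in postings[f]:
            scores[pid] += (qw / qnorm) * dw
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The point of the design is that a query only visits the posting lists of its own features, so proteins with no feature in common are never scored. That is the general reason index-based search can be much faster than pairwise structural comparison; the speedup reported above is the paper's measurement, not a property of this sketch.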

Conclusion: We show that it is possible to obtain very significant speedups for the protein structure comparison problem by employing an information-retrieval-style approach to indexing proteins. The comparison accuracy achieved is also strong, opening the way for large-scale processing of very large protein structure databases.

Figures

Figure 1
Tableau orientation encoding scheme.
Figure 2
ROC curves for 200 query set. ROC curve of IR Tableau, TableauSearch and Yakusa for 200 query set in ASTRAL 1.73 95% data set.
Figure 3
Precision-recall curves for 200 query set. Precision-recall curve of i) IR Tableau, ii) TableauSearch and iii) Yakusa for 200 query set in ASTRAL 1.73 95% data set. The MAP scores are respectively i) 0.498, ii) 0.647, iii) 0.74.
Figure 4
ROC curve for d1ae6h1. ROC curve of IR Tableau, TableauSearch and Yakusa for protein d1ae6h1 in ASTRAL 1.73 95% data set.
Figure 5
Precision-recall curve of d1ae6h1. Precision-recall curve of IR Tableau, TableauSearch and Yakusa for protein d1ae6h1 in ASTRAL 1.73 95% data set.
Figure 6
Superposition graph. Superposition of the top 20 results using d1ubia as a query protein. MUSTANG [5] was used for structure alignment. The figure was generated using PyMOL [30].
Figure 7
Distribution of the number of SSEs in the ASTRAL 1.73 data set. X axis: number of SSEs in a protein; Y axis: number of proteins.
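
The Figure 3 caption reports MAP (mean average precision) scores. As a reference point, the sketch below computes MAP under its standard information retrieval definition. It assumes that this is the definition used, that each query yields a ranked list of domain identifiers, and that a separate set of relevant identifiers is known for each query; the function names and the example identifiers are hypothetical.

```python
def average_precision(ranked_ids, relevant):
    """Average precision of one ranked result list.

    Precision is sampled at the rank of every relevant hit and averaged
    over the total number of relevant items, so missed items count as zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, a ranking `["a", "x", "b"]` with relevant set `{"a", "b"}` has precision 1/1 at the first hit and 2/3 at the second, giving an average precision of 5/6.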

References

    1. Lesk A. Bioinformatics. Oxford University Press; 2002.
    2. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al. The Protein Data Bank. Acta Crystallographica. Section D, Biological Crystallography. 2002;58(Pt 6 No 1):899–907. doi: 10.1107/S0907444902003451.
    3. Holm L, Sander C. Mapping the protein universe. Science (New York, NY). 1996;273(5275):595–603. [PMID: 8662544].
    4. Orengo CA, Taylor WR. SSAP: sequential structure alignment program for protein structure comparison. Methods in Enzymology. 1996;266:617–635.
    5. Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: A multiple structural alignment algorithm. Proteins: Structure, Function, and Bioinformatics. 2006;64(3):559–574. doi: 10.1002/prot.20921.
