PSI: indexing protein structures for fast similarity search
- PMID: 12855441
- DOI: 10.1093/bioinformatics/btg1009
Abstract
Motivation: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to every protein in the database. The cost of a similarity query therefore increases linearly with the size of the database. As databases of experimentally determined and theoretically estimated protein structures grow, there is a need for scalable search techniques.
Results: Our techniques extract feature vectors from triplets of secondary structure elements (SSEs). These feature vectors are then indexed using a multidimensional index structure. For a given query protein, the index is used to quickly prune away unpromising proteins in the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Experimental results show that our techniques improve the pruning time of VAST by a factor of 3 to 3.5 while maintaining similar sensitivity.
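The index-then-prune idea described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: each SSE is reduced to a type and a midpoint, the feature vector of a triplet is the sorted list of its three pairwise midpoint distances, and a uniform grid of hash buckets stands in for the multidimensional index structure. The names `triplet_features`, `build_index`, and `candidates` are hypothetical. The statistical scoring model and the final alignment step (for example with VAST) are not shown.

```python
from collections import defaultdict
from itertools import combinations
import math

# Assumed SSE representation: (type, midpoint), e.g. ("H", (x, y, z)).

def triplet_features(sses):
    """Yield one feature vector per SSE triplet.

    Assumption: the vector is the sorted pairwise distances between
    SSE midpoints, which is invariant to rotation and translation.
    """
    for a, b, c in combinations(sses, 3):
        pairs = ((a, b), (a, c), (b, c))
        yield tuple(sorted(math.dist(p[1], q[1]) for p, q in pairs))

def cell(vec, width):
    """Map a feature vector to the grid cell that contains it."""
    return tuple(int(x // width) for x in vec)

def build_index(db, width=2.0):
    """Build a grid index: cell -> ids of proteins with a triplet in that cell."""
    index = defaultdict(set)
    for pid, sses in db.items():
        for vec in triplet_features(sses):
            index[cell(vec, width)].add(pid)
    return index

def candidates(index, query, width=2.0, min_votes=1):
    """Return ids of proteins sharing at least min_votes triplet cells with the query."""
    votes = defaultdict(int)
    for vec in triplet_features(query):
        for pid in index.get(cell(vec, width), ()):
            votes[pid] += 1
    return sorted(pid for pid, v in votes.items() if v >= min_votes)

# Toy database with made-up coordinates.
db = {
    "A": [("H", (0, 0, 0)), ("E", (5, 0, 0)), ("H", (0, 7, 0))],
    "B": [("H", (0, 0, 0)), ("E", (20, 0, 0)), ("H", (0, 30, 0))],
}
index = build_index(db)
# The query is a translated copy of A, so its triplet distances match A's.
query = [("H", (1, 1, 1)), ("E", (6, 1, 1)), ("H", (1, 8, 1))]
print(candidates(index, query))
```

Only the proteins returned by `candidates` would be passed on to the expensive alignment step. A single-cell lookup can miss near matches that fall just across a cell boundary; a real multidimensional index would answer a range query around each query vector instead.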