Measuring the similarity of protein structures by means of the universal similarity metric
- PMID: 14751983
- DOI: 10.1093/bioinformatics/bth031
Abstract
Motivation: As an increasing number of protein structures becomes available, so does the need for algorithms that can quantify the similarity between them. The comparison of protein structures, and their clustering according to a given similarity measure, is therefore at the core of today's biomedical research. In this paper, we show how the Universal Similarity Metric (USM), inspired by algorithmic information theory, can be used to calculate the similarity between pairs of proteins. Besides being theoretically supported, the method is surprisingly simple to implement and computationally efficient.
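The USM is defined in terms of Kolmogorov complexity, which is not computable, so in practice it is approximated with a real-world compressor C through the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below illustrates that general approximation; it is not the authors' published scripts. The choice of zlib as the compressor and the function names (`compressed_size`, `ncd`, `distance_matrix`) are assumptions made for illustration, and how a protein structure is serialized into bytes is left open.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the maximum level.

    zlib is an assumed stand-in compressor; any lossless compressor could be used.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a computable approximation of the USM.

    Values near 0 indicate that x and y share most of their information;
    values near 1 indicate unrelated inputs (real compressors can slightly exceed 1).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # size of the concatenation x followed by y
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(objects: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix, e.g. as input to a clustering algorithm.

    The diagonal is set to 0 by convention; off-diagonal entries are computed,
    and may differ slightly between (i, j) and (j, i) because C(xy) != C(yx).
    """
    n = len(objects)
    return [
        [0.0 if i == j else ncd(objects[i], objects[j]) for j in range(n)]
        for i in range(n)
    ]
```

For example, two long, nearly identical byte strings obtain a smaller distance to each other than either does to an unrelated, incompressible string such as `bytes(range(256))`. The resulting matrix can then be passed to any standard clustering routine.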
Results: Structural similarity between proteins in four different datasets was measured using the USM. The sample included alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The proposed metric correctly measured similarity and classified the proteins in all four datasets.
Availability: All the scripts and programs used to prepare this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. That web page gives a brief description of how to use the various scripts and programs.
Similar articles
- Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment. BMC Bioinformatics. 2007 Jul 13;8:252. doi: 10.1186/1471-2105-8-252. PMID: 17629909. Free PMC article.
- Index-based similarity search for protein structure databases. J Bioinform Comput Biol. 2004 Mar;2(1):99-126. doi: 10.1142/s0219720004000491. PMID: 15272435.
- PSI: indexing protein structures for fast similarity search. Bioinformatics. 2003;19 Suppl 1:i81-3. doi: 10.1093/bioinformatics/btg1009. PMID: 12855441.
- Graph-based clustering for finding distant relationships in a large set of protein sequences. Bioinformatics. 2004 Jan 22;20(2):243-52. doi: 10.1093/bioinformatics/btg397. PMID: 14734316.
- The limits of protein sequence comparison? Curr Opin Struct Biol. 2005 Jun;15(3):254-60. doi: 10.1016/j.sbi.2005.05.005. PMID: 15919194. Free PMC article. Review.
Cited by
- Fast Phylogeny of SARS-CoV-2 by Compression. Entropy (Basel). 2022 Mar 22;24(4):439. doi: 10.3390/e24040439. PMID: 35455102. Free PMC article.
- Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules. 2019 Dec 23;10(1):26. doi: 10.3390/biom10010026. PMID: 31878100. Free PMC article.
- ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information. BMC Bioinformatics. 2007 Oct 26;8:416. doi: 10.1186/1471-2105-8-416. PMID: 17963510. Free PMC article.
- Aligning sequences by minimum description length. EURASIP J Bioinform Syst Biol. 2007;2007(1):72936. doi: 10.1155/2007/72936. PMID: 18274649. Free PMC article.
- Comparing biological networks via graph compression. BMC Syst Biol. 2010 Sep 13;4 Suppl 2(Suppl 2):S13. doi: 10.1186/1752-0509-4-S2-S13. PMID: 20840727. Free PMC article.