A study of statistical methods for function prediction of protein motifs
- PMID: 15693737
- DOI: 10.2165/00822942-200403020-00006
Abstract
Automatic discovery of new protein motifs (i.e. amino acid patterns) is one of the major challenges in bioinformatics. Several algorithms have been proposed that can extract statistically significant motif patterns from any set of protein sequences. With these methods, one can generate a large set of candidate motifs that may be biologically meaningful. This article examines methods to predict the functions of these candidate motifs. We use several statistical methods: a popularity method, a mutual information method and probabilistic translation models. These methods capture, from different perspectives, the correlations between the matched motifs of a protein and its assigned Gene Ontology terms that characterise the function of the protein. We evaluate these different methods using the known motifs in the InterPro database. Each method is used to rank candidate terms for each motif. We then use the expected mean reciprocal rank to evaluate the performance. The results show that, in general, all these methods perform well, suggesting that they can all be useful for predicting the function of an unknown motif. Among the methods tested, a probabilistic translation model with a popularity prior performs the best.
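As a rough illustration of two of the ideas named in the abstract, the sketch below scores Gene Ontology terms for a motif by mutual information and evaluates rankings with the mean reciprocal rank. The abstract gives no formulas, so everything here is an assumption: the protein-level co-occurrence formulation, the toy data, and the function names (`mutual_information`, `rank_terms`, `mean_reciprocal_rank`) are hypothetical, and the plain mean reciprocal rank shown may differ from the "expected" variant the article uses. Note also that mutual information rewards negative association as well as positive association, which a real system would likely need to handle.

```python
import math
from collections import Counter

# Toy data (hypothetical): each protein is (matched motifs, assigned GO terms).
proteins = [
    ({"M1"}, {"GO:A", "GO:C"}),
    ({"M1"}, {"GO:A"}),
    ({"M2"}, {"GO:B", "GO:C"}),
    ({"M2"}, {"GO:B"}),
    (set(), {"GO:C"}),
    (set(), {"GO:B"}),
]

def mutual_information(proteins, motif, term):
    """Mutual information (bits) between the binary events
    'protein matches motif' and 'protein is annotated with term'."""
    n = len(proteins)
    joint = Counter((motif in motifs, term in terms) for motifs, terms in proteins)
    mi = 0.0
    for (m, t), count in joint.items():
        p_mt = count / n
        # Marginals are obtained by summing the joint counts.
        p_m = sum(c for (mm, _), c in joint.items() if mm == m) / n
        p_t = sum(c for (_, tt), c in joint.items() if tt == t) / n
        mi += p_mt * math.log2(p_mt / (p_m * p_t))
    return mi

def rank_terms(proteins, motif):
    """Rank every observed GO term for a motif, highest score first."""
    terms = sorted({t for _, ts in proteins for t in ts})
    return sorted(terms, key=lambda t: mutual_information(proteins, motif, t),
                  reverse=True)

def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the first correct term per motif (0 if none found)."""
    total = 0.0
    for motif, ranked in rankings.items():
        for position, term in enumerate(ranked, start=1):
            if term in gold[motif]:
                total += 1.0 / position
                break
    return total / len(rankings)

rankings = {motif: rank_terms(proteins, motif) for motif in ("M1", "M2")}
gold = {"M1": {"GO:A"}, "M2": {"GO:B"}}  # hypothetical known annotations
mrr = mean_reciprocal_rank(rankings, gold)
```

In this toy set, GO:C occurs independently of both motifs, so it scores near zero and falls to the bottom of each ranking. A popularity method or a translation model would replace only the scoring function passed to `sorted`.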
Similar articles
- Five hierarchical levels of sequence-structure correlation in proteins. Appl Bioinformatics. 2004;3(2-3):97-104. doi: 10.2165/00822942-200403020-00004. PMID: 15693735. Review.
- Identification of function-associated loop motifs and application to protein function prediction. Bioinformatics. 2006 Sep 15;22(18):2237-43. doi: 10.1093/bioinformatics/btl382. Epub 2006 Jul 26. PMID: 16870939.
- Rapid motif-based prediction of circular permutations in multi-domain proteins. Bioinformatics. 2005 Apr 1;21(7):932-7. doi: 10.1093/bioinformatics/bti085. PMID: 15788783.
- Automatic annotation of protein motif function with Gene Ontology terms. BMC Bioinformatics. 2004 Sep 2;5:122. doi: 10.1186/1471-2105-5-122. PMID: 15345032. Free PMC article.
- Structure-based function prediction: approaches and applications. Brief Funct Genomic Proteomic. 2008 Jul;7(4):291-302. doi: 10.1093/bfgp/eln030. Epub 2008 Jul 3. PMID: 18599513. Review.
Cited by
- Quantitative characterization of protein tertiary motifs. J Mol Model. 2014 Jan;20(1):2077. doi: 10.1007/s00894-014-2077-z. Epub 2014 Jan 26. PMID: 24464316.
- Diversity and motif conservation in protein 3D structural landscape: exploration by a new multivariate simulation method. J Mol Model. 2018 Mar 2;24(4):76. doi: 10.1007/s00894-018-3614-y. PMID: 29500695.