Issues in predicting protein function from sequence
- PMID: 11465059
- DOI: 10.1093/bib/2.1.19
Abstract
Identifying homologues, defined as genes that arose from a common evolutionary ancestor, is often a relatively straightforward task, thanks to recent advances in estimating the statistical significance of sequence similarities found in database searches. The extent to which homologues share similarities in function, however, is less amenable to statistical analysis. Consequently, predicting function by homology is a qualitative, rather than quantitative, process and requires particular care. This review focuses on the various approaches that have been developed to predict function at scales ranging from the atom to the organism. Similarities in homologues' functions differ considerably across these scales and also vary between domain families. It is argued that due attention should be paid to all available clues to function, including orthologue identification, conservation of particular residue types, and the co-occurrence of domains in proteins. Pitfalls in database searching methods arising from amino acid compositional bias and database size effects are also discussed.
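The database size effect mentioned above follows from how local-alignment search tools such as BLAST estimate significance. Under Karlin-Altschul statistics, the expected number of chance hits scoring at least S is E = K·m·n·e^(−λS), where m is the query length and n the total database length. The sketch below is a minimal illustration under stated assumptions: the λ and K defaults are illustrative values often quoted for gapped BLOSUM62 searches (real values depend on the scoring system and residue composition), no edge-effect length correction is applied, and the sequence lengths are hypothetical.

```python
import math


def evalue(raw_score: float, query_len: int, db_len: int,
           lam: float = 0.267, k: float = 0.041) -> float:
    """Karlin-Altschul expected number of chance hits with score >= raw_score.

    lam and k are illustrative assumptions, not universal constants;
    no edge-effect correction to the effective lengths is made here.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float = 0.267, k: float = 0.041) -> float:
    """Normalised (bit) score, so that E = m * n * 2**(-bits)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Same alignment score and query length, two hypothetical database sizes.
score = 100
query = 300
small_db = 10_000_000
large_db = 1_000_000_000

e_small = evalue(score, query, small_db)
e_large = evalue(score, query, large_db)
print(f"E (small database) = {e_small:.3g}")
print(f"E (large database) = {e_large:.3g}")
# E grows linearly with database length, so a 100-fold larger database
# makes the identical alignment 100 times less significant.
print(f"ratio = {e_large / e_small:.0f}")
```

Because n enters the formula linearly, a hit that looks convincing against a small database can become unremarkable as the database grows, even though the alignment itself is unchanged. Compositional bias is a separate problem: low-complexity regions violate the random-sequence assumptions behind λ and K, which is why such regions are often masked before searching.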
Similar articles
- Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J Mol Biol. 1997 Jun 13;269(3):423-39. doi: 10.1006/jmbi.1997.1019. PMID: 9199410
- Identification of distant homologues of fibroblast growth factors suggests a common ancestor for all beta-trefoil proteins. J Mol Biol. 2000 Oct 6;302(5):1041-7. doi: 10.1006/jmbi.2000.4087. PMID: 11183773
- Estimating residue evolutionary conservation by introducing von Neumann entropy and a novel gap-treating approach. Amino Acids. 2008 Aug;35(2):495-501. doi: 10.1007/s00726-007-0586-0. Epub 2007 Aug 21. PMID: 17710364. Free PMC article.
- Scoring residue conservation. Proteins. 2002 Aug 1;48(2):227-41. doi: 10.1002/prot.10146. PMID: 12112692. Review.
- Predicting functions from protein sequences--where are the bottlenecks? Nat Genet. 1998 Apr;18(4):313-8. doi: 10.1038/ng0498-313. PMID: 9537411. Review.
Cited by
- Candidate Essential Genes in Burkholderia cenocepacia J2315 Identified by Genome-Wide TraDIS. Front Microbiol. 2016 Aug 22;7:1288. doi: 10.3389/fmicb.2016.01288. eCollection 2016. PMID: 27597847. Free PMC article.
- Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution. Bioinformatics. 2011 Feb 1;27(3):326-33. doi: 10.1093/bioinformatics/btq655. Epub 2010 Nov 29. PMID: 21118823. Free PMC article.
- The evolutionary origin of epithelial cell-cell adhesion mechanisms. Curr Top Membr. 2013;72:267-311. doi: 10.1016/B978-0-12-417027-8.00008-8. PMID: 24210433. Free PMC article.
- SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res. 2003 Jul 1;31(13):3692-7. doi: 10.1093/nar/gkg600. PMID: 12824396. Free PMC article.
- From gene networks to gene function. Genome Res. 2003 Dec;13(12):2568-76. doi: 10.1101/gr.1111403. PMID: 14656964. Free PMC article.