eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity
- PMID: 15608172
- PMCID: PMC540014
- DOI: 10.1093/nar/gki060
Abstract
Classifying proteins into families and superfamilies allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools that recognize similar patterns in novel sequences, and thus enable the prediction of protein function across genomes. The eBLOCKs database enumerates a cascade of protein blocks with varied conservation levels for each functional domain. A biologically important region is most stringently conserved among a smaller family of highly similar proteins. The same region is often found, with reduced stringency, in a larger group of more remotely related proteins. Through enumeration, highly specific signatures can be generated from blocks with more columns and fewer family members, while highly sensitive signatures can be derived from blocks with fewer columns and more members, as in a superfamily. By applying PSI-BLAST and a modified K-means clustering algorithm, eBLOCKs automatically groups protein sequences according to different levels of similarity. Multiple sequence alignments are made and trimmed into a series of ungapped blocks. Motifs and position-specific scoring matrices are derived from eBLOCKs and made available for sequence search and annotation. The eBLOCKs database provides a tool for high-throughput genome annotation with maximal specificity and sensitivity. The eBLOCKs database is freely available on the World Wide Web at http://motif.stanford.edu/eblocks/ to all users for online use. Academic and not-for-profit institutions wishing copies of the program may contact Douglas L. Brutlag (brutlag@stanford.edu). Commercial firms wishing copies of the program for internal installation may contact Jacqueline Tay at the Stanford Office of Technology Licensing (jacqueline.tay@stanford.edu; http://otl.stanford.edu/).
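The step of trimming an alignment into ungapped blocks and deriving a position-specific scoring matrix from each block can be illustrated with a minimal Python sketch. This is not the eBLOCKs implementation. The function names, the `min_width` threshold, the additive pseudocount, and the uniform background frequencies are all assumptions chosen for illustration, and the toy alignment is invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ungapped_blocks(alignment, min_width=3):
    """Return (start, end) column ranges, end exclusive, with no gap in any row.

    Hypothetical helper: min_width is an illustrative threshold.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    blocks, start = [], None
    # Iterate one column past the end so a trailing run is closed too.
    for col in range(length + 1):
        gap_free = col < length and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


def block_pssm(alignment, start, end, pseudocount=1.0):
    """Per-column log2-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(start, end):
        counts = Counter(row[col] for row in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum the column scores for a segment of standard amino acids."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Invented toy alignment; '-' marks a gap.
toy_alignment = [
    "MKV-LHGDSAC",
    "MKVALHGD-AC",
    "MRV-LHGDTAC",
]

for start, end in ungapped_blocks(toy_alignment):
    pssm = block_pssm(toy_alignment, start, end)
    consensus = toy_alignment[0][start:end]
    print(start, end, consensus, round(score(pssm, consensus), 2))
```

In this sketch, raising `min_width` keeps only longer blocks, which corresponds loosely to the more specific signatures described above, while adding more distantly related rows to the alignment tends to shorten the gap-free runs and flatten the column scores.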