Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation
- PMID: 15558595
- DOI: 10.1002/jmr.721
Abstract
A substantial fraction of protein sequences derived from genomic analyses is currently classified as representing 'hypothetical proteins of unknown function'. In part, this reflects the limitations of methods for comparison of sequences with very low identity. We evaluated the effectiveness of a Psi-BLAST search strategy to identify proteins of similar fold at low sequence identity. Psi-BLAST searches for structurally characterized low-sequence-identity matches were carried out on a set of over 300 proteins of known structure. Searches were conducted in NCBI's non-redundant database and were limited to three rounds. Some 614 potential homologs with 25% or lower sequence identity to 166 members of the search set were obtained. Disregarding the expect value, level of sequence identity and span of alignment, correspondence of fold between the target and potential homolog was found in more than 95% of the Psi-BLAST matches. Restrictions on expect value or span of alignment improved the false positive rate at the expense of eliminating many true homologs. Approximately three-quarters of the putative homologs obtained by three rounds of Psi-BLAST revealed no significant sequence similarity to the target protein upon direct sequence comparison by BLAST, and therefore could not be found by a conventional search. Although three rounds of Psi-BLAST identified many more homologs than a standard BLAST search, most homologs were undetected. It appears that more than 80% of all homologs to a target protein may be characterized by a lack of significant sequence similarity. We suggest that conservative use of Psi-BLAST has the potential to propose experimentally testable functions for the majority of proteins currently annotated as 'hypothetical proteins of unknown function'.
Copyright 2004 John Wiley & Sons, Ltd.
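The search strategy described in the abstract (three rounds of Psi-BLAST against the non-redundant database, keeping matches at 25% sequence identity or lower) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes the modern BLAST+ `psiblast` program and its 12-column tabular output (`-outfmt 6`), neither of which is named in the abstract, and the helper names `build_psiblast_command` and `low_identity_hits` are hypothetical. Keeping the identity reported in the last round in which a subject appears is likewise an assumption made for simplicity.

```python
from typing import Dict, List


def build_psiblast_command(query_fasta: str, db: str = "nr", rounds: int = 3) -> List[str]:
    """Assemble a BLAST+ psiblast call limited to a fixed number of rounds.

    Assumption: BLAST+ is the tool in use; the abstract does not say
    which Psi-BLAST implementation was run.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(rounds),
        "-outfmt", "6",  # tabular: qseqid sseqid pident length ... evalue bitscore
    ]


def low_identity_hits(tabular: str, max_identity: float = 25.0) -> Dict[str, float]:
    """Return {subject id: percent identity} for matches at or below max_identity.

    Lines that are not 12-column tabular records (blank separators between
    rounds, convergence notices) are skipped. If a subject appears in several
    rounds, the value from the last round is kept (an assumption of this sketch).
    """
    latest: Dict[str, float] = {}
    for line in tabular.splitlines():
        fields = line.split("\t")
        if len(fields) != 12:
            continue
        subject_id = fields[1]
        percent_identity = float(fields[2])
        latest[subject_id] = percent_identity
    return {sid: pid for sid, pid in latest.items() if pid <= max_identity}
```

The command list can be passed to `subprocess.run` where BLAST+ and a local copy of the database are installed, and the captured output handed to `low_identity_hits`. Matches retained this way are only candidates: the study judged them by correspondence of fold with the target, which this sketch does not attempt.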
Similar articles
- Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: application. J Mol Recognit. 2005 Mar-Apr;18(2):150-7. doi: 10.1002/jmr.719. PMID: 15593246
- FROST: a filter-based fold recognition method. Proteins. 2002 Dec 1;49(4):493-509. doi: 10.1002/prot.10231. PMID: 12402359
- Identification of new claudin family members by a novel PSI-BLAST based approach with enhanced specificity. Proteins. 2006 Dec 1;65(4):808-15. doi: 10.1002/prot.21218. PMID: 17022085
- Needle in the haystack: structure-based toxin discovery. Trends Biochem Sci. 2008 Nov;33(11):546-56. doi: 10.1016/j.tibs.2008.08.003. Epub 2008 Sep 22. PMID: 18815047 Review.
- Practical and predictive bioinformatics methods for the identification of potentially cross-reactive protein matches. Mol Nutr Food Res. 2006 Jul;50(7):655-60. doi: 10.1002/mnfr.200500277. PMID: 16810734 Review.
Cited by
- Identification of an ideal-like fingerprint for a protein fold using overlapped conserved residues based approach. Sci Rep. 2014 Jul 10;4:5643. doi: 10.1038/srep05643. PMID: 25008052 Free PMC article.
- Bioinformatics analysis of the locus for enterocyte effacement provides novel insights into type-III secretion. BMC Microbiol. 2005 Mar 9;5:9. doi: 10.1186/1471-2180-5-9. PMID: 15757514 Free PMC article.