Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships
- PMID: 9600919
- PMCID: PMC27587
- DOI: 10.1073/pnas.95.11.6073
Abstract
Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation tested the programs BLAST [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. 215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448], and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of SSEARCH and FASTA are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by BLAST and WU-BLAST2 exaggerate significance by orders of magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20-30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.
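The reliability claim above can be made concrete with a small sketch. If E-values are well calibrated, then accepting every hit with E ≤ t should yield roughly t false positives per query, and the fraction of known relationships recovered at that threshold gives the coverage. The function name, the toy hit list, and the counts below are all hypothetical and for illustration only; they are not data from the study, and the authors' exact evaluation procedure may differ.

```python
from typing import List, Tuple


def errors_and_coverage(
    hits: List[Tuple[float, bool]],
    n_queries: int,
    n_true_pairs: int,
    threshold: float,
) -> Tuple[float, float]:
    """Return (false positives per query, coverage) at an E-value threshold.

    hits          -- (e_value, is_true_homolog) for every reported match;
                     the homolog label would come from a structural
                     classification such as SCOP (assumption: labels given).
    n_queries     -- number of query sequences searched.
    n_true_pairs  -- total number of known homologous pairs.
    """
    false_pos = sum(1 for e, hom in hits if e <= threshold and not hom)
    true_pos = sum(1 for e, hom in hits if e <= threshold and hom)
    return false_pos / n_queries, true_pos / n_true_pairs


# Hypothetical toy data: (E-value, known to be homologous?)
toy_hits = [
    (1e-20, True),
    (1e-5, True),
    (0.02, True),
    (0.5, False),
    (0.8, True),
    (3.0, False),
    (9.0, False),
]

epq, coverage = errors_and_coverage(
    toy_hits, n_queries=4, n_true_pairs=8, threshold=1.0
)
print(f"errors per query: {epq}, coverage: {coverage}")
```

For a well-calibrated score, the observed errors per query should track the chosen threshold as it varies; a program whose reported significance is exaggerated would show many more false positives than the threshold predicts.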
References
- Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. J Mol Biol. 1990;215:403–410.
- Altschul S F, Gish W. Methods Enzymol. 1996;266:460–480.
- Murzin A G, Brenner S E, Hubbard T, Chothia C. J Mol Biol. 1995;247:536–540.
- Brenner S E, Chothia C, Hubbard T J P, Murzin A G. Methods Enzymol. 1996;266:635–643.