Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms
- PMID: 1774068
- DOI: 10.1016/0888-7543(91)90071-l
Abstract
The sensitivity and selectivity of the FASTA and the Smith-Waterman protein sequence comparison algorithms were evaluated using the superfamily classification provided in the National Biomedical Research Foundation/Protein Identification Resource (PIR) protein sequence database. Sequences from each of the 34 superfamilies in the PIR database with 20 or more members were compared against the protein sequence database. The similarity scores of the related and unrelated sequences were determined using either the FASTA program or the Smith-Waterman local similarity algorithm. These two sets of similarity scores were used to evaluate the ability of the two comparison algorithms to identify distantly related protein sequences. The FASTA program using the ktup = 2 sensitivity setting performed as well as the Smith-Waterman algorithm for 19 of the 34 superfamilies. Increasing the sensitivity by setting ktup = 1 allowed FASTA to perform as well as Smith-Waterman on an additional 7 superfamilies. The rigorous Smith-Waterman method performed better than FASTA with ktup = 1 on 8 superfamilies, including the globins, immunoglobulin variable regions, calmodulins, and plastocyanins. Several strategies for improving the sensitivity of FASTA were examined. The greatest improvement in sensitivity was achieved by optimizing a band around the best initial region found for every library sequence. For every superfamily except the globins and immunoglobulin variable regions, this strategy was as sensitive as a full Smith-Waterman. For some sequences, additional sensitivity was achieved by including conserved but nonidentical residues in the lookup table used to identify the initial region.
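The three ideas the abstract contrasts (a full Smith-Waterman scan, a ktup word lookup to find a promising diagonal, and optimization restricted to a band around that diagonal) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `sw_score` and `best_diagonal`, the match/mismatch/gap values, and the linear gap penalty are assumptions made here for brevity, and the diagonal is chosen by a simple count of shared words rather than by FASTA's actual region scoring. A real search would use a substitution matrix instead of a flat match/mismatch score.

```python
from collections import defaultdict


def sw_score(a, b, match=2, mismatch=-1, gap=-2, center=0, band=None):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps).

    With band=None the full matrix is filled. With band=k only cells on
    diagonals d = j - i satisfying |d - center| <= k are filled; cells
    outside the band are left at 0. Scoring values are illustrative.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if band is not None and abs((j - i) - center) > band:
                continue  # outside the band: leave the cell at 0
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def best_diagonal(query, subject, ktup=2):
    """Diagonal (j - i) sharing the most identical ktup-length words.

    A simplified stand-in for FASTA's lookup-table step: index every
    ktup-word of the query, scan the subject, and count hits per diagonal.
    Returns None when the sequences share no word of length ktup.
    """
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits[j - i] += 1
    if not hits:
        return None
    return max(hits, key=hits.get)


if __name__ == "__main__":
    query, subject = "ACDEFG", "XXACDEFG"
    d = best_diagonal(query, subject, ktup=2)
    full = sw_score(query, subject)
    banded = sw_score(query, subject, center=d, band=1)
    print(d, full, banded)
```

Because the banded fill visits a subset of the cells of the full matrix, its score can never exceed the full Smith-Waterman score. It matches the full score when the best local alignment lies inside the band, which is the situation the band-optimization strategy in the abstract relies on. Lowering `ktup` from 2 to 1 makes the word lookup more permissive, at the cost of more spurious hits.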