Rapid and sensitive sequence comparison with FASTP and FASTA
- PMID: 2156132
- DOI: 10.1016/0076-6879(90)83007-v
Abstract
The FASTA program can search the NBRF protein sequence library (2.5 million residues) in less than 20 min on an IBM-PC microcomputer and unambiguously detect proteins that shared a common ancestor billions of years in the past. FASTA is both fast and selective because it initially considers only amino acid identities. Its sensitivity is increased not only by using the PAM250 matrix to score and rescore regions with large numbers of identities but also by joining initial regions.
The results of searches with FASTA compare favorably with results from programs based on the Needleman-Wunsch-Sellers (NWS) algorithm that are 100 times slower: FASTA is slightly less sensitive but considerably more selective. It is not clear that NWS-based programs would be more successful in finding distantly related members of the G-protein-coupled receptor family. The joining step that FASTA uses to calculate the initn score is especially useful for sequences that share regions of sequence similarity separated by variable-length loops.
FASTP and FASTA were designed to identify protein sequences that have descended from a common ancestor, and they have proved very useful for this task. In many cases, a FASTA search yields either a list of high-scoring library sequences that are homologous to the query sequence or a list of sequences whose similarity scores cannot be distinguished from the bulk of the library. In either case, the question of whether the library contains sequences that are clearly related to the query sequence has been answered unambiguously. Unfortunately, the results often will not be so clear-cut, and careful analysis of similarity scores, statistical significance, the actual aligned residues, and the biological context is required.
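The identity-first step described above can be illustrated with a small sketch. It is not the FASTA implementation: it only tallies exact k-tuple matches per diagonal, and it omits the PAM250 rescoring and the joining of initial regions into an initn score. The function names and the `ktup` default are assumptions made for illustration.

```python
from collections import defaultdict


def ktup_diagonal_hits(query, subject, ktup=2):
    """Count exact k-tuple matches on each diagonal (query index - subject index)."""
    # Index every k-tuple position in the query once.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Scan the library sequence and tally identical k-tuples per diagonal.
    hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            hits[i - j] += 1
    return dict(hits)


def best_diagonals(hits, top=10):
    """Return the diagonals with the most k-tuple identities, densest first."""
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

For example, `ktup_diagonal_hits("ACDEFGHIK", "XXACDEFGHIK")` returns `{-2: 8}`: all eight query dipeptides fall on the single diagonal offset by the two leading residues. Restricting the first pass to identities keeps the scan roughly linear in library size, which is consistent with the speed claim in the abstract; a substitution matrix would then be applied only to the few dense diagonals.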
In the course of analyzing the G-protein-coupled receptor family, several proteins were found that appeared to be previously unrecognized members of this family, because they had a high initn score and a low init1 score that increased almost 2-fold with optimization. RDF2 analysis showed borderline z values, and only a careful examination of the sequence alignments that focused on the conserved residues provided convincing evidence that the high scores were fortuitous. As sequence comparison methods become more powerful by becoming more sensitive, they become more likely to mislead, and even greater care is required.
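The z values mentioned above come from comparing an observed similarity score with scores obtained against shuffled sequences. The sketch below shows that general shuffle statistic under stated assumptions: it is not RDF2's exact procedure, the scoring function is passed in by the caller, and `identity_score` is a toy stand-in rather than a FASTA score. All names and defaults are hypothetical.

```python
import random
import statistics


def identity_score(a, b):
    """Toy score (assumption): identities at matching positions, no gaps or offsets."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_value(query, subject, score=identity_score, n_shuffles=100, seed=0):
    """z = (observed score - mean shuffled score) / SD of shuffled scores."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = score(query, subject)

    residues = list(subject)
    shuffled_scores = []
    for _ in range(n_shuffles):
        # Shuffling preserves composition and length but destroys residue order.
        rng.shuffle(residues)
        shuffled_scores.append(score(query, "".join(residues)))

    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; z is undefined")
    return (observed - mean) / sd
```

A sequence compared with itself should give a large z value, while unrelated sequences should give values near zero. As the abstract cautions, a borderline z value settles nothing on its own, and the aligned residues still need to be inspected.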