Rapid similarity searches of nucleic acid and protein data banks
- PMID: 6572363
- PMCID: PMC393452
- DOI: 10.1073/pnas.80.3.726
Abstract
With the development of large data banks of protein and nucleic acid sequences, the need for efficient methods of searching such banks for sequences similar to a given sequence has become evident. We present an algorithm for the global comparison of sequences based on matching k-tuples of sequence elements for a fixed k. The method results in substantial reduction in the time required to search a data bank when compared with prior techniques of similarity analysis, with minimal loss in sensitivity. The algorithm has also been adapted, in a separate implementation, to produce rigorous sequence alignments. Currently, using the DEC KL-10 system, we can compare all sequences in the entire Protein Data Bank of the National Biomedical Research Foundation with a 350-residue query sequence in less than 3 min and carry out a similar analysis with a 500-base query sequence against all eukaryotic sequences in the Los Alamos Nucleic Acid Data Base in less than 2 min.
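The k-tuple matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no implementation details, so the function names (`ktuple_index`, `diagonal_counts`, `best_diagonal`), the choice of k, and the use of diagonal offsets to tally matches are all assumptions made for illustration. The sketch indexes every k-tuple of the query once, then scans a data bank sequence and counts how many shared k-tuples fall on each diagonal (target position minus query position).

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map each k-tuple in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_counts(query, target, k):
    """Count shared k-tuples per diagonal (target offset minus query offset).

    Hypothetical helper: tallying by diagonal is an assumption of this
    sketch, not something stated in the abstract.
    """
    index = ktuple_index(query, k)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        # Look up the target k-tuple in the query index; skip if absent.
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return counts


def best_diagonal(query, target, k=2):
    """Return (diagonal, match count) for the diagonal with most k-tuple hits."""
    counts = diagonal_counts(query, target, k)
    if not counts:
        return None, 0
    diag = max(counts, key=counts.get)
    return diag, counts[diag]


if __name__ == "__main__":
    print(best_diagonal("ACGTACGT", "TTACGTACGTTT", k=4))
```

Because the query is indexed once and each data bank sequence is scanned in a single pass, the work per sequence grows roughly with its length plus the number of k-tuple hits, which is in keeping with the speed-up over full comparison that the abstract reports. A larger k produces fewer chance hits at some cost in sensitivity.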
Similar articles
- A novel sequence similarity searching and visualization method based on overlappingly translated nucleic acids: the blastNP. Med Hypotheses. 2004;62(4):568-74. doi: 10.1016/j.mehy.2003.11.020. PMID: 15050109
- A rapid access motif database (RAMdb) with a search algorithm for the retrieval patterns in nucleic acids or protein databanks. Comput Appl Biosci. 1995 Jun;11(3):273-9. doi: 10.1093/bioinformatics/11.3.273. PMID: 7583695
- Los Alamos sequence analysis package for nucleic acids and proteins. Nucleic Acids Res. 1982 Jan 11;10(1):183-96. doi: 10.1093/nar/10.1.183. PMID: 6174934. Free PMC article.
- Finding homologs to nucleic acid or protein sequences using the framesearch program. Curr Protoc Bioinformatics. 2002 Aug;Chapter 3:Unit 3.2. doi: 10.1002/0471250953.bi0302s00. PMID: 18792937. Review.
- Computational analysis of genetic sequences. Annu Rev Biophys Biophys Chem. 1986;15:79-95. doi: 10.1146/annurev.bb.15.060186.000455. PMID: 3521662. Review. No abstract available.
Cited by
- Accuracy estimation and parameter advising for protein multiple sequence alignment. J Comput Biol. 2013 Apr;20(4):259-79. doi: 10.1089/cmb.2013.0007. Epub 2013 Mar 14. PMID: 23489379. Free PMC article.
- MBIS--an integrated system for the retrieval and analyses of sequence data from nucleic acids and proteins. Nucleic Acids Res. 1986 Jan 10;14(1):265-72. doi: 10.1093/nar/14.1.265. PMID: 3753768. Free PMC article.
- A human metallothionein pseudogene containing AG/CT repetitive elements. J Mol Evol. 1990 Sep;31(3):211-20. doi: 10.1007/BF02109498. PMID: 2120457
- Similarity of Escherichia coli propanediol oxidoreductase (fucO product) and an unusual alcohol dehydrogenase from Zymomonas mobilis and Saccharomyces cerevisiae. J Bacteriol. 1989 Jul;171(7):3754-9. doi: 10.1128/jb.171.7.3754-3759.1989. PMID: 2661535. Free PMC article.
- Selective gene expression during sporulation of Physarum polycephalum. J Bacteriol. 1988 Oct;170(10):4784-90. doi: 10.1128/jb.170.10.4784-4790.1988. PMID: 3170484. Free PMC article.