A novel sequence similarity searching and visualization method based on overlappingly translated nucleic acids: the blastNP
- PMID: 15050109
- DOI: 10.1016/j.mehy.2003.11.020
Abstract
Sequence data are stored in nucleic acid and protein databases. Searching nucleic acid databases is a very specific but rather insensitive method. Searching protein databases is a sensitive but not very specific procedure. It was expected that a combination of these methods might provide an optimal approach. Therefore an alternative method to TblastX, known as blastNP, has been developed. Nucleic acids in the database and query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with blastP. Thus, each nucleic acid sequence is represented by a single "protein-like" sequence (instead of three hypothetical proteins in different reading frames). The blastNP method is defined as a blastP search performed on an overlappingly translated nucleic acid database using a similarly converted nucleic acid query. The specificity and sensitivity of blastNP and TblastX are very similar; however, blastNP is more sensitive in detecting short sequence similarities (fewer than 50 residues). BlastNP combines the advantages of nucleotide and protein blasts and bypasses many difficulties: (1) it is more sensitive to weak sequence similarities than blastN, (2) codon redundancy is eliminated, (3) the sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced, and (4) it is insensitive to frame shifts. This novel method was shown to find significant sequence similarities that remained hidden to other methods, and it is a promising tool for further understanding (and annotating) the function of many old and new sequences.
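The conversion step described above can be illustrated with a minimal Python sketch. It assumes that "overlapping translation" means sliding a three-nucleotide window along the sequence one base at a time and translating each window with the standard genetic code; the abstract does not spell out the exact procedure. The function name `overlapping_translate`, the use of `*` for stop codons and the use of `X` for windows containing ambiguous bases are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def overlapping_translate(nucleotides: str) -> str:
    """Translate every overlapping triplet (window step of one base).

    A sequence of n nucleotides yields one protein-like string of
    n - 2 residues in which the three reading frames are interleaved.
    Stop codons become '*', unrecognised triplets become 'X'
    (both are assumptions made for this sketch).
    """
    seq = nucleotides.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(len(seq) - 2)
    )


if __name__ == "__main__":
    print(overlapping_translate("ATGGCC"))   # MWGA
    print(overlapping_translate("AATGGCC"))  # NMWGA
```

In the second call a single inserted nucleotide shifts the reading frame, yet the string `MWGA` still appears intact in the output. Under the assumptions above, this is one way to see why a search on such sequences would tolerate frame shifts. The converted strings could then be supplied to a protein search tool as database and query; that step is not shown here.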