Sequence alignment: an approximation law for the Z-value with applications to databank scanning
- PMID: 11459354
- DOI: 10.1016/s0097-8485(01)00074-2
Abstract
The Z-value is an attempt to estimate the statistical significance of a Smith and Waterman dynamic programming alignment score (H-score) through the use of a Monte-Carlo procedure. In this paper, we give an approximation for the Z-value law deduced from the Poisson clumping heuristic developed by Waterman and Vingron (Stat. Sci. 9 (1994) 367) for the comparison of independent and identically distributed sequences. As for non-gapped alignment scores, our approximation is of Gumbel type, but with parameters that are sequence independent. This result explains the related experimental results reported by Comet et al. (Comput. Chem. 23 (1999) 317). Using 'quasi-real' sequences (i.e. randomly shuffled sequences of the same length and amino acid composition as the real ones), we investigate the relevance of our approximation result. Since the Monte-Carlo approach we use introduces a bias in the estimation of the Gumbel decay parameter, a correction procedure is proposed. Applications to real sequences are considered, and we show how our results can be used to detect potential biological relationships between real sequences.
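The Monte-Carlo Z-value described above compares the H-score of the real pair against the H-scores obtained after shuffling, i.e. Z = (H − mean of shuffled H) / (standard deviation of shuffled H). The sketch below illustrates that computation under several assumptions that are not from the paper: a toy match/mismatch scoring with a linear gap penalty stands in for a substitution matrix with affine gaps, only the second sequence is shuffled, and the helper names `sw_score`, `z_value` and `gumbel_tail` are hypothetical. The Gumbel tail uses the standard type I extreme value form with placeholder decay (`lam`) and location (`mu`) parameters; it does not reproduce the paper's parameter values or its bias-correction procedure.

```python
import math
import random
import statistics


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score (H-score) with a linear gap penalty.

    The toy match/mismatch/gap scoring is an assumption standing in for a
    substitution matrix (e.g. BLOSUM62) with affine gap penalties.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Monte-Carlo Z-value: (H - mean of shuffled H) / sd of shuffled H.

    Assumption: only `b` is shuffled; shuffling preserves its length and
    residue composition, as with the 'quasi-real' sequences.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a standard deviation")
    rng = random.Random(seed)
    h = sw_score(a, b)
    shuffled = []
    for _ in range(n_shuffles):
        residues = list(b)
        rng.shuffle(residues)
        shuffled.append(sw_score(a, "".join(residues)))
    sd = statistics.stdev(shuffled)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-value undefined")
    return (h - statistics.mean(shuffled)) / sd


def gumbel_tail(z: float, lam: float, mu: float) -> float:
    """P(Z >= z) under a Gumbel (type I extreme value) law.

    `lam` (decay) and `mu` (location) are user-supplied placeholders,
    not fitted values from the paper.
    """
    return 1.0 - math.exp(-math.exp(-lam * (z - mu)))
```

For example, `z_value("HEAGAWGHEE", "PAWHEAE")` returns a Z-value for that pair, which could then be passed to `gumbel_tail` with decay and location parameters estimated separately. Seeding the random generator keeps the Monte-Carlo estimate reproducible, which matters because a finite number of shuffles makes the Z-value itself a random quantity.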
Similar articles
- Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores. BMC Bioinformatics. 2008 Aug 7;9:332. doi: 10.1186/1471-2105-9-332. PMID: 18687111. Free PMC article.
- Significance of Z-value statistics of Smith-Waterman scores for protein alignments. Comput Chem. 1999 Jun 15;23(3-4):317-31. doi: 10.1016/s0097-8485(99)00008-x. PMID: 10627144.
- An incremental algorithm for Z-value computations. Comput Chem. 2002 Jul;26(5):403-11. doi: 10.1016/s0097-8485(02)00003-7. PMID: 12144171.
- Robust E-values for gapped local alignments. J Comput Biol. 2006 May;13(4):882-96. doi: 10.1089/cmb.2006.13.882. PMID: 16761917. Review.
- Statistics of sequence-structure threading. Curr Opin Struct Biol. 1995 Apr;5(2):236-44. doi: 10.1016/0959-440x(95)80082-4. PMID: 7648327. Review.
Cited by
- No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol. 2018 Jul;11(4):588-605. doi: 10.1111/1751-7915.13284. Epub 2018 May 28. PMID: 29806194. Free PMC article. Review.
- Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space? Malar J. 2006 Nov 17;5:110. doi: 10.1186/1475-2875-5-110. PMID: 17112376. Free PMC article. Review.
- Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges. Sci Rep. 2013 Oct 3;3:2837. doi: 10.1038/srep02837. PMID: 24089188. Free PMC article.
- A simple derivation of the distribution of pairwise local protein sequence alignment scores. Evol Bioinform Online. 2008 Feb 14;4:41-5. PMID: 19204806. Free PMC article.
- Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores. BMC Bioinformatics. 2008 Aug 7;9:332. doi: 10.1186/1471-2105-9-332. PMID: 18687111. Free PMC article.