A simple derivation of the distribution of pairwise local protein sequence alignment scores
- PMID: 19204806
- PMCID: PMC2614193
Abstract
Confidence in pairwise alignments of biological sequences, obtained by methods such as BLAST or Smith-Waterman, is critical for automatic analyses of genomic data. In the asymptotic limit of long sequences, the Karlin-Altschul model computes a P-value assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Using a simple approach combined with recent results in reliability theory, we demonstrate here that the Karlin-Altschul model can be derived with no reference to extreme value theory. Sequences were considered as systems whose components are amino acids and which have a high redundancy of information, reflected by their alignment scores. The evolution of the information shared between aligned components determined the Shared Amount of Information (S.A.I.) between sequences, i.e. the score. The parameters of the Gumbel distribution of aligned sequence scores thereby receive a theoretical rationale. The first is the hazard rate of the distribution of scores between residues, and the second is the probability that two aligned residues do not lose bits of information (i.e. conserve an initial pairing score) when a mutation occurs.
Keywords: Karlin-Altschul theorem; conservation function; reliability theory.
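The Poisson assumption mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard Karlin-Altschul form, in which the expected number of high-scoring regions is E = K·m·n·exp(-λS) and the P-value is 1 - exp(-E). The function name, argument names and numeric values below are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement the reliability-theory derivation itself.

```python
import math


def karlin_altschul_pvalue(score, m, n, lam, k):
    """P-value of seeing at least one local alignment scoring >= `score`.

    m, n : lengths of the two sequences (asymptotic, long-sequence regime)
    lam  : Gumbel scale parameter (lambda), in inverse score units
    k    : Gumbel pre-factor (K)
    """
    # Expected number of high-scoring regions above the threshold.
    e_value = k * m * n * math.exp(-lam * score)
    # Poisson count: P(at least one region) = 1 - exp(-E).
    # -expm1(-E) evaluates 1 - exp(-E) accurately when E is tiny.
    return -math.expm1(-e_value)


# Hypothetical parameter values, for illustration only.
p = karlin_altschul_pvalue(score=60, m=300, n=300, lam=0.3, k=0.1)
print(f"P-value: {p:.3e}")
```

For small E the P-value is close to E itself, which is why the two quantities are often used interchangeably for highly significant alignments.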