Alignment statistic for identifying related protein sequences
- PMID: 864719
- DOI: 10.1007/BF01732744
Abstract
Closely related proteins show an obvious kinship by having numerous matching amino acids in their aligned sequences. Kinship between anciently separated proteins requires a statistical evaluation to rule out fortuitous similarities. A simple statistic is developed that assumes equal probability for all codon pairs, and a table of critical values for amino acid sequence alignments of length 200 or less is presented. Applying this statistic to V and C regions of immunoglobulin chains, aligned on the basis of shared features of three-dimensional structure, provides evidence that the V and C sequences descended from a common ancestor. Similarly, the distant evolutionary relationship of dehydrogenases, flavodoxin, and subtilisin, suggested by structural alignments, is verified. On the other hand, the statistic does not verify a common evolutionary origin for the heme binding pocket in globins and cytochrome b5. Empirical evidence from the distribution of MMD values of amino acid pairs in comparisons of misaligned polypeptide chains and from Monte Carlo trials of sequences aligned with arbitrary gaps supports the validity of the statistic.
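The abstract does not give the statistic's formula, so the following is only a rough illustration of how a null model built on "equal probability for all codon pairs" could yield critical values. It assumes, hypothetically, that the statistic is the sum of per-position MMD values (read here as the minimum number of base changes between any codon of one amino acid and any codon of the other), that aligned positions are independent, and that a low total is evidence of relatedness. The function names (`mmd`, `null_mmd_distribution`, `total_mmd_distribution`, `critical_value`) and the `alpha` threshold are illustrative choices, not the authors' method; the genetic code used is the standard code (NCBI translation table 1).

```python
from collections import defaultdict
from functools import lru_cache
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, '*' = stop
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons

# Group sense codons by the amino acid they encode
CODONS_BY_AA = defaultdict(list)
for codon in SENSE:
    CODONS_BY_AA[CODE[codon]].append(codon)


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


@lru_cache(maxsize=None)
def mmd(aa1, aa2):
    """Minimum base changes between any codon of aa1 and any codon of aa2 (0 to 3)."""
    return min(hamming(c1, c2) for c1 in CODONS_BY_AA[aa1] for c2 in CODONS_BY_AA[aa2])


def null_mmd_distribution():
    """P(MMD = k), k = 0..3, when every pair of sense codons is equally likely."""
    counts = [0, 0, 0, 0]
    for c1 in SENSE:
        for c2 in SENSE:
            counts[mmd(CODE[c1], CODE[c2])] += 1
    total = len(SENSE) ** 2
    return [n / total for n in counts]


def total_mmd_distribution(p, n):
    """Null distribution of the summed MMD over n positions (assumed independent)."""
    dist = [1.0]
    for _ in range(n):
        new = [0.0] * (len(dist) + len(p) - 1)
        for i, a in enumerate(dist):
            for j, b in enumerate(p):
                new[i + j] += a * b  # convolve with the per-position distribution
        dist = new
    return dist


def critical_value(n, alpha=0.01):
    """Largest total MMD t with P(total <= t) <= alpha; a total <= t counts as significant."""
    dist = total_mmd_distribution(null_mmd_distribution(), n)
    cum, t = 0.0, -1
    for k, pk in enumerate(dist):
        cum += pk
        if cum > alpha:
            break
        t = k
    return t
```

For example, `critical_value(150)` would give the largest summed MMD that this hypothetical test treats as significant for a 150-residue alignment at the 1% level. Note that weighting every codon pair equally implicitly makes amino acids with more codons (Leu, Ser, Arg) more frequent in the null model.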