Comparative Study
J Mol Evol. 1989 Dec;29(6):526-37.
doi: 10.1007/BF02602924.

Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences

B E Blaisdell. J Mol Evol. 1989 Dec.

Abstract

Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fits the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as the traditional mismatch counts requiring prior sequence alignment. Application of the Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or ...) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
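
The MDD definition above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `multiplet_fractions` and `mdd` are hypothetical, and the choices to count overlapping multiplets and to ignore multiplets containing characters outside the alphabet are assumptions, since the abstract does not specify either.

```python
from collections import Counter
from itertools import product


def multiplet_fractions(seq, k, alphabet="ACGT"):
    """Return the fraction of each length-k multiplet in seq.

    Assumption: multiplets are counted at every position (overlapping).
    Multiplets containing characters outside `alphabet` are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all possible multiplets, so vectors are comparable.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid multiplets of length k")
    return [counts[w] / total for w in words]


def mdd(seq_a, seq_b, k=1, alphabet="ACGT"):
    """Squared Euclidean distance between the two multiplet fraction vectors."""
    fa = multiplet_fractions(seq_a, k, alphabet)
    fb = multiplet_fractions(seq_b, k, alphabet)
    return sum((x - y) ** 2 for x, y in zip(fa, fb))


if __name__ == "__main__":
    # Same base composition, different order: singlets cannot tell them apart.
    print(mdd("AACC", "ACAC", k=1))
    # Doublets capture the difference in neighbouring bases.
    print(mdd("AACC", "ACAC", k=2))
```

In this toy case the singlet distance between "AACC" and "ACAC" is zero because both sequences have identical base composition, while the doublet distance is positive. This is in line with the abstract's observation that longer multiplets improved MDD results. Passing a 20-letter amino acid string as `alphabet` would apply the same calculation to protein sequences.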
