Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences
- PMID: 2515299
- DOI: 10.1007/BF02602924
Abstract
Various measures of sequence dissimilarity were evaluated by how well the additive least squares estimates of the edges (branch lengths) of an unrooted evolutionary tree fit the observed pairwise dissimilarities, and by how consistent the trees were across different data sets derived from the same set of sequences. This evaluation discriminated sensitively among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment performed about as well as the traditional mismatch counts, which do require prior alignment. Applying the Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two measures of pairwise dissimilarity not requiring alignment were used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or ...) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to more divergent sequences (noncoding) than CLW was, but CLW often gave better results where both measures were applicable (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could also be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
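The MDD definition given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `multiplet_fractions` and `mdd`, the use of overlapping words, and the choice to ignore words containing characters outside the alphabet are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter
from itertools import product


def multiplet_fractions(seq, k, alphabet="ACGT"):
    """Fractions of each length-k word (multiplet) in seq.

    Assumptions (not stated in the abstract): words are counted with
    overlap, and words containing characters outside `alphabet` are ignored.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("no multiplets of length k over the given alphabet")
    return [counts[w] / total for w in words]


def mdd(seq_a, seq_b, k=1, alphabet="ACGT"):
    """Squared Euclidean distance between the two multiplet-fraction vectors."""
    fa = multiplet_fractions(seq_a, k, alphabet)
    fb = multiplet_fractions(seq_b, k, alphabet)
    return sum((x - y) ** 2 for x, y in zip(fa, fb))


# Singlets (k=1): two sequences with the same base composition are at distance 0,
# even though they could not be aligned without mismatches.
d_same_composition = mdd("ACGT", "TGCA", k=1)
# Sequences sharing no bases reach the maximum singlet distance.
d_disjoint = mdd("AAAA", "CCCC", k=1)
```

Because the measure depends only on word frequencies, no alignment step is needed. Passing a larger `k` or a different `alphabet` (for example, the amino acid letters) corresponds to the longer multiplets and larger alphabets that the abstract reports as improving MDD results.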