BMC Bioinformatics. 2006 Dec 5;7:529.
doi: 10.1186/1471-2105-7-529.

Fast estimation of the difference between two PAM/JTT evolutionary distances in triplets of homologous sequences

Christophe Dessimoz et al. BMC Bioinformatics. 2006.

Abstract

Background: Estimating the difference between two evolutionary distances within a triplet of homologs is a common operation, used, for example, to determine which of two sequences is closer to a third one. The most accurate method currently available is maximum likelihood over the entire triplet. However, this approach is relatively time-consuming.

Results: We show that an alternative estimator, based on pairwise estimates and therefore much faster to compute, has almost the same statistical power as the maximum likelihood estimator. We also provide a numerical approximation for its variance, which could otherwise only be estimated through an expensive re-sampling approach such as bootstrapping. An extensive simulation demonstrates that the approximation delivers precise confidence intervals. To illustrate the possible applications of these results, we show how they improve the detection of asymmetric evolution, and the identification of the closest relative to a given sequence in a group of homologs.
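The pairwise estimator is the difference of two independently computed pairwise distance estimates, and its variance follows from the standard identity Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B). The paper's numerical approximation of the covariance term is not reproduced here, so this minimal sketch simply takes the covariance as an input; the function names `pairwise_delta` and `confidence_interval` are hypothetical, and the interval assumes an approximately normal estimator.

```python
import math


def pairwise_delta(d_xy, d_xz, var_xy, var_xz, cov_xy_xz):
    """Pairwise estimate of Delta = d_XY - d_XZ and its variance.

    Uses Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B). The covariance is
    supplied by the caller (assumption: it comes from some approximation
    not shown here); passing 0 corresponds to assuming independence.
    """
    delta = d_xy - d_xz
    var_delta = var_xy + var_xz - 2.0 * cov_xy_xz
    if var_delta < 0:
        raise ValueError("inconsistent variance/covariance inputs")
    return delta, var_delta


def confidence_interval(delta, var_delta, k=1.96):
    """Normal-approximation interval: delta +/- k standard deviations."""
    half_width = k * math.sqrt(var_delta)
    return delta - half_width, delta + half_width


# Invented numbers in PAM units, for illustration only.
delta, var_delta = pairwise_delta(120.0, 100.0, 64.0, 49.0, 30.0)
print(delta, var_delta, confidence_interval(delta, var_delta))
```

With these made-up inputs the interval excludes zero, whereas setting the covariance to 0 inflates the variance from 53 to 113 and the interval then straddles zero. This illustrates how ignoring a positive covariance can hide a real difference.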

Conclusion: The results presented in this paper constitute a basis for large-scale protein cross-comparisons of pairwise evolutionary distances.

Figures

Figure 1
Unrooted tree topology of all triplets of homologs. Sequences X, Y and Z originate from the internal node O. The problem addressed here is the estimation of the difference $\Delta = d_{XY} - d_{XZ} = d_{OY} - d_{OZ}$.
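The second equality in the caption follows from additivity of distances along the tree: each pairwise distance is the sum of the branch lengths on the path through O, so the shared X-O segment cancels. The derivation below assumes the distances are exactly additive on the tree. The same shared segment is one intuitive reason why the two pairwise estimates are correlated rather than independent.

```latex
\begin{aligned}
d_{XY} &= d_{XO} + d_{OY}, \qquad d_{XZ} = d_{XO} + d_{OZ},\\
\Delta &= d_{XY} - d_{XZ}
        = (d_{XO} + d_{OY}) - (d_{XO} + d_{OZ})
        = d_{OY} - d_{OZ}.
\end{aligned}
```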
Figure 2
Scatter plots comparing the variance estimators. The upper-left plot shows the strong agreement between $\sigma^2(\hat{\Delta}_{\text{triplet}})$ and our approximation $\sigma^2(\hat{\Delta}_{\text{pairwise}})$. The upper-right and lower-left plots show that both have a similar correlation with $\sigma^2_{\text{bootstrap}}(\hat{\Delta}_{\text{pairwise}})$. Finally, the lower-right plot confirms that estimating the variance under the assumption of independence can greatly overestimate the correct variance.
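The bootstrap reference $\sigma^2_{\text{bootstrap}}(\hat{\Delta}_{\text{pairwise}})$ is obtained by resampling alignment columns. The minimal sketch below resamples the same columns for X, Y and Z in each replicate, so any correlation between $\hat{d}_{XY}$ and $\hat{d}_{XZ}$ is preserved. It also reports the independence-based sum Var(d_XY) + Var(d_XZ). As a loudly labeled substitution, it uses a simple Poisson-corrected distance instead of the PAM/JTT maximum-likelihood distance, and it assumes a gap-free alignment of equal-length sequences. The function names are hypothetical.

```python
import math
import random
import statistics


def poisson_distance(a, b):
    """Stand-in distance (NOT the PAM/JTT ML estimate): Poisson-corrected
    proportion of differing sites, in substitutions per 100 sites."""
    diffs = sum(1 for u, v in zip(a, b) if u != v)
    p = diffs / len(a)
    if p >= 1.0:
        return math.inf  # saturated: undefined under this simple model
    return -100.0 * math.log(1.0 - p)


def bootstrap_delta_variance(x, y, z, replicates=1000, seed=0):
    """Bootstrap a gap-free triplet alignment by resampling columns jointly.

    Returns (variance of d_XY - d_XZ, Var(d_XY) + Var(d_XZ)); the second
    value is what the assumption of independence would use.
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("sequences must be aligned to equal length")
    rng = random.Random(seed)
    n = len(x)
    d_xy, d_xz, deltas = [], [], []
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]  # same columns for X, Y, Z
        xs = "".join(x[i] for i in cols)
        ys = "".join(y[i] for i in cols)
        zs = "".join(z[i] for i in cols)
        a, b = poisson_distance(xs, ys), poisson_distance(xs, zs)
        d_xy.append(a)
        d_xz.append(b)
        deltas.append(a - b)
    return (statistics.variance(deltas),
            statistics.variance(d_xy) + statistics.variance(d_xz))
```

When Y and Z share differences from X at some columns, the joint resampling yields a smaller variance for the difference than the independence sum. This is the direction of the overestimation shown in the lower-right plot.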
Figure 3
Detection of asymmetric evolution. Comparison between the results of Kellis et al. and the three variants of closer, with k = 1.96. The circles separate cases of significant asymmetry (inside) from insignificant asymmetry (outside). For instance, in 92 cases all three variants of closer reported significant asymmetry, while the method of Kellis et al. did not.
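Because $\Delta = d_{OY} - d_{OZ}$, a significantly non-zero $\hat{\Delta}$ means that Y and Z accumulated different amounts of change since O. The minimal sketch below assumes that significance is declared when $|\hat{\Delta}|$ exceeds k standard deviations, with k = 1.96 as in the caption. The helper `classify_asymmetry` is hypothetical and is not the closer implementation. The variance is passed in, so the same rule can be fed either an independence-based or a covariance-aware estimate.

```python
import math


def classify_asymmetry(d_xy, d_xz, var_delta, k=1.96):
    """Test Delta = d_XY - d_XZ (= d_OY - d_OZ) against k standard deviations."""
    if var_delta < 0:
        raise ValueError("variance must be non-negative")
    delta = d_xy - d_xz
    margin = k * math.sqrt(var_delta)
    if delta > margin:
        return "Y evolved faster"  # branch O-Y significantly longer
    if delta < -margin:
        return "Z evolved faster"  # branch O-Z significantly longer
    return "not significant"


# Invented distances in PAM units; only the variance differs between calls.
print(classify_asymmetry(130.0, 100.0, 100.0))
print(classify_asymmetry(130.0, 100.0, 400.0))
```

With the invented inputs, the smaller variance yields a significant call and the inflated one does not. An overestimated variance therefore makes the test more conservative.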
Figure 4
Tree randomly generated for closest homolog simulation. Example of a random tree (see text for description of the procedure) used to compare the different methods to infer the closest homolog to each leaf. Distances indicated are in PAM units.
Figure 5
Identification of the closest homolog. Comparison, on simulated data, between methods using (1) alignment score, (2) distance under the assumption of independence and (3) distance with our variance approximation.
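One way a variance estimate can enter closest-homolog identification is shown below. Instead of returning only the candidate with the smallest estimated distance, the sketch also keeps every candidate whose distance is not significantly larger. This is a hypothetical procedure written for illustration, not necessarily the one evaluated in the figure. `closest_homologs` and its `var_diff` callback, which returns the variance of the difference between two query-candidate distances, are assumed names.

```python
import math
from typing import Callable, Dict, List


def closest_homologs(dist: Dict[str, float],
                     var_diff: Callable[[str, str], float],
                     k: float = 1.96) -> List[str]:
    """Return the nearest candidate to the query, followed by every candidate
    whose distance is not significantly larger (within k standard deviations).

    dist maps candidate name -> estimated distance to the query.
    var_diff(a, b) returns the variance of d(query, a) - d(query, b).
    """
    best = min(dist, key=dist.get)
    result = [best]
    for name, d in sorted(dist.items(), key=lambda item: item[1]):
        if name == best:
            continue
        # Delta = d(query, name) - d(query, best) >= 0 by construction
        if d - dist[best] <= k * math.sqrt(var_diff(name, best)):
            result.append(name)
    return result


# Invented distances (PAM units) and a constant variance, for illustration.
print(closest_homologs({"A": 50.0, "B": 55.0, "C": 90.0}, lambda a, b: 25.0))
```

Passing `var_diff = lambda a, b: var[a] + var[b]` would correspond to the independence assumption (method 2). Because that sum tends to overestimate the variance, it would typically report more ties than a covariance-aware estimate.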
