Comparative Study
2007 Feb 8;7 Suppl 1(Suppl 1):S4.
doi: 10.1186/1471-2148-7-S1-S4.

Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

Nicolas Lartillot et al. BMC Evol Biol.

Abstract

Background: Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems. This suggests that better models of sequence evolution should be devised, models that are more robust to tree reconstruction artefacts even under the most challenging conditions.

Methods: We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation.

Results: Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.
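As a minimal sketch of how such a goodness-of-fit check can be set up, the Python below computes the mean diversity statistic named in the Figure 4 caption (the mean number of distinct residues per alignment column) and the fraction of model-simulated replicates that fall at or below the observed value. The function names, the treatment of `-`, `?` and `X` as missing data, and the example values are illustrative assumptions, not the authors' implementation; in practice the replicates would be computed from alignments simulated under the posterior of the model being tested.

```python
from typing import Sequence

# Assumption: these symbols are treated as missing data and not counted as residues.
MISSING = frozenset("-?X")


def mean_diversity(alignment: Sequence[str]) -> float:
    """Mean number of distinct residues observed per alignment column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0:
        raise ValueError("alignment has no columns")
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        # Count distinct residues in this column, ignoring missing-data symbols.
        total += len({c.upper() for c in column} - MISSING)
    return total / length


def lower_tail_fraction(observed: float, replicates: Sequence[float]) -> float:
    """Fraction of replicate statistics that are <= the observed statistic."""
    if not replicates:
        raise ValueError("no replicates supplied")
    return sum(r <= observed for r in replicates) / len(replicates)


# Toy alignment (hypothetical data): three taxa, four columns.
toy = ["ACDE", "ACDF", "AC-F"]
observed = mean_diversity(toy)
# Hypothetical replicate values standing in for model-simulated alignments.
fraction = lower_tail_fraction(observed, [1.0, 1.5, 2.0, 1.25])
```

A tail fraction close to 0 or 1 indicates that the model's simulated data rarely resemble the observed alignment for this statistic.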

Conclusion: The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need to be accounted for in order to obtain more reliable phylogenetic trees.
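The link between effective alphabet size and reversions can be illustrated with a deliberately simplified toy model, which is neither WAG nor CAT. Assume a site accepts only `k` amino acids (a hypothetical effective alphabet size) and that each substitution moves to one of the other `k - 1` states with equal probability. The probability of being back at the initial state after `n` substitutions then follows the recursion r_0 = 1, r_n = (1 - r_{n-1}) / (k - 1).

```python
def return_probability(k: int, n_substitutions: int) -> float:
    """Probability of being back in the initial state after n substitutions.

    Toy assumption: k equally likely states, and each substitution jumps
    uniformly to one of the other k - 1 states.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if n_substitutions < 0:
        raise ValueError("n_substitutions must be non-negative")
    p = 1.0  # before any substitution the site is in its initial state
    for _ in range(n_substitutions):
        # The site can only return from a different state, each of which
        # jumps back to the initial state with probability 1 / (k - 1).
        p = (1.0 - p) / (k - 1)
    return p


# Compare a full 20-letter alphabet with a site restricted to 4 residues.
for n in (2, 3, 10):
    print(n, return_probability(20, n), return_probability(4, n))
```

Under this toy model the return probability settles near 1/k, so a site restricted to a few residues revisits its initial state far more often than a site free to use all 20 amino acids.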

Figures

Figure 1
Posterior majority-rule consensus trees obtained under WAG+F+Γ, for the Meta1 data set, using four different taxon configurations: ingroup includes 5 deuterostomes (grey) and 10 arthropods (green), as well as 10 nematodes (red, A and B) or 5 platyhelminths (orange, C and D). Outgroup taxa comprise 12 fungi (dark blue), alone (A and C), or together with 2 choanoflagellates and a cnidarian (light blue, B and D). Posterior probabilities are displayed only when strictly lower than 1.
Figure 2
Posterior consensus trees obtained under CAT+F+Γ. Taxon sampling and color-codes are as in figure 1.
Figure 3
Posterior predictive statistical tests: Maximum Parsimony (pars. arrow), posterior distribution (obs. solid lines), and posterior predictive distribution (pred. dashed lines) of two statistics, n, the mean number of substitutions per site (A and C), and h, the mean number of homoplasies per site (B and D), under the Coelomata (A, B) and the Protostomia (C, D) hypotheses. The dataset is Meta1, with nematodes, and fungi as the only outgroup.
Figure 4
Posterior predictive analysis of the mean number of distinct residues observed at each column of the alignment (mean diversity). The analysis was done on Meta1, using the nematode/fungi taxon configuration.
Figure 5
Average probability of return to the initial state, under WAG (dashed lines) and CAT (solid lines), as a function of the number of substitutions.
