The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference
- PMID: 20525573
- PMCID: PMC7539334
- DOI: 10.1093/sysbio/syp017
Abstract
Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of one mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
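To make the kind of experiment described above concrete, the following is a minimal sketch of a 4-taxon simulation with among-site rate variation and artificially introduced missing characters. It is not the authors' procedure: the Jukes-Cantor substitution model, the gamma shape value, the branch lengths, the masking pattern, and the helper names `evolve`, `simulate_four_taxon`, and `mask_sites` are all assumptions chosen for illustration.

```python
import math
import random

BASES = "ACGT"


def evolve(base, branch_length, rate, rng):
    """Evolve one nucleotide along a branch under the Jukes-Cantor model."""
    # JC69 probability that the end state differs from the start state.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * rate * branch_length / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != base])
    return base


def simulate_four_taxon(n_sites, internal, terminal, alpha, rng):
    """Simulate sequences on the unrooted tree ((A,B),(C,D))."""
    seqs = {name: [] for name in "ABCD"}
    for _ in range(n_sites):
        # Gamma-distributed site rate with mean 1 (shape alpha, scale 1/alpha).
        rate = rng.gammavariate(alpha, 1.0 / alpha)
        left = rng.choice(BASES)  # state at the node joining A and B
        right = evolve(left, internal, rate, rng)  # state at the node joining C and D
        for name, ancestor in (("A", left), ("B", left), ("C", right), ("D", right)):
            seqs[name].append(evolve(ancestor, terminal, rate, rng))
    return {name: "".join(chars) for name, chars in seqs.items()}


def mask_sites(seqs, taxa, sites, symbol="?"):
    """Return a copy of the alignment with the given sites set to missing in the given taxa."""
    sites = set(sites)
    masked = {}
    for name, seq in seqs.items():
        if name in taxa:
            seq = "".join(symbol if i in sites else c for i, c in enumerate(seq))
        masked[name] = seq
    return masked


# Illustrative parameter values only (assumed, not taken from the study).
rng = random.Random(42)
alignment = simulate_four_taxon(n_sites=1000, internal=0.05, terminal=0.2, alpha=0.5, rng=rng)
# Hypothetical pattern: the first half of the sites is missing in taxa A and C.
ambiguous = mask_sites(alignment, taxa={"A", "C"}, sites=range(500))
```

Because masking is applied to a copy of a fully simulated alignment, the complete and ambiguous data sets differ only in the missing characters, so both could be passed to the same ML or Bayesian program and any difference in the estimates attributed to the ambiguous data.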