Quantifying the impact of dependent evolution among sites in phylogenetic inference
- PMID: 21081481
- PMCID: PMC2997629
- DOI: 10.1093/sysbio/syq074
Abstract
Nearly all commonly used methods of phylogenetic inference assume that characters in an alignment evolve independently of one another. This assumption is attractive for simplicity and computational tractability but is not biologically reasonable for RNAs and proteins that have secondary and tertiary structures. Here, we simulate RNA and protein-coding DNA sequence data under a general model of dependence in order to assess the robustness of traditional methods of phylogenetic inference to violation of the assumption of independence among sites. We find that the accuracy of independence-assuming methods is reduced by the dependence among sites; for proteins this reduction is relatively mild, but for RNA this reduction may be substantial. We introduce the concept of effective sequence length and its utility for considering information content in phylogenetics.
