Statistical comparison of nucleotide, amino acid, and codon substitution models for evolutionary analysis of protein-coding sequences
- PMID: 20525578
- DOI: 10.1093/sysbio/syp015
Abstract
Statistical models for the evolution of molecular sequences play an important role in the study of evolutionary processes. For the evolutionary analysis of protein-coding sequences, 3 types of evolutionary models are available: 1) nucleotide, 2) amino acid, and 3) codon substitution models. Selecting appropriate models can greatly improve the estimation of phylogenies and divergence times and the detection of positive selection. Although much attention has been paid to comparisons among models of the same type, relatively little attention has been paid to comparisons among models of different types. Additionally, because these models have different data structures, comparing them with conventional model selection criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) is not straightforward. Here, we suggest new procedures to convert models of these 3 types into 64-dimensional models of nucleotide triplet substitution. These conversion procedures make it possible to compare models of the 3 types statistically by using AIC or BIC. By analyzing divergent and conserved interspecific mammalian sequences and intraspecific human population data, we show the superiority of the codon substitution models and discuss the advantages and disadvantages of the models of each of the 3 types.
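The central idea of the abstract is that AIC or BIC can rank models only when every model assigns a likelihood to the same data, here the 64 nucleotide triplet states. The Python sketch below illustrates that idea for the simplest possible case and is not the authors' conversion procedure: it assumes a Jukes-Cantor (JC69) nucleotide model, independent evolution of the three codon positions, and a pairwise comparison of two aligned coding sequences. All function names (`lifted_triplet_prob`, `triplet_loglik`, and so on) are hypothetical. Under these assumptions the 64-state "lifted" likelihood equals the ordinary per-nucleotide likelihood, so the nucleotide model can be scored with AIC ($-2\ln L + 2k$) or BIC ($-2\ln L + k\ln n$) on the same footing as a codon-level model. Conversions of amino acid and codon models are not shown.

```python
import math
from itertools import product

BASES = "TCAG"
# All 64 nucleotide triplets: the common state space in which models are compared.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]


def jc69_prob(a: str, b: str, d: float) -> float:
    """JC69 probability that base a becomes base b after d expected substitutions per site."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def lifted_triplet_prob(x: str, y: str, d: float) -> float:
    """Hypothetical 64-state version of JC69. Assumes the three codon positions evolve
    independently, so the triplet transition probability is a product over positions."""
    return math.prod(jc69_prob(a, b, d) for a, b in zip(x, y))


def nucleotide_loglik(seq1: str, seq2: str, d: float) -> float:
    """Pairwise JC69 log-likelihood summed over nucleotide sites (stationary frequency 1/4)."""
    return sum(math.log(0.25 * jc69_prob(a, b, d)) for a, b in zip(seq1, seq2))


def triplet_loglik(seq1: str, seq2: str, d: float) -> float:
    """The same data scored as codon columns in the 64-state space (stationary frequency 1/64)."""
    return sum(
        math.log((1.0 / 64.0) * lifted_triplet_prob(seq1[i:i + 3], seq2[i:i + 3], d))
        for i in range(0, len(seq1), 3)
    )


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion for k free parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion for k free parameters and sample size n."""
    return -2.0 * loglik + k * math.log(n)


if __name__ == "__main__":
    # Toy aligned coding sequences (3 codons); illustrative only, not data from the paper.
    s1, s2 = "ATGGCTAAA", "ATGGCCAAG"
    d = 0.1  # branch length, the single free parameter in this toy setup

    # Each row of the lifted 64 x 64 transition matrix is a probability distribution.
    row_sum = sum(lifted_triplet_prob("ATG", y, d) for y in TRIPLETS)
    print("row sum:", round(row_sum, 12))

    # Under independence, scoring sites or codon columns gives the same log-likelihood.
    ln_nuc = nucleotide_loglik(s1, s2, d)
    ln_tri = triplet_loglik(s1, s2, d)
    print("equal log-likelihoods:", math.isclose(ln_nuc, ln_tri))

    n_codons = len(s1) // 3
    print("AIC:", aic(ln_tri, k=1), "BIC:", bic(ln_tri, k=1, n=n_codons))
```

Whatever sample size is used for BIC (codon columns here), it has to be held fixed across all models being compared; otherwise the penalties are not on a common scale.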
Similar articles
- Synonymous substitutions substantially improve evolutionary inference from highly diverged proteins. Syst Biol. 2008 Jun;57(3):367-77. doi: 10.1080/10635150802158670. PMID: 18570032.
- The effect of branch length variation on the selection of models of molecular evolution. J Mol Evol. 2001 May;52(5):434-44. doi: 10.1007/s002390010173. PMID: 11443347.
- Modelling the evolution of protein coding sequences sampled from Measurably Evolving Populations. Genome Inform. 2008;21:150-64. PMID: 19425155.
- Phase-dependent nucleotide substitution in protein-coding sequences. Biochem Biophys Res Commun. 2007 Apr 13;355(3):599-602. doi: 10.1016/j.bbrc.2007.01.006. Epub 2007 Jan 10. PMID: 17300744. Review.
- Inference of viral evolutionary rates from molecular sequences. Adv Parasitol. 2003;54:331-58. doi: 10.1016/s0065-308x(03)54008-8. PMID: 14711090. Review.
Cited by
- Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature. 2010 Feb 25;463(7284):1079-83. doi: 10.1038/nature08742. Epub 2010 Feb 10. PMID: 20147900.
- Assessing the state of substitution models describing noncoding RNA evolution. Genome Biol Evol. 2014 Jan;6(1):65-75. doi: 10.1093/gbe/evt206. PMID: 24391153.
- Estimating empirical codon hidden Markov models. Mol Biol Evol. 2013 Mar;30(3):725-36. doi: 10.1093/molbev/mss266. Epub 2012 Nov 27. PMID: 23188590.
- Big data analysis of human mitochondrial DNA substitution models: a regression approach. BMC Genomics. 2018 Oct 19;19(1):759. doi: 10.1186/s12864-018-5123-x. PMID: 30340456.
- Ancestral Sequence Reconstruction for Exploring Alkaloid Evolution. Methods Mol Biol. 2022;2505:165-179. doi: 10.1007/978-1-0716-2349-7_12. PMID: 35732944.