Measuring the relative contribution to predictive power of modern nucleotide substitution modeling approaches
- PMID: 37502274
- PMCID: PMC10371494
- DOI: 10.1093/bioadv/vbad091
Abstract
Traditional approaches to probabilistic phylogenetic inference have relied on information-theoretic criteria to select among a relatively small set of substitution models. These model selection criteria have recently been called into question when applied to richer models, including models that invoke mixtures of nucleotide frequency profiles. At the nucleotide level, we are therefore left without a clear picture of mixture models' contribution to overall predictive power relative to other modeling approaches. Here, we utilize a Bayesian cross-validation method to directly measure the predictive performance of a wide range of nucleotide substitution models. We compare the relative contributions of free nucleotide exchangeability parameters, gamma-distributed rates across sites, and mixtures of nucleotide frequencies with both finite and infinite mixture frameworks. We find that the most important contributor to a model's predictive power is the use of a sufficiently rich mixture of nucleotide frequencies. These results suggest that mixture models should be given greater consideration in nucleotide-level phylogenetic inference.
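To make the idea of a cross-validated predictive score concrete, the following is a minimal sketch of one common way such a score can be computed. It is not taken from the paper. It assumes that, for each cross-validation replicate, a model has been fit to the training sites and that the log-likelihood of the held-out sites has been evaluated at each posterior sample. The function names (`log_mean_exp`, `cv_score`) and the input layout are hypothetical and chosen only for illustration.

```python
import math


def log_mean_exp(log_values):
    """Return log(mean(exp(v))) for a list of log-values, computed stably."""
    m = max(log_values)
    total = sum(math.exp(v - m) for v in log_values)
    return m + math.log(total / len(log_values))


def cv_score(test_loglik_by_replicate):
    """Average, over replicates, the log of the posterior-mean test likelihood.

    test_loglik_by_replicate: list of lists (hypothetical layout). Each inner
    list holds the held-out log-likelihood at each posterior sample drawn
    from the training data of that replicate.
    """
    scores = [log_mean_exp(rep) for rep in test_loglik_by_replicate]
    return sum(scores) / len(scores)


def cv_difference(model_a, model_b):
    """Score of model A minus score of model B; positive values favor A."""
    return cv_score(model_a) - cv_score(model_b)
```

Under these assumptions, two models can be compared by evaluating both on the same training and test splits and calling `cv_difference`; a higher score indicates better prediction of the held-out sites.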
© The Author(s) 2023. Published by Oxford University Press.
Conflict of interest statement
The authors declare that they have no conflicts of interest.