Bioinform Adv. 2023 Jul 14;3(1):vbad091.
doi: 10.1093/bioadv/vbad091. eCollection 2023.

Measuring the relative contribution to predictive power of modern nucleotide substitution modeling approaches

Thomas Bujaki et al. Bioinform Adv.

Abstract

Traditional approaches to probabilistic phylogenetic inference have relied on information-theoretic criteria to select among a relatively small set of substitution models. These model selection criteria have recently been called into question when applied to richer models, including models that invoke mixtures of nucleotide frequency profiles. At the nucleotide level, we are therefore left without a clear picture of mixture models' contribution to overall predictive power relative to other modeling approaches. Here, we utilize a Bayesian cross-validation method to directly measure the predictive performance of a wide range of nucleotide substitution models. We compare the relative contributions of free nucleotide exchangeability parameters, gamma-distributed rates across sites, and mixtures of nucleotide frequencies with both finite and infinite mixture frameworks. We find that the most important contributor to a model's predictive power is the use of a sufficiently rich mixture of nucleotide frequencies. These results suggest that mixture models should be given greater consideration in nucleotide-level phylogenetic inference.
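As a rough illustration of the kind of quantity a Bayesian cross-validation comparison produces, the sketch below assumes a common formulation: for each training/test split, the test-set likelihood is averaged over posterior samples drawn given the training data, the logarithm is taken, and the result is averaged over splits and reported relative to a reference model (GTR + Γ in the figures). The abstract does not specify the estimator, so this form, the function names, and the input layout are assumptions made for illustration, not the authors' implementation.

```python
import math


def log_mean_exp(log_values):
    """Return log(mean(exp(v))) computed stably by factoring out the maximum."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


def cv_score(test_log_likelihoods):
    """Average, over replicates, of the log posterior-mean test likelihood.

    test_log_likelihoods is a list of replicates (hypothetical layout); each
    replicate is a list of log p(test data | theta_k), one value per posterior
    sample theta_k obtained from that replicate's training data.
    """
    per_replicate = [log_mean_exp(rep) for rep in test_log_likelihoods]
    return sum(per_replicate) / len(per_replicate)


def relative_cv_score(model_lls, reference_lls):
    """Score of a model relative to a reference model; positive favors the model."""
    return cv_score(model_lls) - cv_score(reference_lls)
```

Under these assumptions, a positive relative score means the candidate model predicts held-out sites better on average than the reference model.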

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

Figure 1. Cross-validation scores (relative to the GTR + Γ model) for the Regier dataset, plotted as a function of the number of nucleotide frequency components; note that models consisting of special cases with a single component are labeled near x = 1.
Figure 2. Cross-validation scores (relative to the GTR + Γ model) for the RBCL dataset, plotted as a function of the number of nucleotide frequency components; note that models consisting of special cases with a single component are labeled near x = 1.
