A gamma mixture model better accounts for among site rate heterogeneity
- PMID: 16204095
- DOI: 10.1093/bioinformatics/bti1125
Abstract
Motivation: Variation of substitution rates across nucleotide and amino acid sites has long been recognized as a characteristic of molecular sequence evolution. Evolutionary models that account for this rate heterogeneity usually use a gamma density function to model the rate distribution across sites. This density function, however, may not fit real datasets, especially when there is a multimodal distribution of rates. Here, we present a novel evolutionary model based on a mixture of gamma density functions. This model better describes the among-site rate variation characteristic of molecular sequence evolution. The use of this model may improve the accuracy of various phylogenetic methods, such as reconstructing phylogenetic trees, dating divergence events, inferring ancestral sequences and detecting conserved sites in proteins.
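To make the contrast with the single-gamma model concrete, the following is a minimal sketch of a rate density built as a weighted sum of gamma densities. The abstract does not give the authors' parameterization, so the shape/rate form, the function names, and the two-component parameter values below are all illustrative assumptions, not the published model or its C++ implementation.

```python
import math


def gamma_pdf(r, shape, rate):
    """Gamma density with the given shape and rate, evaluated at rate value r."""
    if r <= 0:
        return 0.0
    # Work in log space to avoid overflow for large shape parameters.
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(r) - rate * r)
    return math.exp(log_pdf)


def gamma_mixture_pdf(r, components):
    """Density of a gamma mixture.

    components is a list of (weight, shape, rate) tuples whose weights sum to 1.
    A single-component list reduces to the ordinary single-gamma model.
    """
    total_weight = sum(w for w, _, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gamma_pdf(r, a, b) for w, a, b in components)


def mixture_mean(components):
    """Mean rate of the mixture: the weighted sum of component means shape/rate."""
    return sum(w * a / b for w, a, b in components)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral of f over [lo, hi], used only as a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Hypothetical two-component mixture: many slowly evolving sites plus a
# smaller class of fast sites. These values are made up for illustration.
bimodal = [(0.6, 2.0, 8.0), (0.4, 20.0, 10.0)]

if __name__ == "__main__":
    for r in (0.125, 0.8, 1.9):
        print(f"density at rate {r}: {gamma_mixture_pdf(r, bimodal):.4f}")
    print(f"mean rate: {mixture_mean(bimodal):.4f}")
```

With these assumed parameters the density has one peak near 0.125 and another near 1.9, a shape that no single gamma density can take, which is the kind of multimodal rate distribution the mixture is meant to capture.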
Results: Using diverse sets of protein sequences, we show that the gamma mixture model better describes the stochastic process underlying protein evolution: it fits protein datasets significantly better than the single-gamma model in 9 out of 10 datasets tested. We further show that using the gamma mixture model improves the accuracy of model-based prediction of conserved residues in proteins.
Availability: C++ source code is available from the authors upon request.