Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3931-9.

doi: 10.1098/rstb.2008.0167.

Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences

Sang Chul Choi¹, Benjamin D Redelings, Jeffrey L Thorne

PMID: 18852105
PMCID: PMC2607412
DOI: 10.1098/rstb.2008.0167

Sang Chul Choi et al. Philos Trans R Soc Lond B Biol Sci. 2008.

Affiliation

¹ Bioinformatics Research Center, North Carolina State University, Box 7566, Raleigh, NC 27695-7566, USA.

Abstract

Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved. Instead of focusing on estimating parameters, we concentrate on the population genetic implications of these models. Specifically, we obtain estimates of the product of effective population size and relative fitness difference of alleles. The approach is illustrated with two applications to protein-coding DNA. In the first, a codon-based evolutionary model yields a stationary distribution of sequences, which, when the sequences are translated, matches a variable-length Markov model trained on human proteins. In the second, we introduce an insertion-deletion model that describes selectively neutral evolutionary changes to DNA. We then show how to modify the neutral model so that its stationary distribution at the amino acid level can match a profile hidden Markov model, such as the one associated with the Pfam database.
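The idea of forcing a time-reversible model to have a chosen stationary distribution can be illustrated with a small sketch. It is not the authors' implementation. It assumes a Halpern-Bruno-style mutation-selection form, in which the substitution rate is the neutral mutation rate multiplied by S / (1 - exp(-S)), and S is a scaled selection coefficient that plays the role of the 2Ns quantity estimated in the paper. The four-state toy alphabet, the uniform mutation rates, the target probabilities and all function names are hypothetical choices made only for illustration.

```python
import math


def fixation_factor(s):
    """Return S / (1 - exp(-S)), using the neutral limit 1 when S is 0."""
    if abs(s) < 1e-12:
        return 1.0
    return s / (1.0 - math.exp(-s))


def scaled_selection(pi, mu, i, j):
    """Scaled selection coefficient S for a change from state i to state j.

    Assumes mu[i][j] and mu[j][i] are both positive, so the log is defined.
    """
    return math.log((pi[j] * mu[j][i]) / (pi[i] * mu[i][j]))


def substitution_rates(pi, mu):
    """Build a rate matrix whose stationary distribution is the target pi."""
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and mu[i][j] > 0.0:
                s = scaled_selection(pi, mu, i, j)
                q[i][j] = mu[i][j] * fixation_factor(s)
        # The diagonal is still 0 here, so this is minus the off-diagonal sum.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical toy inputs: a target distribution and neutral mutation rates.
pi = [0.1, 0.2, 0.3, 0.4]
mu = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
q = substitution_rates(pi, mu)

# Detailed balance, pi[i] * q[i][j] == pi[j] * q[j][i], confirms that the
# chain is time-reversible with stationary distribution pi.
max_imbalance = max(
    abs(pi[i] * q[i][j] - pi[j] * q[j][i])
    for i in range(4)
    for j in range(4)
)
```

Under these assumptions, a change toward a state that the target distribution favours receives a positive S and an elevated rate, while a change toward a disfavoured state receives a negative S and a reduced rate. In the paper's applications, the target would come from a variable-length Markov model or a profile hidden Markov model rather than from a hand-written list.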

Figures

**Figure 1**
Estimates of 2Ns_j from the variable-length Markov model (VLMM) trained on human protein sequences. (a) Distribution of 2Ns_j estimates among possible non-synonymous changes in the human genome. (b) Distribution of the mean 2Ns_j estimate per gene among human genes.

**Figure 2**
Estimates of 2Ns_j for possible mutations to the human p53 gene. (a) Non-synonymous mutations, (b) single-codon deletions, (c) single-codon insertions, (d) deletions of 10 consecutive codons and (e) insertions of 10 consecutive codons.

Cited by

All-at-once RNA folding with 3D motif prediction framed by evolutionary information.
Karan A, Rivas E. bioRxiv [Preprint]. 2025 Apr 8:2024.12.17.628809. doi: 10.1101/2024.12.17.628809. PMID: 39764046. Free PMC article.
All-at-once RNA folding with 3D motif prediction framed by evolutionary information.
Karan A, Rivas E. Res Sq [Preprint]. 2025 Mar 26:rs.3.rs-5664139. doi: 10.21203/rs.3.rs-5664139/v1. PMID: 40195991. Free PMC article.
Fast optimization of statistical potentials for structurally constrained phylogenetic models.
Bonnard C, Kleinman CL, Rodrigue N, Lartillot N. BMC Evol Biol. 2009 Sep 9;9:227. doi: 10.1186/1471-2148-9-227. PMID: 19740424. Free PMC article.
Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.
Rodrigue N, Philippe H, Lartillot N. Proc Natl Acad Sci U S A. 2010 Mar 9;107(10):4629-34. doi: 10.1073/pnas.0910915107. Epub 2010 Feb 22. PMID: 20176949. Free PMC article.
A formal perturbation equation between genotype and phenotype determines the Evolutionary Action of protein-coding variations on fitness.
Katsonis P, Lichtarge O. Genome Res. 2014 Dec;24(12):2050-8. doi: 10.1101/gr.176214.114. Epub 2014 Sep 12. PMID: 25217195. Free PMC article.
