Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3931-9.
doi: 10.1098/rstb.2008.0167.

Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences


Sang Chul Choi et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved. Instead of focusing on estimating parameters, we concentrate on the population genetic implications of these models. Specifically, we obtain estimates of the product of effective population size and relative fitness difference of alleles. The approach is illustrated with two applications to protein-coding DNA. In the first, a codon-based evolutionary model yields a stationary distribution of sequences which, when the sequences are translated, matches a variable-length Markov model (VLMM) trained on human proteins. In the second, we introduce an insertion-deletion model that describes selectively neutral evolutionary changes to DNA. We then show how to modify the neutral model so that its stationary distribution at the amino acid level can match a profile hidden Markov model, such as the one associated with the Pfam database.
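
The central construction, a time-reversible model whose stationary distribution matches a chosen target, can be illustrated on a small state space. This is a minimal sketch, assuming a fixation-probability weighting in the style of Halpern and Bruno: each neutral rate Q0_ij is multiplied by S/(1 - e^(-S)), where S = log[(pi_j/pi_i)/(pi0_j/pi0_i)] is read as the scaled selection coefficient 2Ns of the change i -> j. The paper's exact construction and its scaling of S may differ. The function names and the four-state example (uniform neutral model, made-up target) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def fixation_weight(s: float) -> float:
    """Relative fixation factor S / (1 - exp(-S)); equals 1 in the neutral limit S -> 0."""
    if abs(s) < 1e-12:
        return 1.0
    return s / -math.expm1(-s)  # -expm1(-s) == 1 - exp(-s), computed accurately


def scaled_selection(pi0: List[float], target: List[float], i: int, j: int) -> float:
    """Assumed scaled selection coefficient (2Ns) for a change from state i to state j."""
    return math.log((target[j] / target[i]) / (pi0[j] / pi0[i]))


def reweight_rates(q0: Matrix, pi0: List[float], target: List[float]) -> Matrix:
    """Multiply each neutral rate by the fixation factor so that `target` is stationary.

    Assumes q0 is time-reversible with stationary distribution pi0.
    """
    n = len(q0)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and q0[i][j] > 0.0:
                q[i][j] = q0[i][j] * fixation_weight(scaled_selection(pi0, target, i, j))
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # rows sum to zero
    return q


def stationary_residual(pi: List[float], q: Matrix) -> float:
    """Largest |(pi Q)_j|; zero (up to rounding) when pi is stationary for Q."""
    n = len(q)
    return max(abs(sum(pi[i] * q[i][j] for i in range(n))) for j in range(n))


if __name__ == "__main__":
    pi0 = [0.25] * 4  # hypothetical neutral model: uniform, as under Jukes-Cantor
    q0 = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
    target = [0.1, 0.2, 0.3, 0.4]  # hypothetical target distribution
    q = reweight_rates(q0, pi0, target)
    print(stationary_residual(target, q))  # close to zero, up to floating-point rounding
```

The weighting works because g(S) = S/(1 - e^(-S)) satisfies g(-S) = g(S)e^(-S), so the reweighted rates obey detailed balance with respect to the target and the target is therefore stationary. For whole DNA or protein sequences the matrix cannot be built explicitly, but only the log-ratio for the two sequences involved in a given change is ever needed.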


Figures

Figure 1
Estimates of 2Ns_j from the VLMM trained on human protein sequences. (a) Distribution of 2Ns_j estimates among possible non-synonymous changes in the human genome. (b) Distribution of the mean 2Ns_j estimate per gene among human genes.
Figure 2
Estimates of 2Ns_j for possible mutations to the human p53 gene. (a) Non-synonymous mutations, (b) single-codon deletions, (c) single-codon insertions, (d) deletions of 10 consecutive codons and (e) insertions of 10 consecutive codons.
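
The figures report 2Ns_j for substitutions and for insertions or deletions. The sketch below shows how such a value could be computed for a single change, assuming that the scaled selection coefficient of a change from sequence x to sequence y is log[pi(y)/pi(x)] - log[pi0(y)/pi0(x)], with pi the target and pi0 the neutral stationary distribution. Both stand-in models are hypothetical and far simpler than the paper's. A first-order Markov chain with explicit start and end states, using made-up probabilities, replaces the VLMM. Independent residues with a geometric length distribution replace the neutral insertion-deletion model. The toy also works on residues directly rather than on codon-level changes.

```python
import math
from typing import Dict

# Hypothetical toy target standing in for the VLMM: a first-order Markov chain.
# "^" is the start state and "$" the end state; every row sums to 1, so the chain
# defines a distribution over non-empty sequences of any length.
TARGET: Dict[str, Dict[str, float]] = {
    "^": {"A": 0.5, "G": 0.3, "L": 0.2},
    "A": {"A": 0.2, "G": 0.3, "L": 0.4, "$": 0.1},
    "G": {"A": 0.4, "G": 0.1, "L": 0.4, "$": 0.1},
    "L": {"A": 0.3, "G": 0.3, "L": 0.3, "$": 0.1},
}

# Hypothetical neutral model: i.i.d. residues with a geometric length distribution.
NEUTRAL_FREQ: Dict[str, float] = {"A": 1 / 3, "G": 1 / 3, "L": 1 / 3}
NEUTRAL_P_END = 0.05  # probability of stopping after each residue


def log_target(seq: str) -> float:
    """Log probability of a non-empty seq under the toy Markov-chain target."""
    states = ["^", *seq, "$"]
    return sum(math.log(TARGET[a][b]) for a, b in zip(states, states[1:]))


def log_neutral(seq: str) -> float:
    """Log probability of a non-empty seq under the toy neutral model."""
    n = len(seq)
    length_term = (n - 1) * math.log(1.0 - NEUTRAL_P_END) + math.log(NEUTRAL_P_END)
    return length_term + sum(math.log(NEUTRAL_FREQ[c]) for c in seq)


def scaled_selection(old: str, new: str) -> float:
    """Assumed 2Ns for old -> new: target log-ratio minus neutral log-ratio."""
    return (log_target(new) - log_target(old)) - (log_neutral(new) - log_neutral(old))


def substitute_residue(seq: str, pos: int, residue: str) -> str:
    return seq[:pos] + residue + seq[pos + 1:]


def delete_residues(seq: str, pos: int, length: int = 1) -> str:
    return seq[:pos] + seq[pos + length:]


def insert_residues(seq: str, pos: int, residues: str) -> str:
    return seq[:pos] + residues + seq[pos:]


def mean_substitution_selection(seq: str) -> float:
    """Mean assumed 2Ns over every single-residue substitution of seq."""
    values = [
        scaled_selection(seq, substitute_residue(seq, pos, r))
        for pos in range(len(seq))
        for r in NEUTRAL_FREQ
        if r != seq[pos]
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    gene = "AGL"
    print(scaled_selection(gene, substitute_residue(gene, 1, "A")))  # about log(2/3)
    print(scaled_selection(gene, delete_residues(gene, 1)))
    print(mean_substitution_selection(gene))
```

Substitutions, single-residue indels and longer indels (for example `delete_residues(seq, pos, 10)`) all go through the same `scaled_selection` call. This only works because both models assign probabilities to sequences of every length. Averaging over all substitutions in a sequence mirrors the per-gene summary in Figure 1b, though here the average is taken over amino-acid changes rather than over codon-level non-synonymous mutations.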


