PLoS Genet. 2011 Dec;7(12):e1002395.
doi: 10.1371/journal.pgen.1002395. Epub 2011 Dec 1.

A population genetics-phylogenetics approach to inferring natural selection in coding sequences

Daniel J Wilson et al. PLoS Genet. 2011 Dec.

Abstract

Through an analysis of polymorphism within and divergence between species, we can hope to learn about the distribution of selective effects of mutations in the genome, changes in the fitness landscape that occur over time, and the location of sites involved in key adaptations that distinguish modern-day species. We introduce a novel method for the analysis of variation in selection pressures within and between species, spatially along the genome and temporally between lineages. We model codon evolution explicitly using a joint population genetics-phylogenetics approach that we developed for the construction of multiallelic models with mutation, selection, and drift. Our approach has the advantage of performing direct inference on coding sequences, inferring ancestral states probabilistically, utilizing allele frequency information, and generalizing to multiple species. We use a Bayesian sliding window model for intragenic variation in selection coefficients that efficiently combines information across sites and captures spatial clustering within the genome. To demonstrate the utility of the method, we infer selective pressures acting in Drosophila melanogaster and D. simulans from polymorphism and divergence data for 100 X-linked coding regions.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The distribution of fitness effects.
The distribution of fitness effects of (A) new non-synonymous mutations and (B) amino acid substitutions in D. melanogaster (left bars) and D. simulans (right bars). The height of the bar represents the estimated frequency of each selection coefficient aggregated across codons, with the 95% credible interval indicated by a vertical line. In (A) and (B) the bars are colored according to their selection coefficient, with colors closer to red representing increasingly deleterious variants, white representing neutral variants, and colors closer to blue representing increasingly beneficial variants.
Figure 2. The frequency of amino acid substitutions attributable to positive selection in the D. melanogaster lineage (left bars) and the D. simulans lineage (right bars).
A+: beneficial substitutions (γ>0) attributable to selection. D+: beneficial substitutions (γ>0) attributable to drift. D0: neutral substitutions (γ = 0) attributable to drift. D–: deleterious substitutions (γ<0) attributable to drift.
Figure 3. The posterior probability of positive selection across genes and codons.
(A) The number of non-synonymous substitutions (DN) and polymorphisms (PN) and synonymous substitutions (DS) and polymorphisms (PS) per gene in the D. melanogaster and D. simulans lineages. (B) The rank per gene of various measures of selection. formula image: mean selection coefficient at viable sites. formula image: proportion of sites viable. formula image: odds ratio of the McDonald-Kreitman table. (C) formula image, the posterior probability of positive selection per codon (points) and per gene (black line). Points are colored randomly to aid visualization. In (A), (B) and (C), genes are ordered horizontally by the rank of formula image per gene.
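The odds ratio of the McDonald-Kreitman table mentioned in panel (B) can be illustrated with a minimal sketch. It assumes the orientation (DN/DS) / (PN/PS), which may differ from the convention the authors use; the function name `mk_odds_ratio` and the example counts are hypothetical and are not taken from the paper's data.

```python
def mk_odds_ratio(dn: int, ds: int, pn: int, ps: int) -> float:
    """Odds ratio of a 2x2 McDonald-Kreitman table.

    dn, ds: non-synonymous and synonymous substitutions (divergence).
    pn, ps: non-synonymous and synonymous polymorphisms.

    Assumed orientation: (DN / DS) / (PN / PS), so values above 1
    indicate an excess of non-synonymous divergence relative to
    polymorphism.
    """
    if min(dn, ds, pn, ps) < 0:
        raise ValueError("counts must be non-negative")
    if ds == 0 or pn == 0:
        raise ValueError("odds ratio is undefined when DS or PN is zero")
    return (dn * ps) / (ds * pn)


# Hypothetical counts for a single gene, for illustration only.
example_ratio = mk_odds_ratio(dn=7, ds=17, pn=2, ps=42)
print(f"McDonald-Kreitman odds ratio: {example_ratio:.2f}")
```

Under this orientation, a ratio well above 1 is the pattern usually read as a signal of adaptive substitution, whereas a ratio below 1 suggests an excess of non-synonymous polymorphism.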
Figure 4. Evidence for positive selection in three genes.
At each codon, the posterior probability of positive selection is plotted for D. melanogaster (dark grey line) and D. simulans (light grey line). To illustrate the signal in the data, the figure is superimposed with the sample frequency of polymorphisms in the two species (vertical bars) and substitutions along the two lineages (filled circles, above). The colors indicate synonymous variants in D. melanogaster (dark green) and D. simulans (light green) and non-synonymous variants in D. melanogaster (red) and D. simulans (orange).
Figure 5. Spatial correlation in selection coefficients.
Spatial correlation in selection coefficients in (A) D. melanogaster and (B) D. simulans. The correlation in the posterior probability of each selection coefficient is shown, calculated for all pairs of sites separated by the specified distance (circles). A smoothed estimate of the autocorrelation function has been superimposed (lines). The values of the selection coefficients are indicated by the coloring, which is the same as for Figure 1.
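The per-distance correlation described in the Figure 5 caption can be sketched as follows. This is a minimal illustration assuming a plain Pearson correlation over all pairs of sites a fixed number of codons apart within one gene; the authors' exact estimator and smoothing are not specified here, and the function name `lag_correlation` and the input values are hypothetical.

```python
from math import sqrt


def lag_correlation(values: list[float], distance: int) -> float:
    """Pearson correlation between values at sites i and i + distance.

    values: per-codon posterior probabilities for one selection
            coefficient (hypothetical input).
    distance: separation between paired sites, in codons.
    """
    if distance < 1 or distance >= len(values):
        raise ValueError("distance must be between 1 and len(values) - 1")
    x = values[:-distance]  # sites i
    y = values[distance:]   # sites i + distance
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical posterior probabilities along a short stretch of codons.
posteriors = [0.10, 0.12, 0.30, 0.35, 0.33, 0.08, 0.05, 0.07]
for d in (1, 2, 3):
    print(d, round(lag_correlation(posteriors, d), 3))
```

Plotting the returned value against `distance` would give points analogous to the circles in the figure; the smoothed curve would need a separate smoothing step.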
