PLoS Comput Biol. 2010 May 6;6(5):e1000767.
doi: 10.1371/journal.pcbi.1000767.

Mutation bias favors protein folding stability in the evolution of small populations

Raul Mendez et al. PLoS Comput Biol. 2010.

Abstract

Mutation bias in prokaryotes varies from extreme adenine and thymine (AT) in obligatory endosymbiotic or parasitic bacteria to extreme guanine and cytosine (GC), for instance in actinobacteria. GC mutation bias deeply influences the folding stability of proteins, making proteins on average less hydrophobic and therefore less stable with respect to unfolding but also less susceptible to misfolding and aggregation. We study a model in which proteins evolve subject to selection for folding stability under given mutation bias, population size, and neutrality. We find a non-neutral regime where, for any given population size, there is an optimal mutation bias that maximizes fitness. Interestingly, this optimal GC usage is small for small populations, large for intermediate populations and around 50% for large populations. This result is robust with respect to the definition of the fitness function and to the protein structures studied. Our model suggests that small populations evolving with small GC usage eventually accumulate a significant selective advantage over populations evolving without this bias. This provides a possible explanation for the observation that most species adopting obligatory intracellular lifestyles, with a consequent reduction of effective population size, shifted their mutation spectrum towards AT. The model also predicts that large GC usage is optimal for intermediate population size. To test these predictions, we estimated the effective population sizes of bacterial species using the optimal codon usage coefficients computed by dos Reis et al. and the synonymous to non-synonymous substitution ratio computed by Daubin and Moran. We found that the population sizes estimated in these ways are significantly smaller for species with small and large GC usage than for species with no bias, which supports our prediction.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Fitness versus stabilities for [formula] (top) and [formula] (bottom).
Figure 2. Mean unfolding stability versus misfolding stability for neutrality exponent [formula] (non-neutral regime).
The sets of points joined by solid lines correspond to constant GC usage, between [formula] (largest [formula]) and [formula] (largest [formula]). [formula] grows and [formula] decreases with [formula]. The sets of points joined by dashed lines correspond to constant population size, from [formula] (smallest stability) to [formula] (largest stability). Both stability variables increase with population size. Data points are superimposed on a heat map of the fitness function, showing that fitness increases with population size. However, lines of constant population size do not correspond to constant fitness: there are small variations, from which the optimal GC usage is derived. The solid white line shows [formula], at which the selective pressures on [formula] and [formula] balance. One can see that, at large [formula], [formula] is smaller than [formula] for all [formula], so that the selective pressure is stronger on the former.
Figure 3. Fitness (in different units for each curve) versus GC usage for neutrality exponent [formula] and three different population sizes.
The curves have been shifted in the vertical direction so that their maxima coincide. We obtain the optimal GC usage by cubic fits, which are plotted as dotted, dashed, and solid lines.
Figure 4. Optimal GC usage at which the fitness is maximum versus population size.
The upper plot shows data with neutrality exponent [formula] and the bottom plot shows [formula] and 20. Interpolating lines are drawn as a guide to the eye.
Figure 5. Optimal mutation bias at which the fitness is maximum versus population size for different proteins and neutrality exponent [formula].
Upper plot: Results for individual proteins. Bottom plot: Fitness is obtained for the combination of 5 proteins either as the minimum or as the product over all proteins. Interpolating lines are drawn as a guide to the eye.
Figure 6. Optimal GC usage versus population size for neutrality exponent [formula] and different values of the neutral thresholds [formula] and [formula], where the reference energy gap [formula] and unfolding free energy [formula] are those measured for the protein in the PDB.
We simulated all nine combinations of the values [formula] for either [formula] or [formula]. We only show four combinations since all other curves are contained between them.
Figure 7. Comparison between the optimal GC usages computed with GKS energy parameters (dotted line and dashed line) and the BVK parameters adopted in the present study (solid line).
The conformation entropy is [formula] for BVK parameters and [formula] for GKS. The coefficient of the neutral threshold is [formula] for the dotted curve and [formula] for the dashed curve. Other parameters are fixed at [formula], [formula].
Figure 8. Optimal GC usage versus neutrality exponent for three population sizes.
Figure 9. Estimates of quantities correlating with effective population size obtained from genomic data.
Upper plot: Optimal codon bias estimated by dos Reis et al. versus GC content at synonymous third codon position (GC3), shown as mean and standard error of the mean for three bins of GC3 (smaller than 30%, 40 to 60%, larger than 70%). Error bars in the plot represent the standard error of the mean, and show that the mean values are significantly different. However, the data before averaging are rather broadly distributed, with standard deviations equal to [formula] ([formula]), [formula] ([formula]) and [formula] ([formula]). Bottom plot: Values of [formula] computed by Daubin and Moran are averaged for pairs of bacteria with low, intermediate and high GC content. Both plots support the notion that species with GC content [formula] are characterized by larger effective population size.
Figure 10. Negative correlation between misfolding and unfolding stability.
Upper plot: Simulation results for average misfolding stability versus unfolding stability for various mutation biases, three population sizes and neutrality exponent [formula] (non-neutral regime) and [formula] (neutral regime). Bottom plot: Estimated misfolding versus unfolding stability for families of homologous proteins in prokaryotic genomes (data from Ref. [12]). We distinguish genomes according to GC content at third codon position. The solid line represents a linear fit of misfolding stability for genomes with moderate or no mutation bias ([formula]).
Figure 11. Relationship between GC usage and protein folding stability in orthologous proteins in different prokaryotic genomes (data taken from Ref. [12]).
Histogram of the difference between the actual misfolding stability and the misfolding stability expected from the unfolding stability, using the relationship derived from species with moderate bias (continuous line in the previous plot). Notice that species with small and large GC usage have smaller than expected misfolding stability.
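
The Figure 3 caption mentions that the location of the fitness maximum is obtained by cubic fits of fitness versus GC usage. The following is a minimal sketch of that idea in plain Python, not the authors' procedure, which this record does not describe. The function names `fit_cubic` and `cubic_argmax`, the search interval [0, 1] and the sample data are all hypothetical.

```python
import math


def fit_cubic(xs, ys):
    """Least-squares cubic fit; returns [a0, a1, a2, a3] for a0 + a1*x + a2*x^2 + a3*x^3."""
    n = 4
    # Normal equations A a = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def cubic_argmax(coeffs, lo, hi):
    """Return the x in [lo, hi] that maximizes the fitted cubic."""
    a0, a1, a2, a3 = coeffs

    def f(x):
        return a0 + a1 * x + a2 * x * x + a3 * x ** 3

    # Candidates: interval end points plus real roots of the derivative
    # a1 + 2*a2*x + 3*a3*x^2 that fall inside the interval.
    candidates = [lo, hi]
    qa, qb, qc = 3 * a3, 2 * a2, a1
    if abs(qa) > 1e-12:
        disc = qb * qb - 4 * qa * qc
        if disc >= 0:
            root = math.sqrt(disc)
            candidates += [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    elif abs(qb) > 1e-12:
        candidates.append(-qc / qb)
    candidates = [x for x in candidates if lo <= x <= hi]
    return max(candidates, key=f)


# Hypothetical fitness values sampled from a known cubic, for illustration only.
gc_usage = [0.1 * i for i in range(11)]
fitness = [-x ** 3 - 0.9 * x ** 2 + 1.2 * x for x in gc_usage]
optimal_gc = cubic_argmax(fit_cubic(gc_usage, fitness), 0.0, 1.0)
print(f"estimated optimal GC usage: {optimal_gc:.3f}")
```

Checking the interval end points as well as the stationary points guards against a fit whose maximum lies on the boundary of the sampled GC range. A vertical shift of a curve, as applied in Figure 3, changes only the constant term and leaves the estimated optimum unchanged.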

References

    1. Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626.
    2. Kimura M. The neutral theory of molecular evolution. Cambridge Univ. Press; 1983.
    3. Taverna DM, Goldstein RA. Why are proteins marginally stable? Proteins. 2002;46:105–109.
    4. Muller HJ. Some Genetic Aspects of Sex. American Naturalist. 1932;66:118–138.
    5. Wright SG. The distribution of gene frequencies in populations of polyploids. Proc Natl Acad Sci USA. 1938;24:372–377.
