Protein Sci. 2004 Oct;13(10):2651-64.
doi: 10.1110/ps.04802904. Epub 2004 Aug 31.

Simulating evolution by gene duplication of protein features that require multiple amino acid residues

Michael J Behe et al. Protein Sci. 2004 Oct.

Abstract

Gene duplication is thought to be a major source of evolutionary innovation because it allows one copy of a gene to mutate and explore genetic space while the other copy continues to fulfill the original function. Models of the process often implicitly assume that a single mutation to the duplicated gene can confer a new selectable property. Yet some protein features, such as disulfide bonds or ligand binding sites, require the participation of two or more amino acid residues, which could require several mutations. Here we model the evolution of such protein features by what we consider to be the conceptually simplest route: point mutation in duplicated genes. We show that for very large population sizes N, where at steady state in the absence of selection the population would be expected to contain one or more duplicated alleles coding for the feature, the time to fixation in the population hovers near the inverse of the point mutation rate, and varies sluggishly with the λth root of 1/N, where λ is the number of nucleotide positions that must be mutated to produce the feature. At smaller population sizes, the time to fixation varies linearly with 1/N and exceeds the inverse of the point mutation rate. We conclude that, in general, to be fixed in 10^8 generations, the production of novel protein features that require the participation of two or more amino acid residues simply by multiple point mutations in duplicated genes would entail population sizes of no less than 10^9.
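
The two scaling regimes stated in the abstract can be illustrated with a small sketch. It encodes only the stated proportionalities (time varying with the lambda-th root of 1/N for very large N, and linearly with 1/N for smaller N); it does not reproduce the paper's equations or any of its constants. The function name `relative_time` and its parameters are hypothetical, introduced here for illustration.

```python
def relative_time(n_ratio, lam, large_population=True):
    """Factor by which time to fixation changes when N is multiplied by n_ratio.

    Assumption: only the proportionalities quoted in the abstract are used.
    Large-N regime: time varies with (1/N) ** (1/lam).
    Smaller-N regime: time varies with 1/N.
    """
    exponent = 1.0 / lam if large_population else 1.0
    return (1.0 / n_ratio) ** exponent


if __name__ == "__main__":
    # Compare a 100-fold increase in N for several values of lam.
    for lam in (2, 3, 6):
        big = relative_time(100, lam, large_population=True)
        small = relative_time(100, lam, large_population=False)
        print(f"lam={lam}: large-N factor={big:.3f}, smaller-N factor={small:.3f}")
```

Under these assumptions, with lam = 2 a 100-fold larger population shortens the time by a factor of about 10 in the large-N regime, against a factor of 100 in the smaller-N regime, which is the sense in which the dependence is sluggish.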

Figures

Figure 1.
A freshly duplicated gene must accrue several compatible mutations without suffering a null mutation in order to code for the multiresidue (MR) feature. Each box in an array represents a nucleotide position in the duplicated gene. The three boxes outlined in blue are the positions that must be changed in order to produce the new MR feature. (Although they are contiguous in the drawing, they do not necessarily represent contiguous positions in the gene.) A “+” labels a compatible mutation. A red “X” labels a null mutation. The green-shaded box represents the gene coding for the MR feature, where the several necessary changes have all been acquired. The forward mutation rate is v times the number of incompatible loci λ remaining to be changed. The null mutation rate is ρv.
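
The competition between compatible and null mutations described in this caption can be sketched as a simple race in a single gene lineage. This is an illustration under stated assumptions, not the paper's simulation: it assumes at most one mutation event at a time, a forward rate of v times the number of remaining sites, and a null rate of ρv, so that v cancels and the next event is a compatible mutation with probability k / (k + ρ) when k sites remain. It ignores population size, selection, and reproduction. The function names are hypothetical.

```python
import random


def success_probability(lam, rho):
    """Probability that all lam sites mutate compatibly before any null mutation.

    Assumption: with k sites remaining, forward rate is k*v and null rate is
    rho*v, so the next event is a forward step with probability k / (k + rho).
    """
    p = 1.0
    for k in range(1, lam + 1):
        p *= k / (k + rho)
    return p


def simulate_lineage(lam, rho, rng):
    """Follow one duplicated gene until it gains the feature or is nulled."""
    remaining = lam
    while remaining > 0:
        if rng.random() < remaining / (remaining + rho):
            remaining -= 1  # compatible mutation at one of the required sites
        else:
            return False  # null mutation ends this lineage
    return True


def estimate_success(lam, rho, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    wins = sum(simulate_lineage(lam, rho, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for lam, rho in [(2, 1), (3, 1), (3, 10)]:
        exact = success_probability(lam, rho)
        approx = estimate_success(lam, rho)
        print(f"lam={lam}, rho={rho}: exact={exact:.4f}, simulated={approx:.4f}")
```

In this simplified picture the chance that a single lineage completes the feature falls quickly as either λ or ρ grows, which is why the number of required sites matters so much in the figures that follow.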
Figure 2.
Fraction (1-φ) of a nucleotide position in a compatible state versus time (generations) normalized for the mutation rate (vt). In all cases, the curves are determined from equation 2. (Top) N = 10,000, v = 0.001, deterministic reproduction. Circles: ρ = 1, λ = 2; inverted triangles: ρ = 2, λ = 3; squares: ρ = 4, λ = 5; diamonds: ρ = 10, λ = 10; triangles: ρ = 100, λ = 10. Each point is the average of 100 repetitions. (Bottom) N = 100, v = 0.001, ρ = 1, λ = 6. Circles are for deterministic reproduction; each point is the average of 100 repetitions. Triangles are for stochastic reproduction; each point is the average of 1024 repetitions.
Figure 3.
Normalized time (generations) to first appearance (vTf) versus number of loci λ required to be changed to yield the multiresidue (MR) feature. In all cases, the curves are determined from equation 3. v = 0.01. Reproduction was deterministic. Filled circles, N = 1; open circles, N = 10; filled inverted triangles, N = 100; open inverted triangles, N = 1000; filled squares, N = 10,000; open squares, N = 100,000. (Upper left) ρ = 1; (upper right) ρ = 2; (lower left) ρ = 4; (lower right) ρ = 10. Each point is the average of 100 repetitions.
Figure 4.
Normalized time (generations) to fixation (vTfx) versus the selection coefficient s. In all cases, the curves are determined from equation 4. Reproduction was stochastic. N = 1000; v = 0.01–0.0001. Each point is the average of 100 repetitions. (Top) ρ = 1. Filled circles, λ = 1; open circles, λ = 2; filled inverted triangles, λ = 3; open inverted triangles, λ = 4; filled squares, λ = 5; open squares, λ = 6; filled diamonds, λ = 7; open diamonds, λ = 8. (Bottom) ρ = 10. Filled circles, λ = 1; open circles, λ = 2; filled inverted triangles, λ = 3; open inverted triangles, λ = 4.
Figure 5.
Effect of pre-equilibration of the population on normalized time (generations) to first appearance (vTf) versus number of loci λ required to be changed to yield the multiresidue (MR) feature. N = 1000; v = 0.001; ρ = 1. Each point is the average of 100 repetitions. The curve is determined from equation 3. Reproduction was deterministic. The simulation was pre-equilibrated (that is, the population was subject to mutation and reproduction without checking for the appearance of the MR feature, regarding it as neutral) for filled circles, 0 generations; open circles, 0.1 / v generations; filled inverted triangles, 0.3 / v generations; open inverted triangles, 1 / v generations; filled squares, 3 / v generations.
Figure 6.
Time to fixation Tfx versus number of loci λ required to be changed to yield the multiresidue (MR) feature. v = 10^-8; ρ = 1000; s = 0.01. Values for population sizes N are given across the top axis. In all cases the curves are determined from equation 4. A line is drawn across the figure at Tfx = 1 / v, which is 10^8 generations. Above the line, values for Tfx are essentially unaffected by pre-equilibration of the population in the absence of selection.

