[Preprint]. 2024 Oct 15:2024.10.11.617952.
doi: 10.1101/2024.10.11.617952.

A fitness distribution law for amino-acid replacements


Mengyi Sun et al. bioRxiv.

Abstract

The effect of replacing the amino acid at a given site in a protein is difficult to predict. Yet, evolutionary comparisons have revealed highly regular patterns of interchangeability between pairs of amino acids, and such patterns have proved enormously useful in a range of applications in bioinformatics, evolutionary inference, and protein design. Here we reconcile these apparently contradictory observations using fitness data from over 350,000 experimental amino acid replacements. Almost one-quarter of the 20 × 19 = 380 types of replacements have broad distributions of fitness effects (DFEs) that closely resemble the background DFE for random changes, indicating an overwhelming influence of protein context in determining mutational effects. However, we also observe that the 380 pair-specific DFEs closely follow a maximum entropy distribution, specifically a truncated exponential distribution. The shape of this distribution is determined entirely by its mean, which is equivalent to the chance that a replacement of the given type is fitter than a random replacement. In this type of distribution, modest deviations in the mean correspond to much larger changes in the probability of falling in the far right tail, so that modest differences in mean exchangeability may result in much larger differences in the chance of a highly fit mutation. Indeed, we show that under the assumption that purifying selection filters out the vast majority of mutations, the maximum entropy distributions of fitness effects inferred from deep mutational scanning experiments predict the characteristic patterns of amino acid change observed in molecular evolution. These maximum entropy distributions of mutational effects not only provide a tuneable model for molecular evolution, but also have implications for mutational effect prediction and protein engineering.
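For reference, the one-parameter family described in the abstract can be written out explicitly. This is a sketch under stated assumptions: the support is taken to be the fitness-quantile scale [0, 1], and the rate parameter \lambda and threshold q are notation introduced here for illustration, not necessarily the authors' parametrization.

```latex
% Truncated exponential density on the quantile scale x in [0, 1]
f(x;\lambda) = \frac{\lambda\, e^{\lambda x}}{e^{\lambda} - 1},
\qquad 0 \le x \le 1, \quad \lambda \ne 0
% (the limit \lambda \to 0 is the uniform density f(x) = 1)

% Mean, which fixes \lambda and therefore the whole shape
\mu(\lambda) = \int_0^1 x\, f(x;\lambda)\, dx
             = \frac{1}{1 - e^{-\lambda}} - \frac{1}{\lambda}

% Probability of exceeding a quantile threshold q
P(X > q) = \frac{e^{\lambda} - e^{\lambda q}}{e^{\lambda} - 1}
```

Because \mu(\lambda) is strictly increasing, with \mu = 1/2 in the uniform limit, each mean corresponds to exactly one \lambda. This is the sense in which the shape is determined entirely by the mean.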

Keywords: amino acids; deep mutational scanning; epistasis; molecular evolution; protein design.


Conflict of interest statement

Author Declaration: The authors declare that they have no conflicts of interest.

Figures

Fig. 1.
Illustration of type-specific DFEs as quantile distributions. Each histogram shows the probability that a replacement of a given type will lie at a given fitness quantile relative to the effects of other amino acid replacements in the same protein. A distribution that is concentrated at low values indicates that the corresponding type of replacement tends to be deleterious, whereas types of substitution that tend to be benign will have distributions concentrated at high values. Mutational types that have the same DFE as random mutations will appear as uniform distributions.
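The quantile representation in this caption can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' pipeline: the function names `fitness_quantiles` and `group_by_type`, the mid-rank treatment of ties, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def fitness_quantiles(scores):
    """Map each replacement's fitness score to a quantile within one protein.

    `scores` is a dict {(wild_type, position, mutant): fitness}.
    Assumption: a mid-rank quantile is used, i.e. the fraction of scores
    strictly below, plus half the fraction tied (the score itself included).
    With this convention the quantiles of a protein always average 0.5.
    """
    values = list(scores.values())
    n = len(values)
    quantiles = {}
    for key, s in scores.items():
        below = sum(v < s for v in values)
        tied = sum(v == s for v in values)
        quantiles[key] = (below + 0.5 * tied) / n
    return quantiles


def group_by_type(quantiles):
    """Pool quantiles by (wild-type, mutant) pair, ignoring position."""
    groups = defaultdict(list)
    for (wt, _pos, mut), q in quantiles.items():
        groups[(wt, mut)].append(q)
    return dict(groups)


# Toy data for a single hypothetical protein (values are made up).
toy_scores = {
    ("A", 1, "V"): 0.9,
    ("A", 1, "P"): 0.1,
    ("L", 2, "I"): 0.8,
    ("L", 2, "P"): 0.2,
}
toy_quantiles = fitness_quantiles(toy_scores)
toy_groups = group_by_type(toy_quantiles)
```

A histogram of the pooled values in one entry of `toy_groups`, accumulated over many proteins, would correspond to one panel of the kind the caption describes. A replacement type whose effects are indistinguishable from random replacements would give an approximately flat histogram.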
Fig. 2.
Distributions of fitness effects for 358,080 amino-acid-altering mutations, categorized by the wild-type (row) and mutant (column) amino acids. Histograms show fitness quantiles observed for each type of amino-acid replacement, along with the corresponding truncated exponential fit (red lines). For each distribution, μ is the mean (equal to the probability that a replacement of this type is fitter than a random mutation) and n is the number of observations. Distributions that deviate significantly from a truncated exponential distribution are marked with a star at the right corner, whereas black triangles in the upper left corner indicate the pairs for which forward-reverse asymmetry is significant (Bonferroni-corrected p < 0.05, two-sample Kolmogorov–Smirnov test).
Fig. 3.
Selective filtering provides a quantitative model for patterns of evolution. For the subset of “singlet” replacements accessible via a single-nucleotide mutation, evolutionary exchangeability is represented by Tang’s U, which is derived from alignments of evolved sequences by a method designed to exclude mutational effects and to focus only on selection (32). (A) Given the truncated exponential model for DFEs, modest differences in the mean correspond to much larger differences in the top of the distribution, e.g., a difference of 0.4 vs. 0.6 in the mean corresponds to a 3-fold difference in the chance of exceeding a threshold q = 0.9. (B) Applying threshold selection to the DFEs for the singlet replacements shows that the dynamic range increases as q → 1, until it roughly matches the dynamic range of U (dashed line). (C) The predicted U matrix as q → 1 shows a good correlation with the observed Tang’s U (Pearson’s r = 0.86, p = 5 × 10⁻²³), well described by the diagonal line y = x.
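The arithmetic behind panel (A) can be checked with a minimal Python sketch. It assumes a truncated exponential on [0, 1] with density proportional to exp(lam * x). The rate `lam`, the function names, and the use of bisection to recover the rate from the mean are choices made here for illustration, not the authors' code.

```python
import math


def mean_from_rate(lam):
    """Mean of the truncated exponential on [0, 1] with rate lam."""
    if abs(lam) < 1e-4:
        # Series expansion near the uniform limit avoids cancellation.
        return 0.5 + lam / 12.0
    return -1.0 / math.expm1(-lam) - 1.0 / lam


def rate_from_mean(mu, lo=-50.0, hi=50.0, tol=1e-12):
    """Invert mean_from_rate by bisection (the mean increases with lam)."""
    if not mean_from_rate(lo) < mu < mean_from_rate(hi):
        raise ValueError("mean outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_from_rate(mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_probability(lam, q):
    """P(X > q) for the truncated exponential with rate lam."""
    if lam == 0.0:
        return 1.0 - q  # uniform case
    return math.exp(lam * q) * math.expm1(lam * (1.0 - q)) / math.expm1(lam)


# Compare replacement types with mean quantile 0.6 and 0.4 at q = 0.9.
p_high = tail_probability(rate_from_mean(0.6), 0.9)
p_low = tail_probability(rate_from_mean(0.4), 0.9)
ratio = p_high / p_low  # roughly 3, matching the 3-fold figure in panel (A)
```

Under these assumptions the two means map to rates of equal magnitude and opposite sign, so the ratio of tail probabilities grows as the threshold q moves toward 1. This is consistent with the widening dynamic range described in panel (B).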

