Proc Natl Acad Sci U S A. 2022 Mar 15;119(11):e2113883119.
doi: 10.1073/pnas.2113883119. Epub 2022 Mar 11.

Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution

Iain G Johnston et al. Proc Natl Acad Sci U S A.

Abstract

Significance: Why does evolution favor symmetric structures when they only represent a minute subset of all possible forms? Just as monkeys randomly typing into a computer language will preferentially produce outputs that can be generated by shorter algorithms, so the coding theorem from algorithmic information theory predicts that random mutations, when decoded by the process of development, preferentially produce phenotypes with shorter algorithmic descriptions. Since symmetric structures need less information to encode, they are much more likely to appear as potential variation. Combined with an arrival-of-the-frequent mechanism, this algorithmic bias predicts a much higher prevalence of low-complexity (high-symmetry) phenotypes than follows from natural selection alone and also explains patterns observed in protein complexes, RNA secondary structures, and a gene regulatory network.

Keywords: algorithmic information theory; development; evolution.
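
The abstract's central claim, that random genotypes decoded by a developmental process preferentially yield phenotypes with short descriptions, can be illustrated with a toy genotype-phenotype map. The sketch below is not the authors' model: the decoder `develop`, the 12-bit genotype length, and the use of the shortest repeat period as a stand-in for descriptional complexity are all hypothetical choices made only for illustration.

```python
from collections import Counter
from itertools import product

L = 12  # genotype length in bits (hypothetical toy value)
N = 12  # phenotype length in bits (hypothetical toy value)

def develop(genotype: str) -> str:
    """Toy 'development' (an assumption, not the paper's map):
    the first 2 bits choose a motif length m in 1..4, the next m bits
    are the motif, and the motif is tiled to length N. Any remaining
    genotype bits are ignored, so they act as neutral sites."""
    m = int(genotype[:2], 2) + 1
    motif = genotype[2:2 + m]
    return (motif * N)[:N]

def complexity(phenotype: str) -> int:
    """Crude complexity proxy: the shortest period of the phenotype."""
    n = len(phenotype)
    for d in range(1, n):
        if all(phenotype[i] == phenotype[i + d] for i in range(n - d)):
            return d
    return n

# Enumerate every genotype exactly once (uniform sampling without noise).
counts = Counter(develop("".join(bits)) for bits in product("01", repeat=L))

# Group phenotype frequencies by their complexity proxy.
by_complexity = {}
for phenotype, count in counts.items():
    by_complexity.setdefault(complexity(phenotype), []).append(count)

for k in sorted(by_complexity):
    freqs = by_complexity[k]
    print(f"period {k}: {len(freqs)} phenotypes, "
          f"max probability {max(freqs) / 2**L:.4f}")
```

In this toy map the all-zeros phenotype is produced by 960 of the 4,096 genotypes, whereas each phenotype with a shortest period of 4 is produced by only 64. Shorter motifs leave more neutral bits, so simpler outputs occupy more of genotype space, which is the same qualitative bias the abstract attributes to the coding theorem.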

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
(A) Protein complexes self-assemble from individual units. (B) Frequency of 6-mer protein complex topologies found in the PDB versus the number of interface types, a measure of complexity K̃(p). Symmetry groups are in standard Schoenflies notation: C6, D3, C3, C2, and C1. There is a strong preference for low-complexity/high-symmetry structures. (C) Histograms of scaled frequencies of symmetries for 6-mer topologies found in the PDB (dark red) versus the frequencies by symmetry of the morphospace of all possible 6-mers illustrate that symmetric structures are hugely overrepresented in the PDB. (D) Polyomino complexes self-assemble from individual units (here a binds to A) just as the proteins do. (E) Scaled frequency of polyominoes that fix in evolutionary simulations with a fitness maximum at 16-mers, versus the number of interface types (a measure of complexity K̃(p)), exhibits a strong bias toward high-symmetry structures, similar to protein complexes. (F) Histograms of the frequency of symmetry groups for all 16-mers (light) and for 16-mers appearing in the evolutionary runs (dark) quantify how strongly biased variation drives a pronounced preference for high-symmetry structures.
Fig. 2.
Frequency with which a particular protein quaternary structure topology p (black circles) appears in the PDB versus complexity K̃(p) = number of interface types closely resembles the frequency/P(p) vs. K̃(p) distribution of all possible polyomino structures, obtained by randomly sampling 10^8 genotypes for the S_{16,64} space (green circles). Simpler (more compressible) phenotypes are much more likely to occur. An illustrative AIT upper bound from Eq. 1 is shown with a = 0.75, b = 0 (dashed red line). (Inset) The frequency with which particular 16-mers are found to fix in evolutionary runs from Fig. 1E is predicted by the probability P(p) (or equivalently the frequency) with which they arise on random sampling of genotypes; the solid line denotes x = y.
Fig. 3.
Scaled frequency (occurrence probability) versus complexity K̃(p) for (A) L = 30 RNA full SS and (B) L = 100 SS coarse-grained to level 5 (Materials and Methods). Probabilities for structures taken from random sampling of sequences (light red) compare well to the frequency found in the fRNA database (28) (green dots) for 40,554 functional L = 30 RNA sequences with 17,603 unique dot-bracket SS and for 932 natural L = 100 RNA sequences mapping to 16 unique coarse-grained level 5 structures. The dashed lines show a possible upper bound from Eq. 1. Examples of high-probability/low-complexity and low-probability/high-complexity SS are also shown. We directly compare the frequency of RNA structures in the fRNAdb database to the frequency of structures upon uniform random sampling of genotypes for (C) L = 30 SS and (D) L = 100 coarse-grained structures. The lines are y = x. Correlation coefficients are 0.71 and 0.92, for L = 30 and L = 100, respectively, with P < 10^-6 for both. Sampling errors are larger at low frequencies.
Fig. 4.
Scaled frequency vs. complexity K̃(p) for the budding yeast ODE cell cycle model (30). Phenotypes are grouped by complexity of the time output of the key CLB2/SIC1 complex concentration. Higher frequency means a larger fraction of parameters generate this time curve. The red circle denotes the wild-type phenotype, which is one of the simplest and most likely phenotypes to appear. The dashed line shows a possible upper bound from Eq. 1. There is a clear bias toward low-complexity outputs.
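
The dashed lines in Figs. 2 to 4 refer to an upper bound from Eq. 1, which is not reproduced on this page. The sketch below assumes the bound takes the usual simplicity-bias form P(p) ≤ 2^(-a·K̃(p) - b); that functional form is an assumption here, and the function name `ait_upper_bound` is hypothetical. Only the values a = 0.75 and b = 0 come from the Fig. 2 caption.

```python
def ait_upper_bound(k_tilde: float, a: float = 0.75, b: float = 0.0) -> float:
    """Assumed form of the Eq. 1 bound: P(p) <= 2 ** (-a * K(p) - b).

    k_tilde is the approximate complexity of a phenotype (for example,
    the number of interface types); a and b are the fitted constants,
    defaulting to the illustrative values quoted in the Fig. 2 caption.
    """
    return 2.0 ** (-a * k_tilde - b)

# Tabulate the bound for a few complexity values.
for k in range(0, 9, 2):
    print(f"K = {k}: P(p) <= {ait_upper_bound(k):.4f}")
```

Under this assumed form, each additional unit of complexity lowers the ceiling on a phenotype's probability by a constant factor of 2^(-a), which is why the bound appears as a straight line on a log-scaled frequency axis.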

References

    1. Wagner A., Arrival of the Fittest: Solving Evolution’s Greatest Puzzle (Penguin, 2014).
    2. Ahnert S. E., Structural properties of genotype-phenotype maps. J. R. Soc. Interface 14, 20170275 (2017).
    3. Manrubia S., et al., From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys. Life Rev. 38, 55–106 (2021).
    4. Prusinkiewicz P., Erasmus Y., Lane B., Harder L. D., Coen E., Evolution and development of inflorescence architectures. Science 316, 1452–1456 (2007).
    5. Dawkins R., “The evolution of evolvability” in Artificial life: The proceedings of an interdisciplinary workshop on the synthesis and simulation of living systems, Langton C. G., Ed. (Addison-Wesley Publishing Co., Redwood City, CA, 1988), pp. 201–220.
