Proc Natl Acad Sci U S A. 2021 Feb 23;118(8):e2018830118.
doi: 10.1073/pnas.2018830118.

Structured sequences emerge from random pool when replicated by templated ligation

Patrick W Kudella et al. Proc Natl Acad Sci U S A.

Abstract

The central question in the origin of life is to understand how structure can emerge from randomness. The Eigen theory of replication states, for sequences that are copied one base at a time, that the replication fidelity has to surpass an error threshold; otherwise, specific replicated sequences become random because of the incorporated replication errors [M. Eigen, Naturwissenschaften 58 (10), 465-523 (1971)]. Here, we showed that linking short oligomers from a random sequence pool in a templated ligation reaction reduced the sequence space of product strands. We started from 12-mer oligonucleotides with two bases in all possible combinations and triggered enzymatic ligation under temperature cycles. Surprisingly, we found the robust creation of long, highly structured sequences with low entropy. At the ligation site, complementary and alternating sequence patterns developed. However, between the ligation sites, we found either an A-rich or a T-rich sequence within a single oligonucleotide. Our modeling suggests that avoidance of hairpins was the likely cause for these two complementary sequence pools. What emerged was a network of complementary sequences that acted both as templates and substrates of the reaction. This self-selecting ligation reaction could be restarted by only a few majority sequences. The findings showed that replication by random templated ligation from a random sequence input will lead to a highly structured, long, and nonrandom sequence pool. This is a favorable starting point for a subsequent Darwinian evolution searching for higher catalytic functions in an RNA world scenario.
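
To make the notion of a "low entropy" pool concrete, the following is a minimal sketch of one way to score a pool of equal-length strands on a scale where 1 is a fully random pool and 0 is a single unique sequence. The normalization (Shannon entropy of the observed sequence distribution divided by the maximum for that length and alphabet) and the name `relative_entropy` are assumptions made for illustration; the paper's own estimator may differ, in particular in how it handles finite sequencing depth.

```python
from collections import Counter
from itertools import product
from math import log2


def relative_entropy(pool, alphabet="AT"):
    """Shannon entropy of the pool divided by its maximum possible value.

    Assumes all strands in `pool` have the same length. Returns 1.0 when
    every possible sequence is equally abundant and 0.0 when the pool
    contains a single unique sequence.
    """
    length = len(pool[0])
    total = len(pool)
    counts = Counter(pool)
    # H = sum_i p_i * log2(1 / p_i) over the observed sequences
    entropy = sum((c / total) * log2(total / c) for c in counts.values())
    max_entropy = length * log2(len(alphabet))
    return entropy / max_entropy


# Toy pool: every binary (A/T) sequence of length 4, each present once.
full_pool = ["".join(p) for p in product("AT", repeat=4)]
print(relative_entropy(full_pool))        # fully random pool
print(relative_entropy(["ATAT"] * 10))    # single repeated sequence
```

For real sequencing data the number of reads is usually far smaller than the number of possible long sequences, so a score like this would need a correction for undersampling before values for different lengths are compared.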

Keywords: DNA replication; Darwinian evolution; origin of life; sequence entropy; templated ligation.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Templated ligation of random sequence DNA 12-mers. (A) Before cells evolved, the first ribozymes were thought to perform basic cell functions. In the exponentially vast sequence space, spontaneous emergence of a functional ribozyme is highly unlikely; therefore, preselection mechanisms were likely necessary. (B) In our experiment, DNA strands hybridize at low temperatures to form three-dimensional complexes that can be ligated and preserved in the high temperature dissociation steps. The system self-selects for sequences with specific ligation site motifs as well as for strands that continue acting as templates. Hairpin sequences are therefore suppressed. (C) Concentration analysis shows progressively longer strands emerging after multiple temperature cycles. The inset (A-red, T-blue) shows that, although 12-mers (88,009 strands) have essentially random sequences (white), various sequence patterns emerge in longer strands (60-mers, 235,913 strands analyzed). (D) Samples subjected to different numbers (0 to 1,000) of temperature cycles between 75 °C and 33 °C. Concentration quantification is done on PAGE with SYBR poststained DNA.
Fig. 2.
Hairpin formation amplifies selection into A-rich and T-rich sequences. (A) Relative entropy reduction as a function of multimer product length: 1 is a random pool, and 0 is a unique sequence. (B) Relative entropy reduction of 60-mer products. Black: Entropy reduction of 12 nt subsequences compared with a random sequence strand of the same length. Gray: Entropy reduction at each nucleotide position showing positional dependence. (C) A gradual development of the bimodal distribution of A:T ratio in chains of different lengths. Whereas the A:T ratio in 12-mers has a single-peaked nearly binomial distribution, 24-mers already have a clearly bimodal distribution peaked at 65:35% (A-type strands) and 35:65% (T-type strands) A:T ratios. (D) Emergence of a bimodal distribution in a kinetic model of templated ligation. (E) Sequences with nearly balanced A:T ratios are prone to the formation of hairpins. In the model in (D) and the experiment, these hairpins prevent strands from acting as templates and substrates for ligation reactions, thereby suppressing the central part of the distribution. (F) A:T ratio distributions in strands of different lengths. As length increases, A-type strands become progressively more abundant in comparison to T-type strands. (G) A:T ratio distributions in a phenomenological model taking into account a slight AT-bias in the initial 12-mer pool resemble experimentally measured ones (F).
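
The statement that random 12-mers have a nearly binomial A:T distribution can be checked with a short sketch. It enumerates all AT-only 12-mers and tallies the number of A bases per strand; for an unbiased pool the tally equals the binomial coefficients C(12, k). The function name `a_count_histogram` is hypothetical, and applying it to sequencing reads instead of the enumerated pool is left as an assumption.

```python
from collections import Counter
from itertools import product
from math import comb


def a_count_histogram(strands):
    """Map 'number of A bases in a strand' to how many strands have it."""
    return Counter(s.count("A") for s in strands)


# All 2**12 = 4,096 binary (A/T) 12-mers, each present exactly once.
all_12mers = ["".join(p) for p in product("AT", repeat=12)]
hist = a_count_histogram(all_12mers)

# In an unbiased pool the histogram follows the binomial coefficients,
# which gives a single peak at 6 A and 6 T (a 50:50 ratio).
for k in range(13):
    print(k, hist[k], comb(12, k))
```

A bimodal pool such as the one reported for 24-mers and longer would instead show two maxima away from the 50:50 center when the same tally is applied to the product strands.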
Fig. 3.
Large-scale entropy reduction and sequence correlation per strand. (A) Sketch of a single-strand DNA secondary structure folding on itself, called a hairpin. The double-stranded part is very similar to a standard duplex DNA. (B) Comparing the PDFs of the maximum hairpin stem length for all strands reveals a group of peaks at around 4 to 7 nt, increasing with the DNA length. Starting with 48-mers, a tail is visible. These self-similar strands are more abundant the longer the product grows (compare A:T fraction close to p = 0.5 in Fig. 2C). (C) The peak positions as a function of the product length follow Eq. 3. The unbiased 12-mers are on the curve with coefficient p = 0.5, whereas the products starting from 36-mers lie on the curve with p = 0.785. The bias parameter p is derived from the PDFs in Fig. 2D and describes the A:T ratio in the strand.
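
The maximum hairpin stem length of a strand can be sketched with a brute-force search. The version below assumes perfect Watson-Crick pairing only, no bulges or mismatches, and a minimum loop of 3 nt; these choices and the name `max_hairpin_stem` are illustrative assumptions and not necessarily the criteria used for panel (B).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def max_hairpin_stem(seq, min_loop=3):
    """Length of the longest perfectly paired hairpin stem in `seq`.

    Position i (5' arm) pairs with position j (3' arm); the stem is
    extended inward as long as the bases are complementary and at least
    `min_loop` unpaired bases remain between the two arms.
    """
    best = 0
    n = len(seq)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while ((j - k) - (i + k) - 1 >= min_loop
                   and COMPLEMENT[seq[i + k]] == seq[j - k]):
                k += 1
            best = max(best, k)
    return best


print(max_hairpin_stem("AAAAAAATTTT"))   # A-run can fold back onto the T-run
print(max_hairpin_stem("AAAAAAAAAAAA"))  # A-only strand cannot pair with itself
```

The second call illustrates why strongly A-rich or T-rich strands avoid hairpins: with almost no complementary bases on the same strand, no long stem can form.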
Fig. 4.
Emergent landscape of junction sequences. (A) The heatmap of Z-scores quantifying the probability of finding a junction between a 6 nt sequence listed in rows followed by the 6 nt sequence listed in columns, compared with finding it by pure chance and normalized by the SD. Z-scores were calculated for the junction between the fourth and fifth 12-mers in 72-mers of A-type (Left) and T-type (Right), respectively. Other internal junctions in all long chains form very similar landscapes comprised of over- (teal) and underrepresented (ocher) sequences, which are described in detail in the text. T-type sequences complementary to A-type sequences correspond to the 90° clockwise rotation of the left panel (note the similarity of the landscapes in the two panels after this transformation). (B) The matrix of sample Pearson correlation coefficients between abundances of 12-mers in different positions (–6) inside 72-mers (rows) and 84-mers (columns). Light regions mark low correlations, and dark regions mark high correlations. Very high correlations (>0.9) at the center of the table mean that very similar sequences get selected at all internal positions of chains of different lengths. Different selection pressures operate on the first 12-mer and the last 12-mer of a chain, yet their sequences are similar in chains of different lengths.
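
A junction Z-score of the kind described in (A) can be sketched as follows. The null model used here is an assumption: the 6 nt sequence before the junction and the 6 nt sequence after it are treated as independent, the expected count is the product of their marginal frequencies times the number of strands, and the SD is that of a binomial count. The caption does not specify the null model, so this is only one plausible reading, and `junction_z_scores` is a hypothetical name.

```python
from collections import Counter
from math import sqrt


def junction_z_scores(strands, junction, flank=6):
    """Z-score for every observed (left, right) pair around `junction`.

    `junction` is the index of the first base after the ligation site.
    Null model (an assumption): left and right flanks are independent.
    """
    lefts, rights, pairs = Counter(), Counter(), Counter()
    for s in strands:
        left = s[junction - flank:junction]
        right = s[junction:junction + flank]
        lefts[left] += 1
        rights[right] += 1
        pairs[(left, right)] += 1

    n = len(strands)
    z = {}
    for (left, right), observed in pairs.items():
        p = (lefts[left] / n) * (rights[right] / n)
        expected = n * p
        sd = sqrt(n * p * (1 - p))
        z[(left, right)] = (observed - expected) / sd if sd > 0 else 0.0
    return z


# Toy example with 2 nt flanks: "AA" is always followed by "TT".
toy = ["AATT"] * 50 + ["TTAA"] * 50
print(junction_z_scores(toy, junction=2, flank=2))
```

For the 72-mers in the caption, the junction between the fourth and fifth 12-mers would correspond to `junction=48` with the default `flank=6`.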
Fig. 5.
Testing self-selection with custom sequence pools. (A) The de Bruijn graph of overrepresented sequence motifs between consecutive 12-mers found in long oligomers. All internal junctions of A-type sequences >48 nt are shown, except the first and the last. All analyzed strands have a Z-score >30 and are sequenced at least 20 times. (B) The same de Bruijn graph but for T-type sequences with Z-score >15 and sequenced at least 10 times. Four pairs of most common reverse complementary 12-mers are connected by purple dashed arrows. In each network, three families with distinctly similar patterns are observed that each include at least one of the complementary strands. Node sizes reflect relative abundance of 12-mers, and edge thickness denotes the Z-score of the junction between nodes it connects. Light and dark magenta-colored nodes are the eight most abundant 12-mers in each of the two networks. (C) PAGE images of templated ligation of three different samples of 12-mers after different numbers of temperature cycles (columns): “Replicator”: four substrate 12-mers and four template 12-mers artificially designed for templated ligation, as explained in SI Appendix; “Random”: eight random sequence 12-mers randomly selected from the 4,096 possible AT-only 12-mers; “Network”: the four most common 12-mers from A-type and another four of T-type shown in dark magenta in A. (D) After 200 temperature cycles, the “Replicator” shows a consistently higher product concentration for all lengths followed by the “Network” sample and then by the “Random” subsamples. In the “Network” and “Random” samples, the length distribution above 48 nt is well described by an exponential distribution as predicted in ref. . (E) Pearson correlation matrices between 12-mer abundances within 72-mers and 84-mers in each sample (same as in Fig. 4B). Although the pattern of correlations in the “Network” sample (second from left) resembles that shown in Fig. 4B (reproduced in the leftmost subpanel), the “Random” sample (second from right) singles out the last 12-mer but not the first one. The “Replicator” sample (the rightmost subpanel) has its own distinct self-similar pattern of correlations.
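
The graph in (A) and (B) has 12-mers as nodes and junctions between consecutive 12-mers as directed edges. A minimal sketch of collecting such edges from product strands is shown below; it only counts how often each junction occurs, whereas the figure weights edges by Z-score and filters by abundance. The name `junction_graph` and the count-based weighting are assumptions made for illustration.

```python
from collections import Counter


def junction_graph(strands, unit=12):
    """Count directed edges between consecutive `unit`-mers in each strand.

    Returns a Counter mapping (upstream_block, downstream_block) to the
    number of times that junction was observed.
    """
    edges = Counter()
    for s in strands:
        # Split the strand into non-overlapping blocks of length `unit`.
        blocks = [s[i:i + unit] for i in range(0, len(s) - unit + 1, unit)]
        for upstream, downstream in zip(blocks, blocks[1:]):
            edges[(upstream, downstream)] += 1
    return edges


# Toy example with 2 nt blocks instead of 12-mers.
toy_edges = junction_graph(["AATTAA", "AATT"], unit=2)
print(toy_edges)
```

To follow the caption more closely, the first and last junction of each strand would be skipped and only edges above the stated Z-score and read-count thresholds would be kept.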
