PLoS One. 2011;6(10):e26324.
doi: 10.1371/journal.pone.0026324. Epub 2011 Oct 18.

Topological structure of the space of phenotypes: the case of RNA neutral networks

Jacobo Aguirre et al. PLoS One. 2011.

Erratum in

  • PLoS One. 2011;6(12). doi: 10.1371/annotation/b3e79f42-7316-4ff8-9762-514120463813
  • PLoS One. 2011;6(12). doi: 10.1371/annotation/e1599064-95cc-47c9-bbcc-042202bf0423
  • PLoS One. 2011;6(12). doi: 10.1371/annotation/0a0bed45-e421-4f6a-8d7f-f48194a14516

Abstract

The evolution and adaptation of molecular populations is constrained by the diversity accessible through mutational processes. RNA is a paradigmatic example of a biopolymer where genotype (sequence) and phenotype (approximated by the secondary structure fold) are identified in a single molecule. The extreme redundancy of the genotype-phenotype map leads to large ensembles of RNA sequences that fold into the same secondary structure and can be connected through single-point mutations. These ensembles define neutral networks of phenotypes in sequence space. Here we analyze the topological properties of neutral networks formed by 12-nucleotide RNA sequences, obtained through the exhaustive folding of sequence space. A total of 4^12 sequences fragments into 645 subnetworks that correspond to 57 different secondary structures. The topological analysis reveals that each subnetwork is far from being random: it has a degree distribution with a well-defined average and a small dispersion, a high clustering coefficient, and an average shortest path between nodes close to its minimum possible value, i.e. the Hamming distance between sequences. RNA neutral networks are assortative due to the correlation in the composition of neighboring sequences, a feature that together with the symmetries inherent to the folding process explains the existence of communities. Several topological relationships can be analytically derived attending to structural restrictions and generic properties of the folding process. The average degree of these phenotypic networks grows logarithmically with their size, such that abundant phenotypes have the additional advantage of being more robust to mutations. This property prevents fragmentation of neutral networks and thus enhances the navigability of sequence space. In summary, RNA neutral networks show unique topological properties, unknown to other networks previously described.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Construction of neutral networks.
In (A), we show an example of how neutral networks are constructed: sequences that fold into the same secondary structure are connected if they are at a Hamming distance of one. In (B), we show all sequences of length 12 that fold into the secondary structure (.(....))..., which is ranked in the 46th position. Although all sequences fold into the same secondary structure, the neutral network splits into 3 isolated subnetworks of sizes N = 404, 341, and 55.
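The construction described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_fold` is a hypothetical stand-in for a real folding routine (it is not a thermodynamic model and is not the method used in the paper), and the function names `neighbors`, `neutral_networks` and `subnetworks` are invented here for the example.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACGU"
# Watson-Crick and GU wobble pairs, used only by the toy fold below.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def toy_fold(seq):
    """Hypothetical stand-in for a folding routine (NOT a real energy model).

    Pairs the first and last base when they are complementary and the
    enclosed loop has at least three bases; otherwise returns the open chain.
    """
    n = len(seq)
    if n >= 5 and (seq[0], seq[-1]) in PAIRS:
        return "(" + "." * (n - 2) + ")"
    return "." * n


def neighbors(seq):
    """Yield every sequence at Hamming distance one from seq."""
    for i, base in enumerate(seq):
        for b in ALPHABET:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def neutral_networks(length, fold):
    """Fold all 4**length sequences and group them by structure."""
    by_structure = defaultdict(set)
    for letters in product(ALPHABET, repeat=length):
        seq = "".join(letters)
        by_structure[fold(seq)].add(seq)
    return by_structure


def subnetworks(sequences):
    """Split one neutral network into connected components, largest first."""
    unseen = set(sequences)
    components = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for nb in neighbors(current):
                if nb in unseen:
                    unseen.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```

With this toy fold and length 5, `subnetworks(neutral_networks(5, toy_fold)["(...)"])` returns two components of 192 sequences each, a small analogue of the fragmentation shown in panel (B): all sequences share a structure, yet the closing pairs GC, GU, AU cannot reach CG, UG, UA by a single neutral mutation.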
Figure 2. Subnetworks size ranking.
Ranking distribution of subnetwork sizes, on a linear-logarithmic scale. Colors indicate the number of base pairs Lp in the secondary structure: one pair (black), two pairs (red), three pairs (green) and four pairs (blue). The solid line corresponds to an exponential fit. For each group of structures (with the same Lp), the insets show the size of the subnetworks (on the y-axis) that belong to the same neutral network as a function of the corresponding neutral network size (on the x-axis). Note the changes of scale in both axes.
Figure 3. Degree distribution p(k) and average degree
(A) Degree distribution p(k) of fifteen subnetworks: the five largest (black curves), five of intermediate size (brown curves, one order of magnitude smaller) and five small subnetworks (blue curves, two orders of magnitude smaller). (B) Average degree as a function of the subnetwork size N. Colors correspond to one (black), two (red), three (green) and four (blue) base pairs in the secondary structure. The solid line corresponds to the numerical fit [formula] (note the logarithmic-linear scale). The analytical approximation to the average degree, making use of the values of [formula], [formula] and α obtained from all the 12-nt folded sequences (and implying AS = 0.53), is plotted as a long-dashed black line. The upper and lower bounds to coefficient AS yield [formula] and [formula] (plotted as short-dashed red lines).
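As a rough illustration of the quantities plotted in this figure, the degree of a sequence is the number of its single-point mutants that belong to the same subnetwork. The sketch below assumes a subnetwork is given as a Python set of equal-length strings; the helper names are hypothetical and not taken from the paper.

```python
from collections import Counter

ALPHABET = "ACGU"


def degree(seq, network):
    """Count the single-point mutants of seq that are also in network (a set)."""
    return sum(
        (seq[:i] + b + seq[i + 1:]) in network
        for i in range(len(seq))
        for b in ALPHABET
        if b != seq[i]
    )


def degree_distribution(network):
    """Return p(k) as a dict mapping degree k to the fraction of nodes with it."""
    counts = Counter(degree(s, network) for s in network)
    n = len(network)
    return {k: c / n for k, c in sorted(counts.items())}


def average_degree(network):
    """Mean number of neutral neighbors per sequence."""
    return sum(degree(s, network) for s in network) / len(network)
```

For the four-node toy set `{"AA", "AC", "AG", "CA"}`, `degree_distribution` gives `{1: 0.25, 2: 0.5, 3: 0.25}` and `average_degree` gives `2.0`.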
Figure 4. Clustering.
(A) Clustering distribution C(k) for the fifteen networks analyzed in Fig. 3. (B) Average clustering C(N) as a function of the subnetwork size N for all folded neutral networks (colored circles), equivalent random networks (black squares) and theoretical predictions with a classical random model ([formula], green stars). Circle colors correspond to the number of base pairs of each subnetwork (see caption of Fig. 3). In both plots (A) and (B), the analytical approximations using the values of [formula], [formula] and α obtained from all the 12-nt folded sequences are plotted as long-dashed black lines.
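The clustering values in this figure can be illustrated with the standard local clustering coefficient: the fraction of pairs of a node's neighbors that are themselves linked. The sketch below is a minimal, quadratic-time version suitable only for small sets; the function names are hypothetical, and assigning 0 to nodes with fewer than two neighbors is a convention assumed here, not one stated in the source.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def local_clustering(seq, network):
    """Fraction of pairs of neighbors of seq that are neighbors of each other."""
    nbrs = [s for s in network if hamming(seq, s) == 1]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(hamming(a, b) == 1 for a, b in combinations(nbrs, 2))
    return 2 * links / (k * (k - 1))


def average_clustering(network):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(s, network) for s in network) / len(network)
```

In a neutral network, two neighbors of a sequence are linked exactly when they mutate the same position to different bases, which is one reason triangles are common in these graphs.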
Figure 5. Assortativity.
(A) Average nearest neighbors degree knn(k) as a function of k for fifteen networks of different sizes. (B) Assortativity parameter r as a function of the network size. As in previous figures, colors correspond to the number of base pairs of the subnetwork: one (black), two (red), three (green) and four (blue). The r for equivalent random networks are plotted in black squares.
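Both panels can be illustrated with a short sketch: knn(k) averages the degrees of the neighbors of all nodes with degree k, and r is taken here to be the Pearson correlation between the degrees at the two ends of each link (the usual degree-assortativity coefficient, assumed to match the paper's r). The helper names are hypothetical, and the adjacency builder is quadratic in the network size, so it is meant for toy inputs only.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(network):
    """Map each sequence to the set of network members at Hamming distance one."""
    adj = {s: set() for s in network}
    for a, b in combinations(network, 2):
        if sum(x != y for x, y in zip(a, b)) == 1:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def knn_by_degree(adj):
    """Average nearest-neighbor degree knn(k), grouped by node degree k."""
    groups = defaultdict(list)
    for u, nbrs in adj.items():
        if nbrs:
            groups[len(nbrs)].append(sum(len(adj[v]) for v in nbrs) / len(nbrs))
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}


def assortativity(adj):
    """Pearson correlation of degrees across links, each link counted both ways."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return float("nan")  # undefined when every node has the same degree
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    return cov / var
```

A star-shaped toy network such as `{"AAA", "CAA", "ACA", "AAC"}` is maximally disassortative (r = -1); the caption reports the opposite tendency, positive r, for RNA neutral networks.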
Figure 6. Probability of mutation.
Probability of mutation at each position of the sequence for two different secondary structures (see x-axis labels of both plots). (A) corresponds to the largest subnetwork N = 57481, whose secondary structure is fourth by abundance. (B) corresponds to the largest subnetwork N = 35594 of the most abundant secondary structure. We plot the sequences grouped by degree (dotted, dashed and dashed-dotted lines) together with their averages (solid lines).
Figure 7. Average shortest path
Dependence of the average shortest path on the subnetwork size N for all folded neutral networks (colored circles), equivalent random networks (black squares) and theoretical predictions with a classical random model ([formula], green stars). Circle colors correspond to the number of base pairs of each subnetwork (see caption of Fig. 3). The numerical fit is plotted as a solid black line, while the analytical approximations correspond to the long-dashed black lines (for values of α and AS numerically obtained from the folding of all 12-nt sequences). Inset (A): relation between the average shortest path and the average Hamming distance of the subnetworks. Inset (B): relation between the longest distance between any pair of nodes of the network, dmax, and the maximum number of different bases between sequences, Hmax (maximum Hamming distance). In the insets, the dashed lines mark where the average shortest path equals the average Hamming distance and where dmax equals Hmax, which correspond to the lower bounds of the average shortest path and of dmax, respectively.
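The comparison made in the insets can be sketched as follows: a path between two sequences must change every differing position at least once, so the graph distance can never be smaller than the Hamming distance. The code below is a minimal sketch that assumes a single connected subnetwork given as a set of strings (a disconnected input raises `KeyError`); the function names are hypothetical.

```python
from collections import deque

ALPHABET = "ACGU"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def bfs_distances(source, network):
    """Shortest-path length from source to every reachable node of network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for i, base in enumerate(cur):
            for b in ALPHABET:
                nb = cur[:i] + b + cur[i + 1:]
                if b != base and nb in network and nb not in dist:
                    dist[nb] = dist[cur] + 1
                    queue.append(nb)
    return dist


def path_and_hamming_stats(network):
    """Return (mean path length, mean Hamming distance, d_max, H_max)."""
    nodes = sorted(network)
    total_d = total_h = d_max = h_max = pairs = 0
    for i, s in enumerate(nodes):
        dist = bfs_distances(s, network)
        for t in nodes[i + 1:]:
            d, h = dist[t], hamming(s, t)  # KeyError if t is unreachable
            total_d += d
            total_h += h
            d_max = max(d_max, d)
            h_max = max(h_max, h)
            pairs += 1
    return total_d / pairs, total_h / pairs, d_max, h_max
```

For the toy set `{"AA", "AG", "CG", "CC"}`, the sequences AA and CC differ at two positions but are three steps apart, because both direct intermediates (AC and CA) are missing; the function returns a mean path of 10/6 against a mean Hamming distance of 1.5, with d_max = 3 and H_max = 2.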
Figure 8. Eigenvector centrality.
Largest eigenvalue λ1 of the adjacency matrix A as a function of the network size N. The inset shows the linear relationship between λ1 and the network average degree. The solid line in the inset is [formula].
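One simple way to estimate λ1 and the associated eigenvector (whose components are the eigenvector centralities) is power iteration. The sketch below is not the authors' procedure; it iterates with A + I rather than A, an assumption made here so that the iteration also converges on bipartite graphs, and it expects a non-empty connected network. The helper names are hypothetical.

```python
import math
from itertools import combinations


def adjacency(network):
    """Map each sequence to the set of network members at Hamming distance one."""
    adj = {s: set() for s in network}
    for a, b in combinations(network, 2):
        if sum(x != y for x, y in zip(a, b)) == 1:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_eigenvalue(adj, tol=1e-12, max_iter=10_000):
    """Estimate the largest adjacency eigenvalue and its unit eigenvector."""
    v = {u: 1.0 for u in adj}
    lam = 0.0
    for _ in range(max_iter):
        # One step of power iteration with the shifted matrix A + I.
        w = {u: v[u] + sum(v[n] for n in adj[u]) for u in adj}
        norm = math.sqrt(sum(x * x for x in w.values()))
        w = {u: x / norm for u, x in w.items()}
        # Rayleigh quotient of A itself for the normalized vector w.
        new_lam = sum(w[u] * sum(w[n] for n in adj[u]) for u in adj)
        if abs(new_lam - lam) < tol:
            return new_lam, w
        lam, v = new_lam, w
    return lam, v
```

For a triangle such as `{"AA", "AC", "AG"}` the estimate is 2, equal to the degree of every node; for a regular graph λ1 always coincides with the common degree, which gives a quick sanity check on the linear trend shown in the inset.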
Figure 9. Sequence centrality.
Evaluation of the sequence centrality for the largest subnetwork, N = 57481, with secondary structure ((....))....: (A) degree ki versus eigenvector centrality v1(i); (B) degree ki versus betweenness centrality B(i). Colors and shapes denote the type of base pairs the sequences have (see the figure legend). Note the community division created by the eigenvector centrality, which is related to the type of nucleotides participating in the base pairs: GC+UA and AU+CG for low eigenvector centrality, GU+CG and GC+UG for intermediate v1(i), and GC+CG for high v1(i).
Figure 10. Comparison between l = 12 and l = 35 neutral networks.
Rank ordering of network sizes for l = 12 (A) and l = 35 (B). Black arrows signal the first non-stem-loop structure. Network size abundance for l = 12 (C) and l = 35 (D). The solid lines correspond to exponential fits, while the dashed line corresponds to a logarithmic decay. Data for l = 35 after .
