Review
Evol Bioinform Online. 2019 Sep 5:15:1176934319872980.
doi: 10.1177/1176934319872980. eCollection 2019.

Emergence of Hierarchical Modularity in Evolving Networks Uncovered by Phylogenomic Analysis

Gustavo Caetano-Anollés et al. Evol Bioinform Online. 2019.

Abstract

Networks describe how parts associate with each other to form integrated systems which often have modular and hierarchical structure. In biology, network growth involves two processes, one that unifies and the other that diversifies. Here, we propose a biphasic (bow-tie) theory of module emergence. In the first phase, parts are at first weakly linked and associate variously. As they diversify, they compete with each other and are often selected for performance. The emerging interactions constrain their structure and associations. This causes parts to self-organize into modules with tight linkage. In the second phase, variants of the modules diversify and become new parts for a new generative cycle of higher level organization. The paradigm predicts the rise of hierarchical modularity in evolving networks at different timescales and complexity levels. Remarkably, phylogenomic analyses uncover this emergence in the rewiring of metabolomic and transcriptome-informed metabolic networks, the nanosecond dynamics of proteins, and evolving networks of metabolism, elementary functionomes, and protein domain organization.

Keywords: Accretion; biphasic bow-tie pattern; evolutionary diversification; molecular structure; phylogenomic analysis; ribosome.

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Accretion and diversification appear universal. (A) Galaxies, stars, planets, macromolecules, and cities grow and evolve. For example, gravitational attraction causes gas, molecular clouds, dust grains, and particles to accumulate into massive objects in the cosmos. This usually occurs by the formation of spiraling accretion disks, which form out of diffused material in orbital motion around a central body. This is the case for protoplanetary and circumstellar disks and active galactic nuclei, some of which associate with astrophysical jets of ionized matter. Accretion is tightly coupled to diversification. For example, a number of transforming processes—including monolithic collapse, interaction between accretion and mergers, gravitational interaction, and sweeping and ejection events—cause galaxies to diversify. Time of origin is given in billions or thousands of years ago (Gya and Kya, respectively). (B) The molecular structure of the F1/F0 ATP synthase complex that is involved in bioenergetics of the cell evolves by adding protein structural domains. Domains are colored according to their evolutionary age, from red (early) to blue (late).
Figure 2.
Ancient universal cores and derived peripheries support a biphasic process of accretion. (A) Venn diagrams describe censuses of protein structural domains defined at fold superfamily (FSF) level of structural classification of the SCOP taxonomy, Gene Ontology (GO) terms of molecular functions, RNA families defined by the Rfam database, and homologies of ribosomal proteins (r-proteins). The universal repertoires shared by Archaea, Bacteria, and Eukarya are colored in red. (B) Model of accretion explaining the Venn diagrams. In phase 1, biological repertoires of parts accrete into universal cores, which in phase 2 diversify together with the evolving organisms of the 3 cellular superkingdoms of life.
Figure 3.
The biphasic history of the ribosome. An evolutionary timeline of ribosomal RNA (rRNA) and proteins (r-proteins) inferred directly from phylogenomic data shows 2 evolutionary phases. During an initial phase (phase 1), helical structures of rRNA and r-proteins accreted to form a universal ribosomal core. The second phase of ribosomal evolution (phase 2) started 1.3 Gya (or earlier) when the universal core diversified alongside evolving organismal lineages. The phylogenomic tree describes the accretion of rRNA helical stems and is colored according to relative age. Every new branch reflects the addition of a new part to the whole. Only selected functional taxa are labeled in the tree with colored circles. The first RNA structures to accrete include the head and ratchet, the central protuberance, and stalks, which are involved in ribosomal dynamics. Early structures are also involved in energetics, decoding, helicase activity, and translocation. The peptidyl transferase center (PTC) that is responsible for protein biosynthesis accretes later in time (in yellow), whereas RNA helices gradually gained interaction with r-proteins to form a processivity core 2.8 to 3.1 Gya at a time when a crucial “major transition” in ribosomal evolution brought small and large subunits (SSU and LSU) together through protein structural stabilization, interaction surfaces, and formation of intersubunit bridges. The inset shows secondary structure representations of the primordial ribosomal ensemble, with r-proteins visualized as bubbles and bridge interactions as dashed blue lines. This initial proto-ribosome served as a center for coordinated ribonucleoprotein accretion to form a highly processive universal ribosome core during a “second transition” that took place 2.4 Gya. A molecular clock of folds linked structural and geological timescales. Source: Data from previous studies.
Figure 4.
A generic biphasic model of module creation illustrates the emergence of network structure in evolution. Nodes and links of the network are parts of a growing system of entities and interactions. The larger the number of links, the more cohesive and stable the structure of a subnetwork. The rise of hierarchical modularity during phase 1 results in small highly connected subnetworks. These subnetworks become modules, which in phase 2 coalesce by combination into higher modules of network structure (highlighted with shades of yellow and blue). The model is inspired by the work of Mittenthal et al.
Figure 5.
Timeline of metabolomic networks (top) and reduced derivatives (bottom) showing biphasic-rewiring patterns in response to cold stress perturbation. The force-directed Fruchterman-Reingold algorithm places nodes that are more connected with shorter paths in the center of the graphs and pushes sparsely connected nodes toward the periphery. Nodes are colored according to pathway maps in KEGG: yellow (hubs), blue (carbohydrate), green (energy), red (lipid), orange (nucleotide), purple (amino acid), brown (glycan), white (cofactors/vitamins), gray (secondary metabolites and xenobiotics), and black (miscellaneous). The group name “hubs” unifies metabolites that are associated with more than 1 pathway and are considered central to metabolism. Vertex size is proportional to connectivity. Values in panels indicate modularity scores inferred using the Fast Greedy Clauset-Newman-Moore (FGC) algorithm that measures the community structure of the networks. Metabolite connectivity, measured as the node degree of networks at each time point in time-resolved bacterial responses, is provided on the right of the corresponding time series. Source: Data from Aziz et al.
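The two quantities reported in this caption, node degree and a modularity score, can be illustrated with a minimal sketch. It assumes the standard Newman modularity Q for an undirected, unweighted graph and scores a partition that is supplied by hand; it does not perform the greedy Clauset-Newman-Moore search itself, and the function names and the toy graph are hypothetical, not taken from the article.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return the degree (connectivity) of every node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def modularity(edges, communities):
    """Newman modularity Q of a given partition (assumed to cover all nodes).

    Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where m is the
    number of edges, L_c the number of edges inside c, and d_c the summed
    degree of the nodes in c.
    """
    m = len(edges)
    deg = node_degrees(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    intra = defaultdict(int)   # edges with both ends in the same community
    total = defaultdict(int)   # summed degree per community
    for u, v in edges:
        if label[u] == label[v]:
            intra[label[u]] += 1
    for node, d in deg.items():
        total[label[node]] += d
    return sum(intra[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


# Hypothetical toy graph: two triangles joined by a single bridge (c-d).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
q_two_modules = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_one_module = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
```

In this toy case, splitting the graph into its two triangles scores higher than treating it as a single community, which is the sense in which a higher score indicates stronger community structure.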
Figure 6.
Evolution in network morphospaces. (A) Morphospaces of network structure and hierarchy showing toy examples of typical graphs describing archetypes of the phenotypic landscapes. In one morphospace (left), Erdős-Rényi (ER) random graphs transform into regular graphs by decreasing randomness or into modular ER graphs by increasing modularity. Hierarchical modular structure requires both increasing modularity and heterogeneity and decreasing randomness. In another morphospace (right), treeness defines the unification or diversification of hierarchical signal in the network, whereas orderability defines the centrality of cycles in network structure. (B) Morphospace of network structure describing the molecular dynamics (MD) of protein loops of aminoacyl-tRNA synthetases. Networks of the MD trajectories of protein loops unfold in a dynamic morphospace of trade-off solutions between flexibility (network modularity), economy (network heterogeneity illustrating scale-freeness), and robustness (network randomness). Modularity, heterogeneity, and randomness were measured with the maximum modularity score, the maximum likelihood scaling exponent α, and the logarithm of Bartels’ test statistic, respectively. Tracing the evolutionary age of structural domains harboring the loop structures onto the cloud of data points reveals a layering pattern, from red (early origin) to blue (late origin). The networks that are less random and more modular are the oldest, whereas the youngest networks are more random and less modular. Data points of the 3-dimensional scatter plot are mapped onto projection planes and connected with vertical leading drop lines along the heterogeneity axis. Black stars indicate significant departure from power-law behavior (P < 0.05), which measures scale-free structure (heterogeneity).
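As an illustration of the heterogeneity measure named in this caption, the following minimal sketch computes a maximum likelihood scaling exponent α for a power-law tail. It assumes the continuous-data estimator α = 1 + n / Σ ln(x_i / x_min) with a lower cutoff x_min chosen by hand; the caption does not say which estimator or cutoff the authors used, and the function name is hypothetical.

```python
import math


def powerlaw_alpha_mle(values, xmin):
    """Continuous maximum likelihood estimate of a power-law scaling exponent.

    Only observations with x >= xmin enter the estimate:
        alpha = 1 + n / sum(ln(x_i / xmin))
    This is an illustrative assumption, not necessarily the article's method.
    """
    if xmin <= 0:
        raise ValueError("xmin must be positive")
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum


# Hypothetical usage on a small list of node degrees.
alpha = powerlaw_alpha_mle([1, 1, 2, 2, 3, 5, 8, 13], xmin=1)
```

For integer-valued degree data a discrete estimator is often preferred, so the continuous form here should be read as an approximation.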
Figure 7.
Emergence of modularity in biological networks. (A) Early evolution of the purine metabolic network. The reconstruction of metabolic subnetworks that were present 3.8, 3.5, and 3 Gya reveals the piecemeal recruitment of functional modules for the nucleotide interconversion (INT), catabolism and salvage (CAT), and biosynthetic (BIO) pathways. Plausible metabolites and prebiotic chemical reactions supporting the emergent enzymatic reactions are depicted with red nodes and connections, respectively. Unknown reaction candidates or withering prebiotic pathways are indicated with dashed lines. These ancient chemistries are gradually replaced by modern pathways and are unified from separate components into a cohesive network of INT, CAT, and BIO modules. The network was rendered using the energy spring embedders and the Fruchterman-Reingold algorithm of Pajek. Full metabolite names can be found in the work by Caetano-Anollés and Caetano-Anollés. (B) The emergence of the elementary functionome (EF) network that connects protein structural domains to elementary functional loops (EFLs) when these substructures are embedded in protein structure. Bipartite networks are rendered as waterfall diagrams (see Figure 8), with time flowing from top to bottom. The first “p-loop” and second “winged helix” waves of recruitment are indicated with numbers. Data are from Aziz et al. (C) Evolution of networks of protein domain organization. The combination of structural domains in multidomain proteins induces connectivity between nodes representing domain and domain combinations in the network when a domain is present in a structure. As networks grow, older nodes are placed in the middle of radial graphs. Note how the “big bang” of domain combinations occurring 1.23 Gya during the rise of diversified organismal lineages results in a massive graph. Evolutionary data and networks from Wang and Caetano-Anollés and Aziz and Caetano-Anollés.
Protein ages were derived from phylogenomic trees describing the evolution of domains at fold family (FF) (panel A) and fold superfamily (FSF) (panels B and C) levels. Panels B and C describe networks present 2.3, 1.5, and 0 Gya during culmination of the architectural, superkingdom specification, and organismal diversification epoch of the protein world, respectively. Modularity (Q) measures connectivity density in node communities and Fast Greedy Community (FGC) measures community structure. In all cases, Q and FGC significantly increase in evolution much earlier than 2.3 Gya and then reach a plateau and decrease.
Figure 8.
A bipartite network view of levels of organization. (A) Any system of interacting entities describable with networks can be dissected into a hierarchical system with nested entities defining different levels of organization (eg, U, V, W, and X). Network interactions that are tightly knit generate modules, which enable the functional activities of the system. A bipartite network makes explicit the relationship between any 2 levels of organization when it is dissected into its 2 one-mode projections. One projection describes how higher level entities link lower level entities to each other. The other describes how lower level entities link higher level entities to each other. As an example, a bipartite network describing interactions between entities of the V and W levels is shown on the right together with its corresponding V and W projections. For simplicity, links are left unweighted. (B) The V-W bipartite network is transformed into a flow hierarchy when some or all connections are described as arcs pointing in the direction of time. (C) The flow hierarchy becomes a waterfall diagram when the ages of nodes are treated as “time events” and are used to reorganize the network in the direction of time.
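The three steps in this caption (one-mode projection, flow hierarchy, waterfall ordering) can be sketched as follows. This is a minimal illustration under stated assumptions: links are unweighted, node ages are given in Gya so that larger values mean older nodes, and all function names, node names, and ages are hypothetical rather than taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def project(bipartite_edges, onto="V"):
    """One-mode projection of an unweighted bipartite network.

    bipartite_edges is a list of (v, w) pairs. Two V nodes are linked in the
    V projection when they share at least one W neighbor, and vice versa.
    """
    shared = defaultdict(set)
    for v, w in bipartite_edges:
        if onto == "V":
            shared[w].add(v)   # group V nodes by their common W neighbor
        else:
            shared[v].add(w)   # group W nodes by their common V neighbor
    links = set()
    for members in shared.values():
        links.update(combinations(sorted(members), 2))
    return links


def flow_arcs(bipartite_edges, age):
    """Orient each link from the older node to the younger one (flow hierarchy)."""
    return [(v, w) if age[v] >= age[w] else (w, v) for v, w in bipartite_edges]


def waterfall_order(age):
    """List nodes from oldest to youngest, i.e. top to bottom of a waterfall diagram."""
    return sorted(age, key=lambda node: age[node], reverse=True)


# Hypothetical V-W network and node ages (Gya).
vw_edges = [("v1", "w1"), ("v2", "w1"), ("v2", "w2"), ("v3", "w2")]
ages = {"v1": 3.0, "w1": 2.5, "v2": 2.0, "w2": 1.5, "v3": 1.0}

v_projection = project(vw_edges, onto="V")
w_projection = project(vw_edges, onto="W")
arcs = flow_arcs(vw_edges, ages)
order = waterfall_order(ages)
```

Keeping the projections as plain sets of node pairs mirrors the caption's choice to leave links unweighted; a weighted projection would instead count how many neighbors each pair shares.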
