BMC Evol Biol. 2005 Mar 23:5:24.
doi: 10.1186/1471-2148-5-24.

Evolutionary cores of domain co-occurrence networks

Stefan Wuchty et al. BMC Evol Biol. 2005.

Abstract

Background: The modeling of complex systems, as disparate as the World Wide Web and the cellular metabolism, as networks has recently uncovered a set of generic organizing principles: Most of these systems are scale-free while at the same time modular, resulting in a hierarchical architecture. The structure of the protein domain network, where individual domains correspond to nodes and their co-occurrences in a protein are interpreted as links, also falls into this category, suggesting that domains involved in the maintenance of increasingly developed, multicellular organisms accumulate links. Here, we take the next step by studying link based properties of the protein domain co-occurrence networks of the eukaryotes S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens.

Results: We construct the protein domain co-occurrence networks from the PFAM database and analyze them by applying a k-core decomposition method that isolates the globally central (highly connected domains in the central cores) from the locally central (highly connected domains in the peripheral cores) protein domains through an iterative peeling process. Furthermore, we compare the subnetworks thus obtained to the physical domain interaction network of S. cerevisiae. We find that the innermost cores of the domain co-occurrence networks gradually grow with increasing degree of evolutionary development in going from single-celled to multicellular eukaryotes. The comparison of the cores across all the organisms under consideration uncovers patterns of domain combinations that are predominantly involved in protein functions such as cell-cell contacts and signal transduction. Analyzing a weighted interaction network of PFAM domains of yeast, we find that domains having only a few partners frequently interact with these, while the converse is true for domains with a multitude of partners. Combining domain co-occurrence and interaction information, we observe that the co-occurrence of domains in the innermost cores (globally central domains) strongly coincides with physical interaction. The comparison of the multicellular eukaryotic domain co-occurrence networks with that of the single-celled S. cerevisiae (the overlap network) uncovers small, connected network patterns.

Conclusion: We hypothesize that these patterns, consisting of the domains and links preserved through evolution, may constitute nucleation kernels for the evolutionary increase in proteome complexity. Combining co-occurrence and physical interaction data, we argue that the driving force behind domain fusions is a collective effect caused by the number of interactions and not the individual interaction frequency.

Figures

Figure 1
Basic statistics of domain occurrence networks. (a) All domains co-occurring in a single protein are represented as a fully connected unweighted clique in the network. (b) Determining the number of domains each protein contains in H. sapiens, M. musculus, D. melanogaster, C. elegans, and S. cerevisiae, we observe power-laws P(N) ~ N^{-δ} in the frequency distributions thus obtained (see Table 1 for detailed values). This inhomogeneity in domain architectures suggests that the vast majority of proteins in all organisms considered contain only one domain. (c) Counting the occurrence N of each domain in the proteomes of the organisms under consideration, we find a positive power-law dependence of the mean number of co-occurring domains (the degree) on the occurrence, ⟨k⟩ ~ N^{ε}, suggesting that on average the frequent occurrence of a domain coincides with its participation in various domain architectures (see Table 1 for detailed values). (d) The domain networks of H. sapiens, M. musculus, D. melanogaster, C. elegans, and S. cerevisiae display scale-free behavior, a network feature which is characterized by a power-law in the degree distribution P(k) ~ k^{-θ} [15] (see Table 1 for detailed values). (e) The network's inherent modularity is indicated by the presence of a power-law dependence between the clustering coefficient and the degree, following a generalized Zipf-law ⟨C(k)⟩ = α(β + k)^{-γ} (see Table 1 for detailed values). With respect to (b,c,d,e), we observe that the organism-specific distributions differ by their individual power-law exponents, indicating their levels of evolutionary development.
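The clique construction in panel (a) and the quantities in panels (c) to (e) can be illustrated with a minimal sketch. It assumes a proteome is given as a mapping from protein identifiers to lists of domain names; the function names, the toy proteome and its labels are hypothetical and are not taken from the paper or from PFAM.

```python
from itertools import combinations


def build_cooccurrence_network(proteins):
    """Turn each protein's domain set into a fully connected clique.

    `proteins` maps a protein id to its list of domain names (assumed input
    format). Returns an unweighted adjacency mapping: domain -> set of domains.
    """
    adjacency = {}
    for domains in proteins.values():
        unique = set(domains)
        for domain in unique:
            # Single-domain proteins still contribute an (isolated) node.
            adjacency.setdefault(domain, set())
        for a, b in combinations(sorted(unique), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(neighbors), 2)
                if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy proteome, for illustration only.
toy_proteins = {
    "P1": ["SH2", "SH3", "Pkinase"],
    "P2": ["SH3", "PH"],
    "P3": ["Pkinase"],
    "P4": ["PH", "Pkinase"],
}

network = build_cooccurrence_network(toy_proteins)
degrees = {domain: len(partners) for domain, partners in network.items()}
```

On real data, the degree distribution P(k) and the average clustering ⟨C(k)⟩ per degree class would be tabulated from `degrees` and `clustering_coefficient` before any power-law fit.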
Figure 2
Cores of the domain co-occurrence networks. The k-core of a graph is defined as the largest subgraph where every node has at least k links. For each choice of k, we determine the k-cores by iteratively pruning all nodes with degree lower than k and their incident links. In the schematic representation, the 1-core consists of all the nodes while the 3-core only contains the nodes on orange background. Panels a-e show the 2 innermost k-cores (red: 1-core and yellow: 2-core) of the domain networks mapped for the proteomes of (a) S. cerevisiae, (b) C. elegans, (c) D. melanogaster, (d) M. musculus and (e) H. sapiens. (f) Local vs. global centrality. The centrality of a node, interpreted as its importance, is related to its degree and network neighborhood. A hub that is only a member of the outer k-cores is defined as locally central (top-left), while nodes (not necessarily the biggest hubs) that are members of the innermost cores are globally central (top-right).
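The iterative pruning described above can be sketched as follows. This is a minimal illustration under the stated definition of the k-core, not the authors' implementation; the helper names and the toy graph are hypothetical. The toy graph contains a hub `H` with many degree-one neighbors, which ends up locally central, and a small clique `A` to `D`, which is globally central.

```python
def graph_from_edges(edges):
    """Build an undirected adjacency mapping from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def k_core(adjacency, k):
    """Return the k-core by repeatedly pruning nodes with degree below k."""
    adj = {node: set(nbrs) for node, nbrs in adjacency.items()}
    while True:
        weak = [node for node, nbrs in adj.items() if len(nbrs) < k]
        if not weak:
            return adj
        for node in weak:
            # Remove the node together with its incident links.
            for nbr in adj.pop(node):
                if nbr in adj:
                    adj[nbr].discard(node)


def core_numbers(adjacency):
    """Largest k such that a node still belongs to the k-core (0 if isolated)."""
    numbers = {node: 0 for node in adjacency}
    k = 1
    core = k_core(adjacency, k)
    while core:
        for node in core:
            numbers[node] = k
        k += 1
        core = k_core(core, k)
    return numbers


# Hypothetical toy graph: a 4-clique plus a hub with five degree-one neighbors.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("H", "A"),
    ("H", "L1"), ("H", "L2"), ("H", "L3"), ("H", "L4"), ("H", "L5"),
]
toy_graph = graph_from_edges(toy_edges)
toy_cores = core_numbers(toy_graph)
```

Here `H` has the highest degree (6) but only belongs to the 1-core, whereas `A` to `D` have lower degrees yet form the innermost 3-core.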
Figure 3
Statistics of the domain interaction network of yeast. The domain interaction network has an average node connectivity of ⟨k⟩ = 16.9 along with a reasonably high degree of clustering ⟨C⟩ = 0.34. (a) The degree distribution of the domain interaction network displays a power-law, following the generalized Zipf-law P(k) = α(β + k)^{-γ} where α = 3,406.4, β = 67.4 and γ = 2.3. The network's inherent modularity is suggested by the presence of a power-law dependence in the average clustering coefficient ⟨C⟩ ~ k^{-β} (inset), where β = 0.5. (b) The average strength s_i of each interaction domain i displays a power-law (s_i(k_i) ~ [formula image]) over four decades. Obviously, this is an effect of a domain's level of interaction, since we only recover a weak decrease of the strength s_i toward higher degree k_i, s_i ~ k^{-0.1}.
Figure 4
Driving force behind fusion proteins. (a) Nesting toward the innermost core for the eukaryotes S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens, we find that the co-occurrence links increasingly coincide with links in the yeast protein domain interaction network. (b) The interaction strength s of a domain is the sum of the interaction weights of all links the domain is involved in within the corresponding cores of the respective co-occurrence networks. Averaging over the size of the corresponding cores, the average interaction strength decreases toward the innermost cores.
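The two quantities in this caption can be sketched as below. The sketch assumes that interaction weights are stored per unordered domain pair, and it adopts one possible reading of panel (b): only interaction links with both endpoints inside the core contribute to a domain's strength. The function names and toy data are hypothetical.

```python
def link_coincidence(cooccurrence_edges, interaction_weights):
    """Fraction of co-occurrence links that are also interaction links.

    `interaction_weights` maps frozenset({a, b}) -> weight (assumed format).
    """
    edges = {frozenset(edge) for edge in cooccurrence_edges}
    if not edges:
        return 0.0
    shared = sum(1 for edge in edges if edge in interaction_weights)
    return shared / len(edges)


def average_strength(core_nodes, interaction_weights):
    """Mean interaction strength over the nodes of a core.

    Assumption: a link counts only if both of its domains lie in the core.
    """
    core = set(core_nodes)
    if not core:
        return 0.0
    strength = dict.fromkeys(core, 0.0)
    for pair, weight in interaction_weights.items():
        if pair <= core:
            for node in pair:
                strength[node] += weight
    return sum(strength.values()) / len(core)


# Hypothetical toy data, for illustration only.
toy_interactions = {
    frozenset(("A", "B")): 5,
    frozenset(("A", "C")): 1,
    frozenset(("C", "D")): 2,
}
toy_cooccurrence = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]

coincidence = link_coincidence(toy_cooccurrence, toy_interactions)
mean_strength = average_strength({"A", "B", "C"}, toy_interactions)
```

Evaluating both functions on each successive k-core would yield curves of the kind described in panels (a) and (b).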
Figure 5
Overlap of domain co-occurrence networks. (a) We define the overlap of two networks as the edges, and their concomitant nodes, common to both networks. (b) The overlap of the four innermost k-cores of the co-occurrence domain graphs of S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens only shows a small number of conserved edges (red: 1-core, yellow: 2-core, green: 3-core, blue: 4-core). The overlap of the 1-cores consists of a fully connected kernel populated by signaling domains. Nesting outward in the overlap of the 2, 3, 4-cores ((b),(c)), domains that are responsible for signal transduction, such as zinc-fingers, and for cell-cell contacts dominate.
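The overlap defined in panel (a) amounts to an edge-set intersection. A minimal sketch follows, generalized to any number of networks and assuming each network is given as a list of undirected edges; the function name, organism variable names and toy edges are hypothetical.

```python
def network_overlap(*edge_lists):
    """Edges common to all given networks, plus the nodes those edges touch."""
    edge_sets = [{frozenset(edge) for edge in edges} for edges in edge_lists]
    if not edge_sets:
        return set(), set()
    common_edges = set.intersection(*edge_sets)
    common_nodes = set().union(*common_edges)
    return common_edges, common_nodes


# Hypothetical toy edge lists, for illustration only.
yeast_edges = [("SH2", "SH3"), ("SH3", "PH")]
worm_edges = [("SH3", "SH2"), ("PH", "Pkinase")]
human_edges = [("SH2", "SH3"), ("SH3", "PH"), ("PH", "Pkinase")]

shared_edges, shared_nodes = network_overlap(yeast_edges, worm_edges, human_edges)
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two domains are listed.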

References

    1. Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74:47. doi: 10.1103/RevModPhys.74.47.
    2. Barabási A, Albert R. Emergence of Scaling in Random Networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509.
    3. Jeong H, Tombor B, Albert R, Oltvai Z, Barabási AL. The large-scale organization of metabolic networks. Nature. 2000;407:651–654. doi: 10.1038/35036627.
    4. Fell D, Wagner A. The small world of metabolism. Nature Biotech. 2000;18:1121–1122. doi: 10.1038/81025.
    5. Wagner A, Fell DA. The small world inside large metabolic networks. Proc Roy Soc London Series B. 2001;268:1803–1810. doi: 10.1098/rspb.2001.1711.