Exploring community structure in biological networks with random graphs

Pratha Sah, Lisa O Singh, Aaron Clauset, Shweta Bansal¹

Affiliations

PMID: 24965130
PMCID: PMC4094994
DOI: 10.1186/1471-2105-15-220

Exploring community structure in biological networks with random graphs

Pratha Sah et al. BMC Bioinformatics. 2014.

. 2014 Jun 25:15:220.

doi: 10.1186/1471-2105-15-220.

Authors

Pratha Sah, Lisa O Singh, Aaron Clauset, Shweta Bansal¹

Affiliation

¹ Department of Biology, Georgetown University, 20057 Washington DC, USA. sb753@georgetown.edu.

PMID: 24965130
PMCID: PMC4094994
DOI: 10.1186/1471-2105-15-220

Abstract

Background: Community structure is ubiquitous in biological networks. There has been an increased interest in unraveling the community structure of biological systems as it may provide important insights into a system's functional components and the impact of local structures on dynamics at a global scale. Choosing an appropriate community detection algorithm to identify the community structure in an empirical network can be difficult, however, as the many algorithms available are based on a variety of cost functions and are difficult to validate. Even when community structure is identified in an empirical system, disentangling the effect of community structure from other network properties such as clustering coefficient and assortativity can be a challenge.

Results: Here, we develop a generative model to produce undirected, simple, connected graphs with a specified degrees and pattern of communities, while maintaining a graph structure that is as random as possible. Additionally, we demonstrate two important applications of our model: (a) to generate networks that can be used to benchmark existing and new algorithms for detecting communities in biological networks; and (b) to generate null models to serve as random controls when investigating the impact of complex network features beyond the byproduct of degree and modularity in empirical biological networks.

Conclusion: Our model allows for the systematic study of the presence of community structure and its impact on network function and dynamics. This process is a crucial step in unraveling the functional consequences of the structural properties of biological systems and uncovering the mechanisms that drive these systems.

PubMed Disclaimer

Figures

**Figure 1**
**Schematic representation of the steps of our algorithm.(a)** The algorithm assigns a within-degree and between-degree to each node, which are represented here as half-within-edges and half-between-edges respectively. **(b)** The half-between-edges are then connected using a modified version of the Havel-Hakimi algorithm, and to remove degree correlations, the between-edges are randomized. **(c)** Finally, the half-within-edges are connected using the standard Havel-Hakimi algorithm for each module and **(d)** the within-edges are randomized to remove degree correlations.

**Figure 2**
**Modular random graphs with** n **=150,** m **=375,** K **=3,** P ( s **=50)=1 and**p_k is power law with modularity values of: a) Q **=0.1; b)** Q **= 0.3; and c)** Q **= 0.6.** As the modularity increases, the ratio of the total number of edges within modules to the number of edges in the network increases (i.e. $\bar{d_{w}}$ increases), while the remaining parameter values (degree distribution, network mean degree, number of modules) are held constant.

**Figure 3**
**Values of (a) Assortativity,** r **, (b) clustering coefficient,** C **, and (c) path length,** L **in modular random graphs do not vary significantly with increasing modularity (** Q ). Each graph has n=2000 nodes, a mean degree $\bar{d} = 10$ and K=10 modules with P(s=200)=1. The data points represent the average value of 50 random graphs. Standard deviations are plotted as error bars.

**Figure 4**
**Performance of the Louvain method [**[46]**], fast modularity method [**[47]**], the spin-glass based method [**[48]**], the infoMAP method [**[49]**], label propagation [**[50]**] and the random-walk based method [**[51]**] in networks with mean degree 10.** Fill circles, open circles and triangles represent networks with Poisson, geometric and power-law degree distributions, respectively. Each data point represents the average over ten modular random graphs. Error bars represent standard deviations. The solid line is the reference line where estimated modularity is equal to the input modularity.

**Figure 5**
**Visualization of empirical and random graphs of social interaction of dolphins and food-web trophic interactions at the Little Rock Lake in Wisconsin.** Figure **(a)** is the empirical network of Dolphin social network, **(b)** its modular random graph, and **(c)** its random graph counterpart with matched degree distribution (Q=0). Figure **(d)** is the empirical network for the food-web trophic interaction at Little Rock Lake in Wisconsin, **(e)** is its modular random graph and **(f)** its random graph counterpart with matched degree distribution. Modular random graphs have generated to match the overall degree distribution, network mean degree, the level of modularity and the number of modules of the empirical graphs. Random graphs with matched degree distribution are based on the configuration model.

**Figure 6**
**Comparisons of empirical networks, modular random graphs and random graphs with matched degree distribution (based on the configuration model).** The figure summarizes network statistics of the empirical network as well as the ensemble mean of two types of random graphs in terms of **(a)** Modularity, Q ; **(b)** Assortativity, r; **(c)** Path length, L and **(d)** Clustering coefficient, C. The path length value for the empirical Yeast-Protein interaction network is missing as the network contains disconnected components. Error bars denote standard deviation from the ensemble mean of the generated random graphs. Errors bars for modular random graphs in Figure 6(a) have been omitted as the value of modularity (Q) match the empirical networks perfectly. FW = Little Rock food web, YP = Yeast protein interaction network, CM =*C.elegans* metabolic network and DS = Dolphin social network.

See this image and copyright information in PMC

References

1. Proulx SR, Promislow DEL, Phillips PC. Network thinking in ecology and evolution. Trends Ecol Evol. 2005;20(6):345–53. doi: 10.1016/j.tree.2005.04.004. - DOI - PubMed
1. Bansal S, Khandelwal S, Meyers LA. Exploring biological network structure with clustered random networks. BMC Bioinformatics. 2009;10:405. doi: 10.1186/1471-2105-10-405. - DOI - PMC - PubMed
1. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Nat Acad Sci USA. 2002;99(12):7821–7826. doi: 10.1073/pnas.122653799. - DOI - PMC - PubMed
1. Newman M. Mixing patterns in networks. Phys Rev E. 2003;67(2):026126. - PubMed
1. Ravasz E, Somera AL, Oltvai ZN, Barabási AL. Mongru Da. Hierarchical organization of modularity in metabolic networks. Science (New York, NY) 2002;297(5586):1551–1555. doi: 10.1126/science.1073374. [ http://www.ncbi.nlm.nih.gov/pubmed/12202830] - DOI - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Exploring community structure in biological networks with random graphs

Affiliation

Exploring community structure in biological networks with random graphs

Authors

Affiliation

Abstract

Figures

References

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources

Other Literature Sources