PLoS Comput Biol. 2012;8(2):e1002391.
doi: 10.1371/journal.pcbi.1002391. Epub 2012 Feb 23.

Robust detection of hierarchical communities from Escherichia coli gene expression data

Santiago Treviño 3rd et al. PLoS Comput Biol. 2012.

Abstract

Determining the functional structure of biological networks is a central goal of systems biology. One approach is to analyze gene expression data to infer a network of gene interactions on the basis of their correlated responses to environmental and genetic perturbations. The inferred network can then be analyzed to identify functional communities. However, commonly used algorithms can yield unreliable results due to experimental noise, algorithmic stochasticity, and the influence of arbitrarily chosen parameter values. Furthermore, the results obtained typically provide only a simplistic view of the network partitioned into disjoint communities and provide no information about the relationships between communities. Here, we present methods to robustly detect co-regulated and functionally enriched gene communities and demonstrate their application and validity for Escherichia coli gene expression data. Applying a recently developed community detection algorithm to the network of interactions identified with the context likelihood of relatedness (CLR) method, we show that a hierarchy of network communities can be identified. These communities significantly enrich for gene ontology (GO) terms, consistent with them representing biologically meaningful groups. Further, analysis of the most significantly enriched communities identified several candidate new regulatory interactions. The robustness of our methods is demonstrated by showing that a core set of functional communities is reliably found when artificial noise, modeling experimental noise, is added to the data. We find that noise mainly acts conservatively, increasing the relatedness required for a network link to be reliably assigned and decreasing the size of the core communities, rather than causing association of genes into new communities.
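As a rough illustration of the relatedness score that the CLR method assigns to a gene pair, the sketch below follows the commonly cited CLR definition (Faith et al. 2007): each pair's mutual information is converted to a z-score against the background of each of the two genes, negative z-scores are clipped to zero, and the two are combined as the square root of the sum of their squares. It is a minimal sketch, not the authors' implementation. The function name `clr_relatedness`, the precomputed mutual information matrix `mi`, the exclusion of the diagonal from each gene's background, and the use of the population standard deviation are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_relatedness(mi):
    """Return a symmetric matrix of CLR-style relatedness scores.

    mi is a symmetric list-of-lists of mutual information values
    (hypothetical input; estimating mutual information is not shown).
    """
    n = len(mi)
    mu, sigma = [], []
    for i in range(n):
        # Background for gene i: its mutual information with every other gene.
        background = [mi[i][j] for j in range(n) if j != i]
        mu.append(mean(background))
        sigma.append(pstdev(background))

    def z(i, j):
        # z-score of pair (i, j) against gene i's background, clipped at zero.
        if sigma[i] == 0:
            return 0.0
        return max(0.0, (mi[i][j] - mu[i]) / sigma[i])

    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            f[i][j] = f[j][i] = sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return f


# Toy mutual information matrix for four genes (made-up values).
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
toy_f = clr_relatedness(toy_mi)
```

In this toy input the pair (0, 1) stands out against the background of both genes, so it receives the largest score, while pairs that sit below both backgrounds score zero.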

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of gene relatedness and network size in the E. coli CLR network.
(A) Probability distribution of relatedness values, formula image, between pairs of genes in E. coli calculated using the CLR algorithm and the full formula image dataset. (B) Size of the largest connected component for relatedness value, formula image. At small values of formula image the network is fully connected but begins to break up into multiple disconnected components at a critical value of approximately formula image.
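The quantity plotted in panel (B) can be sketched as follows: keep only gene pairs whose relatedness meets a threshold, then measure the largest connected component of the resulting network. This is an illustrative sketch using a union-find structure, not the authors' code. The function name `largest_component_size`, the `(gene_i, gene_j, score)` input format, and the use of "greater than or equal to" for the threshold test are assumptions.

```python
from collections import defaultdict


def largest_component_size(n_genes, scored_pairs, threshold):
    """Size of the largest connected component when only pairs with
    relatedness >= threshold are kept as links."""
    parent = list(range(n_genes))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    sizes = defaultdict(int)
    for gene in range(n_genes):
        sizes[find(gene)] += 1
    return max(sizes.values())


# Toy network of five genes with made-up relatedness scores.
toy_pairs = [(0, 1, 5.0), (1, 2, 3.0), (2, 3, 1.5), (3, 4, 4.5)]
sizes_by_threshold = {
    t: largest_component_size(5, toy_pairs, t) for t in (1.0, 2.0, 4.0, 6.0)
}
```

Sweeping the threshold over a grid of values and recording the result gives a curve of the kind described in the caption: one large component at low thresholds that fragments as the threshold rises.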
Figure 2. Correlation matrix showing community structure found in the E. coli network with relatedness threshold values 2, 4 and 6.
Genes are ordered in the same sequence along the x and y axes beginning in the upper left corner, and this ordering is the same for all three relatedness values (gene order is given in SI). The matrix element in position (i, j) is colored blue, red, or green if genes i and j are in the same community at threshold values 2, 4 or 6, respectively. The density of the color indicates the strength of the correlation in the partitionings of the pair of genes. For example, considering the correlation between a pair of genes in the 10 replicate partitionings performed on the threshold-4 network, dark red indicates that the pair of genes is always found in the same community and light red indicates that it is rarely found in the same community. The blue, red and green colors corresponding to thresholds 2, 4 and 6, respectively, are combined to indicate the correlations of each pair of genes at all three threshold values. Thus, the color of the matrix element in position (i, j) is white if genes i and j are in the same community at all three threshold values. It is purple (yellow) if the two genes are in the same community at thresholds 2 and 4 (4 and 6), but not at threshold 6 (2), and it is black if the two genes are not in the same community at any of the three threshold values. A list of the order of genes is given in Dataset S2. A full size version with each pixel representing a distinct pair of genes is given in Figure S1.
Figure 3. Change in core community structure as noise is increased from to .
The grey scale value of each element indicates the fraction of times the two genes occurred in the same community over replicate community partitionings. If the element is white (black) the two genes were always (never) found in the same community. At each noise value there are clearly white diagonal blocks indicating sets of genes that are always found in the same community, which we refer to as core communities. Note that the five core communities at formula image (Figure 3A) are in the same order in Figure 3B, C, D, and E. Within each of the five core communities of Figure 3A, the node order is allowed to change in Figure 3B, C, D, and E in order to display the largest subcommunity first. For each panel, the list of the order of genes and the core community they belong to is given in Dataset S5 and Dataset S6, respectively. A full size version with each pixel representing a distinct pair of genes is included in Figure S3.
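The co-occurrence matrix and the core communities described in this caption can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the names `cooccurrence_fraction` and `core_communities`, the `min_size` parameter, and the input format (one list of community labels per replicate, indexed by gene) are hypothetical. Two genes are always in the same community exactly when their labels agree in every replicate, so core communities can be found by grouping genes on their tuple of labels across replicates.

```python
def cooccurrence_fraction(partitions):
    """Fraction of replicates in which each pair of genes shares a community.

    partitions[r][g] is the community label of gene g in replicate r.
    """
    n_genes = len(partitions[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in partitions:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_replicates = len(partitions)
    return [[c / n_replicates for c in row] for row in counts]


def core_communities(partitions, min_size=2):
    """Groups of genes that share a community in every replicate."""
    groups = {}
    # zip(*partitions) yields, for each gene, its labels across replicates.
    for gene, signature in enumerate(zip(*partitions)):
        groups.setdefault(signature, []).append(gene)
    return [genes for genes in groups.values() if len(genes) >= min_size]


# Three made-up replicate partitionings of five genes.
toy_partitions = [
    [0, 0, 1, 1, 2],
    [5, 5, 5, 7, 7],
    [1, 1, 2, 2, 2],
]
toy_fraction = cooccurrence_fraction(toy_partitions)
toy_cores = core_communities(toy_partitions)
```

In the toy input only genes 0 and 1 are grouped together in all three replicates, so they form the single core community; genes 2 and 3 share a community in two of three replicates and would appear as a grey element.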
Figure 4. The effect of noise on core community structure and GO term enrichment.
(A) Proportion of formula image core community nodes that remain in a core community. (B) The number of significant GO term enrichments as a function of noise level formula image for networks constructed with formula image. If a GO term is enriched by more than one community, each enrichment is counted separately.
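The counting rule in panel (B), where a GO term enriched in more than one community contributes one count per community, can be illustrated with a simple over-representation test. The caption does not say which statistical test was used, so the one-sided hypergeometric test below is an assumption, as are the names `hypergeom_tail` and `count_enrichments`, the `alpha` cutoff, and the absence of any multiple-testing correction. The term label in the example is a placeholder, not a real GO identifier.

```python
from math import comb


def hypergeom_tail(k, n_annotated, n_drawn, n_total):
    """P(X >= k) when n_drawn genes are drawn without replacement from
    n_total genes, of which n_annotated carry the term."""
    upper = min(n_annotated, n_drawn)
    favourable = sum(
        comb(n_annotated, x) * comb(n_total - n_annotated, n_drawn - x)
        for x in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


def count_enrichments(communities, term_to_genes, n_total, alpha=0.05):
    """Return one (community index, term, p-value) entry per significant
    community-term pair, so a term enriched in two communities counts twice."""
    hits = []
    for index, members in enumerate(communities):
        member_set = set(members)
        for term, annotated in term_to_genes.items():
            overlap = len(member_set & annotated)
            if overlap == 0:
                continue
            p = hypergeom_tail(overlap, len(annotated), len(member_set), n_total)
            if p < alpha:
                hits.append((index, term, p))
    return hits


# Made-up example: 100 genes, two communities, one placeholder term.
toy_communities = [[0, 1, 2], [10, 11, 12]]
toy_terms = {"term_A": {0, 1, 10, 11}}
toy_hits = count_enrichments(toy_communities, toy_terms, n_total=100)
```

Here the placeholder term is over-represented in both toy communities, so it yields two entries, matching the rule that each enrichment is counted separately.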
Figure 5. Links connecting operons in the community that enriches for genes involved in ribosome structure.
CLR links are in light blue, RegulonDB links are in black. Small symbols are genes that are not in the community, but are regulators of genes that are in the community and are therefore candidates for mediating indirect interactions between community genes. Symbol shape and color indicate attributes as follows: red, transcription factors; dark blue, ppGpp regulated promoter by direct assay; light blue, ppGpp regulated translation related promoter by microarray; pink, other; hexagon, formula image promoter; diamond, formula image promoter; square, formula image promoter; circle, unknown sigma factor. Note that very few interactions observed in the CLR network can be explained by the direct interactions annotated in RegulonDB. The high proportion of ppGpp sensitive promoters among operons contained in the community suggests this molecule as a good candidate for regulating the remaining interactions. The network layout was determined by the circular layout option in Cytoscape 2.8.1; no particular significance should be attached to operons being outside the main circle.
