PLoS Comput Biol. 2011 Jan 20;7(1):e1001057.
doi: 10.1371/journal.pcbi.1001057.

Is my network module preserved and reproducible?

Peter Langfelder et al. PLoS Comput Biol.

Abstract

In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of the cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, and 5) sex differences in mouse liver networks. While we find no evidence for sex-specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks.
Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Network plot of the module of cholesterol biosynthesis genes in different mouse tissues.
The module is defined as a signed weighted correlation network among genes from the GO category Cholesterol Biosynthetic Process. Module preservation statistics allow one to quantify similarities between the depicted networks. The figure depicts the connectivity patterns (correlation network adjacencies) between cholesterol biosynthesis genes in 4 different mouse tissues from male and female mice of an F2 mouse cross. The thickness of each line reflects the absolute correlation. The line is colored red if the correlation is positive and green if it is negative. The size of each black circle indicates the connectivity of the corresponding gene; hub (i.e., highly connected) genes are represented by larger circles. Visual inspection suggests that the male and female liver networks are rather similar and show some resemblance to those of the adipose tissue. Module preservation statistics can be used to measure the similarity of connectivity patterns between pairs of networks.
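As a rough illustration of the quantities drawn in Figure 1, the sketch below builds a signed weighted correlation network and the per-gene connectivity from a few expression profiles. It assumes the common signed-network form a_ij = ((1 + cor_ij) / 2)^beta; the soft-threshold power `beta`, the function names, and the toy profiles are assumptions made for illustration and are not taken from this page.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def signed_adjacency(profiles, beta=6):
    """Signed weighted adjacency matrix; beta is an assumed soft-threshold power."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pearson(profiles[i], profiles[j])
                # Positive correlations map near 1, negative ones near 0.
                adj[i][j] = ((1 + r) / 2) ** beta
    return adj

def connectivity(adj):
    """Connectivity of each gene: the sum of its adjacencies to all other genes."""
    return [sum(row) for row in adj]

# Toy expression profiles (rows = genes, columns = samples), purely illustrative.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],   # perfectly correlated with gene 0
    [4, 3, 2, 1],   # perfectly anti-correlated with gene 0
]
adj = signed_adjacency(profiles)
k = connectivity(adj)
```

In a signed network, strongly negative correlations receive adjacencies near zero, so anti-correlated genes contribute little to connectivity.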
Figure 2. Preservation of GO term cholesterol biosynthetic process across mouse tissues.
Quantitative evaluation of the similarities among the networks depicted in Figure 1. As the reference module, we define a correlation network among the genes of the GO term “Cholesterol biosynthetic process” (CBP) in the female mouse liver network. Panels A–C show summary preservation statistics in other tissue and sex combinations. Panel A shows the composite preservation statistic Zsummary. The CBP module in the female liver network is highly preserved in the male liver network (Zsummary > 10) and moderately preserved in adipose networks. There is no evidence of preservation in brain or muscle tissue networks. Panels B and C show the density and connectivity statistics, respectively. Panel D shows the results of the in-group proportion (IGP) analysis. According to the IGP analysis, the CBP module is equally preserved in all networks. Panels E–K show scatter plots of kME in one test data set (indicated in the title) vs. the female liver reference set. Each point corresponds to a gene; Pearson correlations and the corresponding p-values are displayed in the title of each scatter plot. The eigengene-based connectivity kME is strongly preserved between adipose and liver tissues; it is not preserved between female liver and the muscle and brain tissues.
Figure 3. Cross-tabulation based comparison of modules (defined as clusters) in human and chimpanzee brain networks.
A. Hierarchical clustering tree (dendrogram) of genes based on the human brain co-expression network. Each “leaf” (short vertical line) corresponds to one gene. The color rows below the dendrogram indicate module membership in the human modules (defined by cutting branches of this dendrogram at the red line) and in the chimpanzee network (defined by branch cutting the dendrogram in panel B). The color rows show that most human and chimpanzee modules overlap (for example, the turquoise module). B. Hierarchical clustering tree of genes based on the chimpanzee co-expression network. The color rows below the dendrogram indicate module membership in the human modules (defined by cutting branches of the dendrogram in panel A) and in the chimpanzee network (defined by branch cutting the dendrogram in this panel). C. Cross-tabulation of human modules (rows) and chimpanzee modules (columns). Each row and column is labeled by the corresponding module color and the total number of genes in the module. In the table, numbers give counts of genes in the intersection of the corresponding row and column module. The table is color-coded by −log(p), where p is the Fisher exact test p-value, according to the color legend on the right. Note that the human yellow network is highly preserved while the human blue network is only weakly preserved in the chimpanzee network.
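The cross-tabulation in panel C can be sketched as follows: count the genes shared by each pair of reference and test modules, then score each overlap with a Fisher exact test. This is a minimal sketch that assumes a one-sided (enrichment) test computed as a hypergeometric upper tail; the function names and the toy labels are hypothetical and the caption does not state which sidedness the authors used.

```python
from math import comb

def cross_tabulate(ref_labels, test_labels):
    """Count genes in each (reference module, test module) intersection."""
    counts = {}
    for r, t in zip(ref_labels, test_labels):
        counts[(r, t)] = counts.get((r, t), 0) + 1
    return counts

def overlap_pvalue(n_total, n_ref, n_test, n_overlap):
    """One-sided Fisher exact p-value: P(overlap >= n_overlap) under the
    hypergeometric null of random module membership (an assumed choice)."""
    upper = min(n_ref, n_test)
    denom = comb(n_total, n_test)
    tail = sum(
        comb(n_ref, k) * comb(n_total - n_ref, n_test - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom

# Toy module labels for 10 genes in two networks, purely illustrative.
human = ["yellow"] * 5 + ["blue"] * 5
chimp = ["yellow"] * 5 + ["blue"] * 3 + ["grey"] * 2
table = cross_tabulate(human, chimp)
p_yellow = overlap_pvalue(10, 5, 5, table[("yellow", "yellow")])
```

A small p-value means the two modules share more genes than expected by chance, which is the sense in which a module counts as preserved under a cross-tabulation approach.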
Figure 4. Composite preservation statistics of human modules in chimpanzee samples.
A. The summary statistic Zsummary (y-axis), Equation 1, as a function of the module size. Each point represents a module, labeled by color and a secondary numeric label (1 = turquoise, 2 = blue, 3 = brown, 4 = yellow, 5 = green, 6 = red, 7 = black). The dashed blue and green lines indicate the thresholds Z = 2 and Z = 10, respectively. B. The composite statistic medianRank (y-axis), Equation 34, as a function of the module size. Each point represents a module, labeled by color and a secondary numeric label as in panel A. Low numbers on the y-axis indicate high preservation. C. Observed IGP statistic (Kapp and Tibshirani, 2007) versus module size. D. P-value of the IGP statistic versus module size. E. and F. show scatter plots between the observed IGP statistic and Zsummary and medianRank, respectively. In this example, where modules are defined as clusters, the IGP statistic has a high positive correlation (formula image) with Zsummary and a moderately large negative correlation (formula image) with medianRank. The negative correlation is expected since low median ranks indicate high preservation.
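The rank-based composite statistic in panel B can be illustrated with a simplified sketch: rank the modules on each observed preservation statistic (rank 1 for the most preserved) and take the median rank per module. This is only an approximation of the idea; the published statistic (Equation 34) combines density and connectivity ranks in a specific way that is not reproduced here, ties are not handled, and the statistic values below are invented for illustration.

```python
from statistics import median

def ranks_descending(values):
    """Rank values so that the largest value gets rank 1 (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def median_rank(stat_table):
    """stat_table maps module -> list of observed preservation statistics
    (same statistics, same order, for every module)."""
    modules = list(stat_table)
    n_stats = len(stat_table[modules[0]])
    per_stat = [
        ranks_descending([stat_table[m][s] for m in modules])
        for s in range(n_stats)
    ]
    return {
        m: median(per_stat[s][i] for s in range(n_stats))
        for i, m in enumerate(modules)
    }

# Hypothetical observed statistics for three modules.
observed = {
    "yellow": [0.90, 0.80, 0.95],
    "blue":   [0.40, 0.50, 0.30],
    "brown":  [0.60, 0.90, 0.50],
}
result = median_rank(observed)
```

Because the statistic depends only on ranks, it is less sensitive to module size than a permutation Z score, and a low value indicates high preservation relative to the other modules.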
Figure 5. Connectivity-based statistics for evaluating the preservation of the human yellow and blue modules in the chimpanzee network.
A. Heatmaps and eigengene plots for visualizing the gene expression profiles of the yellow module genes (rows) across human brain microarray samples (columns). In the heat map, green indicates under-expression, red over-expression, and white mean expression. The module eigengene expression depicted underneath the heat map shows how the eigengene expression (y-axis) changes across the samples (x-axis), which correspond to the columns of the heat map. The eigengene can be interpreted as a weighted average gene expression profile. The color bar below the eigengene indicates the region from which the sample was taken: light blue indicates cortical samples, magenta indicates cerebellum samples, and orange indicates caudate nucleus samples. Scatter plots B.–D. show that the connectivity patterns of the yellow module genes tend to be highly preserved between the two species. B. Scatter plot of gene-gene correlations in chimpanzee samples (y-axis) vs. human samples (x-axis) within the human yellow module. Each point corresponds to a gene-gene pair. The scatter plot exhibits a significant correlation (cor.cor and p-value displayed in the title), indicating that the correlation pattern among the genes is preserved between the human and chimpanzee data. C. Scatter plot of intramodular connectivities, Equation 7, of genes in the human yellow module in chimpanzee samples (y-axis) vs. human samples (x-axis). Each point corresponds to one gene. The scatter plot exhibits a significant correlation (cor.kIM and p-value displayed in the title), indicating that the hub gene status in the human yellow module is preserved in the chimpanzee samples. D. Scatter plot of eigengene-based connectivities, Equation 17, of genes in chimpanzee samples (y-axis) vs. human samples (x-axis). Each point corresponds to one gene. The scatter plot exhibits a significant correlation (cor.kME and p-value displayed in the title), indicating that fuzzy module membership in the human yellow module is preserved in the chimpanzee samples. Scatter plots E.–H. show that the human blue module is less preserved in the chimpanzee network. Note that the correlations in scatter plots F.–H. are lower than the corresponding correlations in the yellow module plots B.–D., indicating weaker preservation of the human blue module in the chimpanzee samples. Overall, these results agree with those from the cross-tabulation based analysis reported in Figure 3.
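Two of the connectivity-based statistics named in this caption, cor.cor and cor.kIM, can be sketched directly from the gene-gene correlation matrices of one module in the reference and test data. The sketch assumes a signed adjacency ((1 + cor) / 2)^beta with an assumed power `beta` for the intramodular connectivity; the function names and the toy matrices are hypothetical. cor.kME is omitted because it requires the module eigengene (a singular value decomposition).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(mat):
    """Off-diagonal upper-triangular entries, one per gene-gene pair."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]

def cor_cor(cor_ref, cor_test):
    """Correlation between reference and test gene-gene correlations."""
    return pearson(upper_triangle(cor_ref), upper_triangle(cor_test))

def intramodular_connectivity(cor, beta=6):
    """Sum of signed adjacencies to the other module genes (beta is assumed)."""
    n = len(cor)
    return [
        sum(((1 + cor[i][j]) / 2) ** beta for j in range(n) if j != i)
        for i in range(n)
    ]

def cor_kim(cor_ref, cor_test, beta=6):
    """Correlation between reference and test intramodular connectivities."""
    return pearson(
        intramodular_connectivity(cor_ref, beta),
        intramodular_connectivity(cor_test, beta),
    )

# Toy correlation matrices for a 4-gene module, purely illustrative.
ref = [[1.0, 0.9, 0.8, 0.1], [0.9, 1.0, 0.7, 0.2],
       [0.8, 0.7, 1.0, 0.3], [0.1, 0.2, 0.3, 1.0]]
test = [[1.0, 0.8, 0.7, 0.2], [0.8, 1.0, 0.6, 0.1],
        [0.7, 0.6, 1.0, 0.3], [0.2, 0.1, 0.3, 1.0]]
cc = cor_cor(ref, test)
ck = cor_kim(ref, test)
```

Neither statistic needs a module assignment in the test data: both use only the genes of the reference module, which is the property the abstract emphasizes.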
Figure 6. Composite preservation statistics for KEGG pathways between human and chimp brain networks.
Here we present the composite statistics Zsummary (panel A) and medianRank (panel B), and the IGP statistic (panels C and D). Panels E. and F. show scatter plots between the observed IGP statistic and Zsummary and medianRank, respectively. Here we find no significant relationship between the IGP statistic and the composite module preservation statistic. Since KEGG modules do not correspond to clusters, it is not clear whether cluster preservation statistics are useful in this example.
Figure 7. Detailed preservation analysis of KEGG pathways between human and chimp brain networks.
The first column presents summary preservation Z statistics (y-axis) for selected KEGG pathways (interpreted as modules) versus the number of genes in the pathway (x-axis). Panel A shows Zsummary (Equation 1), panel B shows the density summary statistic Zdensity (Equation 30), and panel C shows the connectivity summary statistic Zconnectivity (Equation 31). Pathway names are shortened for readability. Panel A shows that MAPK, Calcium, Endocytosis, Wnt, and Phosphatidylinositol show strong evidence of preservation (Zsummary > 10) while the apoptosis module is not preserved. Panel C shows that this preservation signal mainly reflects connectivity preservation Zconnectivity (Equation 31) while panel B reveals that most modules have weak to moderate density preservation (Zdensity < 10) (Equation 30). Note that the apoptosis pathway shows no evidence of preservation. Panels D–H display scatter plots of eigengene-based connectivities in the chimpanzee data (y-axis) vs. in the human data (x-axis). Each point represents a gene in the pathway. Higher correlation means that the internal co-expression structure of the pathway is more strongly preserved. The apoptosis pathway has the lowest cor.kME statistic, while the Phosphatidylinositol pathway has the highest. The circle plots in panels L and M show connection strengths among apoptosis genes in humans and chimpanzees, respectively.
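The density, connectivity, and composite summary statistics referenced in this caption can be sketched as permutation Z scores that are then aggregated. The sketch assumes, based on the published method rather than on this page, that each Z score standardizes an observed statistic against values from permuted module labels, that the density and connectivity summaries are medians of their component Z scores, and that the composite is their mean. The thresholds 2 and 10 and all function names are likewise assumptions for illustration.

```python
from statistics import mean, median, stdev

def z_score(observed, permuted):
    """Standardize an observed statistic against its permutation distribution."""
    return (observed - mean(permuted)) / stdev(permuted)

def z_summary(z_density_stats, z_connectivity_stats):
    """Composite preservation score: mean of the two median-based summaries."""
    z_density = median(z_density_stats)
    z_connectivity = median(z_connectivity_stats)
    return (z_density + z_connectivity) / 2

def interpret(z, low=2, high=10):
    """Map a summary Z score to an evidence category (assumed thresholds)."""
    if z < low:
        return "no evidence of preservation"
    if z < high:
        return "weak to moderate evidence"
    return "strong evidence"

# Hypothetical component Z scores for one module.
z = z_summary([4.0, 6.0, 8.0, 10.0], [1.0, 2.0, 3.0])
label = interpret(z)
```

Using medians within each group of statistics keeps a single extreme component from dominating the summary, while averaging the two groups lets a module score well only if both its density and its connectivity pattern are preserved.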
Figure 8. Relationships between module preservation statistics based on applications.
The (average linkage) hierarchical cluster trees visualize the correlations between the preservation statistics. The preservation statistics are colored according to their type: density statistics are colored in red, connectivity preservation statistics are colored in blue, separability is colored in green, and cross-tabulation statistics are colored in black. Note that statistics of the same type tend to cluster together. A derivation of some of these relationships is presented in Supplementary Text S1.
Figure 9. Design and main results of simulation studies of module preservation.
The first column outlines 6 (out of 7) simulation scenarios. Results for the seventh simulation scenario can be found in Supplementary Text S6. Preserved and non-preserved modules are marked in red and black, respectively. The grey module (labeled 0) represents genes whose profiles are simulated to be independent (that is, without any correlation structure). The second and third columns report values of the composite statistics Zsummary and medianRank, respectively, as a function of module size. The blue and green horizontal lines show the thresholds of Z = 2 and Z = 10, respectively. Each figure title reports the Kruskal-Wallis test p-value for testing whether the preservation statistics differ between preserved and non-preserved modules. Note that the proposed thresholds (Zsummary > 10 for preserved and Zsummary < 2 for non-preserved modules) work quite well. The fourth column shows the permutation p-values of IGP obtained by the R package clusterRepro. The blue and brown lines show p-value thresholds of 0.05 and its Bonferroni correction, respectively. The IGP permutation p-value is less successful than Zsummary at distinguishing preserved from non-preserved modules. The fifth and last column shows scatter plots of observed IGP vs. Zsummary. We observe that IGP and Zsummary tend to be highly correlated when modules correspond to clusters with varying extents of preservation.
