Nucleic Acids Res. 2012 Mar;40(6):2377-98.
doi: 10.1093/nar/gkr902. Epub 2011 Nov 24.

Gene network inference and visualization tools for biologists: application to new human transcriptome datasets

Daniel Hurley et al. Nucleic Acids Res. 2012 Mar.

Abstract

Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions.

Figures

Figure 1.
Schematic of the network inference framework. From left to right, transcriptome data are passed through pre-processing and normalization functions, then used as input for the range of network inference algorithms described in Table 1. Networks inferred by each of the methods are output in a standard format, in which they can be compared against each other and against literature relationships using methods described in Table 2. Finally, conclusions from the network comparison and analyses are used to inform experimental decisions.
Figure 2.
Results from siRNA-mediated perturbation of human endothelial cells. (A) Histogram of knockdown effectiveness for 400 siRNA-mediated perturbations in endothelial cells as reported by microarray (a value of 1 on the x-axis corresponds to no change to the array signal for the target RNA after siRNA knockdown, while a value of 0.1 corresponds to a 90% reduction in the array signal for the target RNA, so that only 10% of the median signal remains). (B) Distribution of Spearman correlation (ρ) and MI for all possible transcript-pairs (∼140 000) between the 379 Rel/NFκB-related transcripts in the siRNA-mediated knockdown dataset. (C) Correlation and MI for all possible pairings from (A) are shown in grey, overlaid by the pairings found in the Rel/NFκB-related reference network in red. (D) Distribution of correlation for all possible pairs between the 379 Rel/NFκB-related transcripts in the siRNA-mediated knockdown dataset (black line) compared to correlation for pairs in the Rel/NFκB-related reference network (red line). (E) Distribution of mutual information for all possible pairs between the 379 Rel/NFκB-related transcripts in the siRNA-mediated knockdown dataset (black line) compared to mutual information for pairs in the Rel/NFκB-related reference network (red line). (F) Distribution of Pearson correlation and mutual information for all possible pairs (∼67 000) between the 260 TNF-related transcripts in the TNF timecourse dataset. (G) Distribution of Pearson correlation and mutual information for all possible pairs from panel F shown in grey, overlaid by the pairs found in the TNF-related reference network (in red).
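The two pairwise scores in panels B to G can be illustrated with a minimal sketch. It uses only the Python standard library and a simple equal-frequency binning estimator for MI; the binning scheme, the bin count and all function names are assumptions made for illustration, not the estimators used in the paper. The quoted pair counts (∼140 000 and ∼67 000) are consistent with counting ordered pairs, n(n − 1), for 379 and 260 transcripts respectively.

```python
from collections import Counter
from math import log2


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def mutual_information(x, y, bins=3):
    """MI in bits after equal-frequency binning (illustrative estimator)."""
    n = len(x)

    def discretize(values):
        return [min(int((r - 1) * bins / n), bins - 1) for r in ranks(values)]

    dx, dy = discretize(x), discretize(y)
    joint, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    return sum(c / n * log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def ordered_pairs(n):
    """Number of ordered transcript pairs among n transcripts."""
    return n * (n - 1)


# Hypothetical expression profiles across six samples.
profile = [1, 2, 3, 4, 5, 6]
u_shaped = [(v - 3.5) ** 2 for v in profile]
print(spearman(profile, u_shaped), mutual_information(profile, u_shaped))
```

In the toy example the U-shaped profile has zero rank correlation with the first profile but clearly non-zero MI, which is the kind of difference the MI-versus-correlation scatter plots are designed to expose.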
Figure 3.
Number of edges (x-axis) in the Rel/NFκB reference network present in each inferred gene network generated from the siRNA dataset using five methods. The grey histogram represents the distribution of shared edges in the randomly relabelled network and the reference network. The red line indicates on each x-axis the number of reference network edges present in each inferred network, while the blue lines indicate the 95% confidence interval for the distribution of reference network edges present in the 1000 randomly relabelled networks.
Figure 4.
Number of edges (x-axis) in the TNF-based reference network present in each inferred gene network generated from the TNF timecourse dataset using five methods. The grey histogram represents the distribution of shared edges in the randomly relabelled network and the reference network. The red line indicates on each x-axis the number of reference network edges present in each inferred network, while the blue lines indicate the 95% confidence interval for the distribution of reference network edges present in the 1000 randomly relabelled networks.
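The random-relabelling comparison described for Figures 3 and 4 can be sketched as follows. This is a minimal illustration that assumes undirected edges, a uniform random permutation of node labels and a percentile-based 95% interval; the function names and these choices are hypothetical and may differ from the procedure used in the paper.

```python
import random


def shared_edges(inferred, reference):
    """Count undirected edges that appear in both edge lists."""
    def as_sets(edges):
        return {frozenset(edge) for edge in edges}
    return len(as_sets(inferred) & as_sets(reference))


def relabel_null(inferred, reference, nodes, n_perm=1000, seed=0):
    """Compare observed overlap against randomly relabelled networks.

    Returns (observed, (lower, upper), counts), where the interval spans
    the 2.5th to 97.5th percentile of the relabelled overlaps.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    counts = []
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        mapping = dict(zip(nodes, shuffled))
        # Topology is preserved; only the node labels are permuted.
        relabelled = [(mapping[a], mapping[b]) for a, b in inferred]
        counts.append(shared_edges(relabelled, reference))
    counts.sort()
    lower = counts[int(0.025 * (n_perm - 1))]
    upper = counts[int(0.975 * (n_perm - 1))]
    return shared_edges(inferred, reference), (lower, upper), counts


# Hypothetical toy networks on 20 nodes: the inferred network equals the reference.
toy_nodes = list(range(20))
toy_reference = [(i, i + 1) for i in range(19)]
observed, interval, _ = relabel_null(toy_reference, toy_reference, toy_nodes)
print(observed, interval)
```

Relabelling keeps the degree structure of the inferred network intact, so the null distribution reflects how many reference edges a network of the same shape would recover by chance.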
Figure 5.
Comparison of the types of reference network edges recovered by different inferred networks. (A) MI versus correlation graphs for the edges recovered by each of the inferred networks. Red circles indicate the MI and correlation for the reference network edges, while grey dots indicate the distribution of all possible edges in the 379 NFκB-associated mRNA/400 siRNA microarray data set. (B) Heat-map of NFκB reference network edges sorted into bins in descending order of absolute value of Pearson's correlation. Only the 5000 most correlated edges are included, so the y-axis bins include only reference network edges with absolute value of Pearson's correlation coefficient between 0.53 and 0.93. Band colour represents the fraction of edges in each bin that were identified by each inferred network, as defined by the key at the right of the heat map. (C) Venn diagram showing the reference network edges present in four of the inferred networks.
Figure 6.
Coregulatory relationships in the Rel/NFκB-based reference network present in networks inferred using five different methods, compared to coregulation network relationships recovered at random. The red line indicates the number of coregulatory reference network relationships present in each inferred network, while the blue lines indicate the 95% confidence interval for the distribution of edges recovered from 100 randomly relabelled inferred networks.
Figure 7.
Coregulatory reference network relationships present in the TNF-based reference network inferred using three different methods, compared to coregulatory reference network relationships recovered at random. The red line indicates the number of relationships recovered by each inferred network, while the blue lines indicate the 95% confidence interval for the distribution of edges recovered from 100 randomly relabelled networks.
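One way to read the coregulatory comparison in Figures 6 and 7 is sketched below. It assumes that two transcripts are coregulated when they share at least one upstream regulator in the directed reference network; that definition and the function names are assumptions for illustration, since the captions do not spell out the construction.

```python
from collections import defaultdict
from itertools import combinations


def coregulation_pairs(directed_edges):
    """Return unordered target pairs that share at least one regulator."""
    targets = defaultdict(set)
    for regulator, target in directed_edges:
        targets[regulator].add(target)
    pairs = set()
    for target_set in targets.values():
        for a, b in combinations(sorted(target_set), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def recovered_coregulation(inferred_edges, reference_directed_edges):
    """Count coregulated reference pairs that appear as inferred edges."""
    inferred = {frozenset(edge) for edge in inferred_edges}
    return len(inferred & coregulation_pairs(reference_directed_edges))


# Hypothetical reference network: regulators R and S with their targets.
toy_reference = [("R", "A"), ("R", "B"), ("R", "C"), ("S", "C"), ("S", "D")]
toy_inferred = [("A", "B"), ("C", "D"), ("A", "D")]
print(recovered_coregulation(toy_inferred, toy_reference))
```

The resulting count can then be compared against randomly relabelled networks in the same way as for direct edges.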
Figure 8.
Assessment of higher structures in networks. (A) Comparison of connectivity in the ARACNE (MI) network and the SiGN-BN (Bayesian) network. (B) Comparison of connectivity in the ARACNE (MI) network and the MIKANA (DS) network. Each point represents a single node. Points are coloured according to the fraction of relationships each node has in common between the two networks, from blue (few common relationships) to red (many common relationships). The histogram on the right of the plots indicates the relative fraction of nodes in each degree of commonality, from blue to red.
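The per-node comparison in Figure 8 can be approximated with the sketch below. It assumes that each node is plotted by its degree in the two networks and that the "fraction of relationships in common" is the shared neighbours divided by the union of neighbours (a Jaccard index); that normalisation and the function names are assumptions, not taken from the paper.

```python
from collections import defaultdict


def neighbours(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connectivity_comparison(edges_1, edges_2):
    """Map node -> (degree in network 1, degree in network 2, shared fraction)."""
    adj_1, adj_2 = neighbours(edges_1), neighbours(edges_2)
    rows = {}
    for node in set(adj_1) | set(adj_2):
        n1 = adj_1.get(node, set())
        n2 = adj_2.get(node, set())
        # The union is never empty: the node has a neighbour in at least one network.
        rows[node] = (len(n1), len(n2), len(n1 & n2) / len(n1 | n2))
    return rows


# Hypothetical edge lists standing in for two inferred networks.
network_a = [("A", "B"), ("A", "C"), ("A", "D")]
network_b = [("A", "B"), ("A", "E")]
print(connectivity_comparison(network_a, network_b)["A"])
```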
Figure 9.
Schematic of key relationships surrounding the Rel/NFκB family. Red upstream regulators are primarily associated with localization, and green upstream regulators are primarily associated with phosphorylation.
Figure 10.
Comparison of correlation and MI between NFKB1 and its targets and also between regulators of NFKB1 and its targets. Relationships in the whole dataset (black line) are compared with relationships between transcripts identified as being experimentally related in the reference network (red line). (A) Comparison of correlation and (B) MI: (i) between all possible relationships between upstream regulators of NFKB1 and NFKB1 targets (black line) and NFKB1 and its targets (red line). (C) Schematic showing examples of Spearman's correlation coefficients between reference network NFKB1 regulators, NFKB1 targets and NFKB1 itself, taken from the key regulators shown in Figure 9. As in Figure 9, red upstream regulators are primarily associated with localization, and green upstream regulators are primarily associated with phosphorylation.
Figure 11.
Assessment of pathways in selected networks. (A) Basic principle of comparing the number of ‘hops’ between transcripts. In the reference network, transcripts A and C are connected by a single interaction. In the inferred network, A is indirectly connected to C via B. (B) Pathway comparison of a MIKANA network against each edge in the SiGN-BN (Bayesian) network. The red line indicates the number of transcripts connected by a particular number of hops in the MIKANA network, required to trace every directly connected pair of nodes in the SiGN-BN (Bayesian) network. The grey lines indicate the number of transcripts connected by a particular number of hops for the randomly relabelled networks, and the dotted blue lines indicate the 95% confidence interval for the distribution of randomly relabelled networks. (C) Pathway comparison showing the number of hops required to trace NFKB1 regulator to NFKB1 target inferred network edges in the NFKB1-associated reference network. X = N represents the situation where no path exists.
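The "hops" principle in panel A can be sketched with a breadth-first search. The example assumes the inferred network is treated as undirected and reports "N" when no path exists, mirroring the X = N convention in panel C; the function names are hypothetical.

```python
from collections import Counter, defaultdict, deque


def hops(adjacency, source, target):
    """Shortest path length between two nodes, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return distance + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


def hop_histogram(inferred_edges, reference_edges):
    """Count reference edges by the hops needed to trace them in the inferred network."""
    adjacency = defaultdict(set)
    for a, b in inferred_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    histogram = Counter()
    for a, b in reference_edges:
        distance = hops(adjacency, a, b)
        histogram["N" if distance is None else distance] += 1
    return histogram


# Panel A situation: A and C are directly linked in the reference network,
# but connected only via B in the (hypothetical) inferred network.
print(hop_histogram([("A", "B"), ("B", "C")], [("A", "C")]))
```

Running the same histogram on randomly relabelled networks would give the null distribution against which the observed hop counts are judged.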
Figure 12.
Assessing directionality of relationships in the perturbation and timecourse datasets. (A) Relationships in the Rel/NFκB-based reference network present in forward (left) and reversed (right) SiGN-BN Bayesian networks inferred from the siRNA dataset, compared to reference network relationships recovered at random. (B) Relationships in the TNF-based reference network present in forward (left) and reversed (right) SiGN-BN Bayesian networks inferred from the siRNA dataset, compared to reference network relationships recovered at random. The red line indicates the number of reference network relationships present in the forward and reversed inferred networks, while the blue lines indicate the 95% confidence interval for the distribution of edges recovered from 1000 randomly relabelled networks.
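A simple way to picture the forward-versus-reversed comparison is to count how many directed reference edges match the inferred edges as given, and how many match once every inferred edge is flipped. The sketch below assumes edges are (regulator, target) tuples; the function name and the toy data are hypothetical.

```python
def directed_hits(inferred_edges, reference_edges):
    """Return (forward, reversed) counts of reference edges matched.

    forward: inferred edge (a, b) is present in the reference as (a, b).
    reversed: inferred edge (a, b) is present in the reference as (b, a).
    """
    reference = set(reference_edges)
    inferred = set(inferred_edges)
    forward = sum(1 for a, b in inferred if (a, b) in reference)
    flipped = sum(1 for a, b in inferred if (b, a) in reference)
    return forward, flipped


# Hypothetical directed networks.
toy_inferred = [("A", "B"), ("C", "B"), ("D", "E")]
toy_reference = [("A", "B"), ("B", "C"), ("E", "D")]
print(directed_hits(toy_inferred, toy_reference))
```

If an inference method captured direction reliably, the forward count would be expected to exceed both the reversed count and the range obtained from randomly relabelled networks.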

