PLoS One. 2008;3(12):e3911.
doi: 10.1371/journal.pone.0003911. Epub 2008 Dec 15.

Human gene coexpression landscape: confident network derived from tissue transcriptomic profiles

Carlos Prieto et al. PLoS One. 2008.

Abstract

Background: Analysis of gene expression data from genome-wide microarrays is often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global "omic" scale do not focus on human samples, and those that do very often include heterogeneous datasets that mix normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and estimates of the errors in the data are not provided.

Methodology/principal findings: Human genome-wide expression data from a controlled set of normal-healthy tissues are used to build a confident human gene coexpression network avoiding both pathological and technical noise. To achieve this we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. Together these methods provide a series of coexpression datasets in which the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigations on their functional assignment indicate that more than 60% are house-keeping and essential genes. The network displays previously undescribed gene associations and allows some unknown, unassigned genes to be placed in a functional context based on their interactions with known gene families.

Conclusions/significance: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making the validated human gene coexpression networks available to the scientific community, to allow further analyses on the network or on specific gene associations. The data are available free online at http://bioinfow.dep.usal.es/coexpression/.
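
The random cross-validation mentioned in the abstract (and in the Figure 2 caption: 1000 samplings per gene pair, a sampling counting as positive when r exceeds 0.7) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name, the 80% subsample fraction, sampling without replacement, counting only positive correlations, and the toy data are all assumptions, since the text here does not say how each sampling is drawn.

```python
import numpy as np

def cross_validated_correlation(expr, n_samplings=1000, r_threshold=0.7,
                                subsample_fraction=0.8, seed=0):
    """Return (r, N) matrices for every pair of rows (genes) in expr.

    expr : 2-D array, rows = genes (probesets), columns = samples.
    r    : Pearson correlation computed on all samples.
    N    : number of random subsamplings in which r exceeded r_threshold.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    # Assumed scheme: each sampling keeps a fixed fraction of the samples.
    k = min(n_samples, max(3, int(round(subsample_fraction * n_samples))))
    r_full = np.corrcoef(expr)
    n_pass = np.zeros((n_genes, n_genes), dtype=int)
    for _ in range(n_samplings):
        cols = rng.choice(n_samples, size=k, replace=False)
        r_sub = np.corrcoef(expr[:, cols])
        # Positive coexpression only (assumption); use np.abs(r_sub)
        # to count strong anticorrelation as well.
        n_pass += (r_sub > r_threshold)
    return r_full, n_pass

# Toy data (hypothetical): 3 genes measured across 48 samples.
demo_rng = np.random.default_rng(1)
x = demo_rng.normal(size=48)
demo = np.vstack([x, 2 * x + 1, -x])  # gene 1 follows gene 0, gene 2 opposes it
r, N = cross_validated_correlation(demo, n_samplings=200)
print(r[0, 1], N[0, 1], N[0, 2])
```

Rank-transforming each row before calling the function (for example with `scipy.stats.rankdata`) would turn the Pearson coefficient into a Spearman one, which is one way to mirror the parametric and non-parametric variants named in the abstract.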

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Clustering of human tissue expression profiles.
Heatmaps and clustering of the 48 human genome-wide expression microarray samples from 24 different tissues and organs analyzed by two different methods: (A) MAS5-Spearman: MAS5 for signal calculation and Spearman for distance calculation based on the sample expression profiles; and (B) RMA-Pearson: RMA for signal calculation and Pearson for distance calculation based on the sample expression profiles. A color bar with scales for each heatmap is included, indicating that dark-red corresponds to minimum distance and dark-blue to maximum distance. The color distributions observed in the heatmaps are also included inside the bars.
Figure 2. Plot of r and N coefficients calculated for each gene coexpression pair.
rN-plots that represent the correlation coefficient (from 0 to 1) versus the cross-validation coefficient (from 0 to 1000) of each gene pair by two different methods: (A) MAS5-Spearman and (B) RMA-Pearson. The cross-validation is considered positive for a given gene pair when it gives r>|0.7| in each sampling. As indicated in Methods, 1000 samplings are run for each gene-probeset pair. The gene probeset pairs that correspond to the same gene are drawn as red circles. The probeset pairs of Affymetrix controls are drawn as blue circles. A random selection of 10,000 coexpressed gene probeset pairs is drawn as black circles. Two dotted lines indicate an approximate threshold that can be considered the border of noisy data. These lines show only the minimal r and N values below which the coexpressed gene pairs are mainly noise; the coexpression signal therefore appears mostly at r>0.65 and N>220.
Figure 3. Accuracy and coverage of the coexpression data.
Accuracy measured as Positive Predictive Value PPV (for all genes in blue and filtered genes in red) and coverage as True Positive Rate TPR (in black) computed for each coexpression dataset obtained at a given correlation coefficient r (top figures) or at a given number of cross-validations N (bottom figures) for both methods: (A) MAS5-Spearman and (B) RMA-Pearson. The accuracy and coverage (on the y axis) correspond to accumulated values for each r≥x or for each N≥x.
Figure 4. Coexpression networks obtained at different levels of accuracy.
Color plots (A and B) that represent the Positive Predictive Value (PPV) calculated for each set of gene coexpression data for different values of correlation coefficient (r) and cross-validation coefficient (N). The PPV corresponds to accumulated values for N≥x and r≥y. Calculations are done for data derived from two methods: (A) MAS5-Spearman without gene filtering (all gn) and (B) RMA-Pearson with gene filtering (filtered gn). Table (C) shows the specific values of correlation and cross-validation for three coexpression datasets derived from each method at 3 specific PPVs: ≥0.60, ≥0.70 and ≥0.80. This table also shows the number of nodes and links included in each coexpression dataset. Table (D) shows the number of gene-nodes and interaction-links that are included in the combined coexpression networks at 3 specific PPVs.
Figure 5. Coexpression of house-keeping and tissue-specific genes.
Top panels A and B: Density distributions of coexpression data for N>220 corresponding to all gene pairs (in black), to Eisenberg's house-keeping gene pairs (in green) or to Hsiao's house-keeping gene pairs (in red). Bottom panels A and B: rN-plots with all data points of coexpression pairs with N>220 and r>0.65 for either all gene pairs (in black) or only Hsiao's house-keeping gene pairs (in red). In these panels, (A) corresponds to data from the MAS5-Spearman method and (B) to data from the RMA-Pearson method. Panel (C): six rN-plots that present the coexpression data obtained with the RMA-Pearson method for the human genes included in 6 different pathways: (1) ribosome (KEGG ID = hsa03010), (2) oxidative phosphorylation (hsa00190), (3) proteasome (hsa03050), (4) cytokine-cytokine receptor interaction (hsa04060), (5) neuroactive ligand-receptor interaction (hsa04080), and (6) complement and coagulation cascades (hsa04610).
Figure 6. Human Gene Coexpression Network.
Graphical view of the human gene coexpression network where the nodes correspond to genes and the edges to coexpression links. The network was produced as the intersection of two datasets (MAS5-Spearman and RMA-Pearson datasets with PPV≥0.60) to provide a confident coexpression network that includes 615 genes and 2190 pairwise coexpression interactions. The network includes only groups of coexpressing genes with at least three nodes. The most significant regions have been marked with background colors, and labels describe the main functions assigned. For each node the color (from red to grey) and shape (circles or diamonds) were obtained with the MCODE algorithm. The circular nodes are the ones found with high cluster coefficient and the diamond nodes are the ones with lower cluster coefficient. The intensity of the red color in the nodes also indicates the degree of clustering, fading to pale grey for the most peripheral nodes that have only one link.
Figure 7. Coexpressed gene modules regulated by specific transcription factors.
(A) Enlarged graphical view of three coexpressing modules selected from the network presented in Figure 6, indicating the name of each gene corresponding to each node and the functional labels: (Module 1) metal ion homeostasis; (Module 2) response to biotic stimulus; (Module 3) extracellular matrix and adhesion. (B) Table showing the results of the search for common transcription factors (TFs) most significantly associated with the genes included in each of the three modules described above. The search was done using the bioinformatic tools PAP and FactorY.
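
The accumulated accuracy (PPV) and coverage (TPR) described in the captions of Figures 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published procedure: a gene pair is treated as a true positive when both genes share at least one pathway annotation, TPR is taken relative to all true pairs in the evaluated set, and the gene names, scores and helper function names are hypothetical. Only the KEGG pathway IDs come from the Figure 5 caption.

```python
import numpy as np

def shares_pathway(gene_a, gene_b, pathways):
    """True when both genes are annotated to at least one common pathway."""
    return bool(pathways.get(gene_a, set()) & pathways.get(gene_b, set()))

def accumulated_ppv_tpr(scores, is_true, thresholds):
    """PPV and TPR over the pairs with score >= x, for each x in thresholds."""
    scores = np.asarray(scores, dtype=float)
    is_true = np.asarray(is_true, dtype=bool)
    total_true = np.count_nonzero(is_true)
    ppv, tpr = [], []
    for x in thresholds:
        selected = scores >= x
        n_selected = np.count_nonzero(selected)
        tp = np.count_nonzero(selected & is_true)
        # PPV: fraction of selected pairs that are true positives.
        ppv.append(tp / n_selected if n_selected else float("nan"))
        # TPR: fraction of all true pairs that are recovered.
        tpr.append(tp / total_true if total_true else float("nan"))
    return np.array(ppv), np.array(tpr)

# Toy annotation (hypothetical genes G1..G4; pathway IDs from the Figure 5 caption).
pathways = {
    "G1": {"hsa03010"}, "G2": {"hsa03010"},   # ribosome
    "G3": {"hsa00190"}, "G4": {"hsa00190"},   # oxidative phosphorylation
}
# Toy coexpression pairs with a correlation coefficient r (made-up values).
pairs = [("G1", "G2", 0.90), ("G1", "G3", 0.80),
         ("G3", "G4", 0.75), ("G2", "G4", 0.60)]

scores = [r for _, _, r in pairs]
labels = [shares_pathway(a, b, pathways) for a, b, _ in pairs]
ppv, tpr = accumulated_ppv_tpr(scores, labels, thresholds=[0.60, 0.70, 0.85])
print(ppv, tpr)
```

The same function applies unchanged when the score is the cross-validation count N instead of r. Raising the threshold trades coverage for accuracy, which is the tuning the abstract refers to.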
